[PATCH v5] powerpc/uprobes: Validation for prefixed instruction
As per ISA 3.1, prefixed instruction should not cross 64-byte boundary. So don't allow Uprobe on such prefixed instruction. There are two ways probed instruction is changed in mapped pages. First, when Uprobe is activated, it searches for all the relevant pages and replace instruction in them. In this case, if that probe is on the 64-byte unaligned prefixed instruction, error out directly. Second, when Uprobe is already active and user maps a relevant page via mmap(), instruction is replaced via mmap() code path. But because Uprobe is invalid, entire mmap() operation can not be stopped. In this case just print an error and continue. Signed-off-by: Ravi Bangoria Acked-by: Naveen N. Rao Acked-by: Sandipan Das --- v4: https://lore.kernel.org/r/20210305115433.140769-1-ravi.bango...@linux.ibm.com v4->v5: - Replace SZ_ macros with numbers arch/powerpc/kernel/uprobes.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/arch/powerpc/kernel/uprobes.c b/arch/powerpc/kernel/uprobes.c index e8a63713e655..186f69b11e94 100644 --- a/arch/powerpc/kernel/uprobes.c +++ b/arch/powerpc/kernel/uprobes.c @@ -41,6 +41,13 @@ int arch_uprobe_analyze_insn(struct arch_uprobe *auprobe, if (addr & 0x03) return -EINVAL; + if (cpu_has_feature(CPU_FTR_ARCH_31) && + ppc_inst_prefixed(auprobe->insn) && + (addr & 0x3f) == 60) { + pr_info_ratelimited("Cannot register a uprobe on 64 byte unaligned prefixed instruction\n"); + return -EINVAL; + } + return 0; } -- 2.27.0
Re: [PATCH v4] powerpc/uprobes: Validation for prefixed instruction
On 3/9/21 4:51 PM, Naveen N. Rao wrote: On 2021/03/09 08:54PM, Michael Ellerman wrote: Ravi Bangoria writes: As per ISA 3.1, prefixed instruction should not cross 64-byte boundary. So don't allow Uprobe on such prefixed instruction. There are two ways probed instruction is changed in mapped pages. First, when Uprobe is activated, it searches for all the relevant pages and replace instruction in them. In this case, if that probe is on the 64-byte unaligned prefixed instruction, error out directly. Second, when Uprobe is already active and user maps a relevant page via mmap(), instruction is replaced via mmap() code path. But because Uprobe is invalid, entire mmap() operation can not be stopped. In this case just print an error and continue. Signed-off-by: Ravi Bangoria Acked-by: Naveen N. Rao Do we have a Fixes: tag for this? Since this is an additional check we are adding, I don't think we should add a Fixes: tag. Nothing is broken per-se -- we're just adding more checks to catch simple mistakes. Also, like Oleg pointed out, there are still many other ways for users to shoot themselves in the foot with uprobes and prefixed instructions, if they so desire. However, if you still think we should add a Fixes: tag, we can perhaps use the below commit since I didn't see any specific commit adding support for prefixed instructions for uprobes: Fixes: 650b55b707fdfa ("powerpc: Add prefixed instructions to instruction data type") True. IMO, It doesn't really need any Fixes tag. --- v3: https://lore.kernel.org/r/20210304050529.59391-1-ravi.bango...@linux.ibm.com v3->v4: - CONFIG_PPC64 check was not required, remove it. - Use SZ_ macros instead of hardcoded numbers. arch/powerpc/kernel/uprobes.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/arch/powerpc/kernel/uprobes.c b/arch/powerpc/kernel/uprobes.c index e8a63713e655..4cbfff6e94a3 100644 --- a/arch/powerpc/kernel/uprobes.c +++ b/arch/powerpc/kernel/uprobes.c @@ -41,6 +41,13 @@ int arch_uprobe_analyze_insn(struct arch_uprobe *auprobe, if (addr & 0x03) return -EINVAL; + if (cpu_has_feature(CPU_FTR_ARCH_31) && + ppc_inst_prefixed(auprobe->insn) && + (addr & (SZ_64 - 4)) == SZ_64 - 4) { + pr_info_ratelimited("Cannot register a uprobe on 64 byte unaligned prefixed instruction\n"); + return -EINVAL; I realise we already did the 0x03 check above, but I still think this would be clearer simply as: (addr & 0x3f == 60) Indeed, I like the use of `60' there -- hex is overrated ;) Sure. Will resend. Ravi
[PATCH v4] powerpc/uprobes: Validation for prefixed instruction
As per ISA 3.1, prefixed instruction should not cross 64-byte boundary. So don't allow Uprobe on such prefixed instruction. There are two ways probed instruction is changed in mapped pages. First, when Uprobe is activated, it searches for all the relevant pages and replace instruction in them. In this case, if that probe is on the 64-byte unaligned prefixed instruction, error out directly. Second, when Uprobe is already active and user maps a relevant page via mmap(), instruction is replaced via mmap() code path. But because Uprobe is invalid, entire mmap() operation can not be stopped. In this case just print an error and continue. Signed-off-by: Ravi Bangoria Acked-by: Naveen N. Rao --- v3: https://lore.kernel.org/r/20210304050529.59391-1-ravi.bango...@linux.ibm.com v3->v4: - CONFIG_PPC64 check was not required, remove it. - Use SZ_ macros instead of hardcoded numbers. arch/powerpc/kernel/uprobes.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/arch/powerpc/kernel/uprobes.c b/arch/powerpc/kernel/uprobes.c index e8a63713e655..4cbfff6e94a3 100644 --- a/arch/powerpc/kernel/uprobes.c +++ b/arch/powerpc/kernel/uprobes.c @@ -41,6 +41,13 @@ int arch_uprobe_analyze_insn(struct arch_uprobe *auprobe, if (addr & 0x03) return -EINVAL; + if (cpu_has_feature(CPU_FTR_ARCH_31) && + ppc_inst_prefixed(auprobe->insn) && + (addr & (SZ_64 - 4)) == SZ_64 - 4) { + pr_info_ratelimited("Cannot register a uprobe on 64 byte unaligned prefixed instruction\n"); + return -EINVAL; + } + return 0; } -- 2.27.0
Re: [PATCH v3] powerpc/uprobes: Validation for prefixed instruction
On 3/4/21 4:21 PM, Christophe Leroy wrote: Le 04/03/2021 à 11:13, Ravi Bangoria a écrit : On 3/4/21 1:02 PM, Christophe Leroy wrote: Le 04/03/2021 à 06:05, Ravi Bangoria a écrit : As per ISA 3.1, prefixed instruction should not cross 64-byte boundary. So don't allow Uprobe on such prefixed instruction. There are two ways probed instruction is changed in mapped pages. First, when Uprobe is activated, it searches for all the relevant pages and replace instruction in them. In this case, if that probe is on the 64-byte unaligned prefixed instruction, error out directly. Second, when Uprobe is already active and user maps a relevant page via mmap(), instruction is replaced via mmap() code path. But because Uprobe is invalid, entire mmap() operation can not be stopped. In this case just print an error and continue. Signed-off-by: Ravi Bangoria --- v2: https://lore.kernel.org/r/20210204104703.273429-1-ravi.bango...@linux.ibm.com v2->v3: - Drop restriction for Uprobe on suffix of prefixed instruction. It needs lot of code change including generic code but what we get in return is not worth it. arch/powerpc/kernel/uprobes.c | 8 1 file changed, 8 insertions(+) diff --git a/arch/powerpc/kernel/uprobes.c b/arch/powerpc/kernel/uprobes.c index e8a63713e655..c400971ebe70 100644 --- a/arch/powerpc/kernel/uprobes.c +++ b/arch/powerpc/kernel/uprobes.c @@ -41,6 +41,14 @@ int arch_uprobe_analyze_insn(struct arch_uprobe *auprobe, if (addr & 0x03) return -EINVAL; + if (!IS_ENABLED(CONFIG_PPC64) || !cpu_has_feature(CPU_FTR_ARCH_31)) cpu_has_feature(CPU_FTR_ARCH_31) should return 'false' when CONFIG_PPC64 is not enabled, no need to double check. Ok. I'm going to drop CONFIG_PPC64 check because it's not really required as I replied to Naveen. So, I'll keep CPU_FTR_ARCH_31 check as is. + return 0; + + if (ppc_inst_prefixed(auprobe->insn) && (addr & 0x3F) == 0x3C) { Maybe 3C instead of 4F ? : (addr & 0x3C) == 0x3C Didn't follow. It's not (addr & 0x3C), it's (addr & 0x3F). Sorry I meant 3c instead of 3f (And usually we don't use capital letters for that). The last two bits are supposed to always be 0, so it doesn't really matter, I just thought it would look better having the same value both sides of the test, ie (addr & 0x3c) == 0x3c. Ok yeah makes sense. Thanks. What about (addr & (SZ_64 - 4)) == SZ_64 - 4 to make it more explicit ? Yes this is bit better. Though, it should be: (addr & (SZ_64 - 1)) == SZ_64 - 4 -1 or -4 should give the same results as instructions are always 32 bits aligned though. Got it. Ravi
Re: [PATCH v3] powerpc/uprobes: Validation for prefixed instruction
On 3/4/21 1:02 PM, Christophe Leroy wrote: Le 04/03/2021 à 06:05, Ravi Bangoria a écrit : As per ISA 3.1, prefixed instruction should not cross 64-byte boundary. So don't allow Uprobe on such prefixed instruction. There are two ways probed instruction is changed in mapped pages. First, when Uprobe is activated, it searches for all the relevant pages and replace instruction in them. In this case, if that probe is on the 64-byte unaligned prefixed instruction, error out directly. Second, when Uprobe is already active and user maps a relevant page via mmap(), instruction is replaced via mmap() code path. But because Uprobe is invalid, entire mmap() operation can not be stopped. In this case just print an error and continue. Signed-off-by: Ravi Bangoria --- v2: https://lore.kernel.org/r/20210204104703.273429-1-ravi.bango...@linux.ibm.com v2->v3: - Drop restriction for Uprobe on suffix of prefixed instruction. It needs lot of code change including generic code but what we get in return is not worth it. arch/powerpc/kernel/uprobes.c | 8 1 file changed, 8 insertions(+) diff --git a/arch/powerpc/kernel/uprobes.c b/arch/powerpc/kernel/uprobes.c index e8a63713e655..c400971ebe70 100644 --- a/arch/powerpc/kernel/uprobes.c +++ b/arch/powerpc/kernel/uprobes.c @@ -41,6 +41,14 @@ int arch_uprobe_analyze_insn(struct arch_uprobe *auprobe, if (addr & 0x03) return -EINVAL; + if (!IS_ENABLED(CONFIG_PPC64) || !cpu_has_feature(CPU_FTR_ARCH_31)) cpu_has_feature(CPU_FTR_ARCH_31) should return 'false' when CONFIG_PPC64 is not enabled, no need to double check. Ok. I'm going to drop CONFIG_PPC64 check because it's not really required as I replied to Naveen. So, I'll keep CPU_FTR_ARCH_31 check as is. + return 0; + + if (ppc_inst_prefixed(auprobe->insn) && (addr & 0x3F) == 0x3C) { Maybe 3C instead of 4F ? : (addr & 0x3C) == 0x3C Didn't follow. It's not (addr & 0x3C), it's (addr & 0x3F). What about (addr & (SZ_64 - 4)) == SZ_64 - 4 to make it more explicit ? Yes this is bit better. Though, it should be: (addr & (SZ_64 - 1)) == SZ_64 - 4 Ravi
Re: [PATCH v3] powerpc/uprobes: Validation for prefixed instruction
@@ -41,6 +41,14 @@ int arch_uprobe_analyze_insn(struct arch_uprobe *auprobe, if (addr & 0x03) return -EINVAL; + if (!IS_ENABLED(CONFIG_PPC64) || !cpu_has_feature(CPU_FTR_ARCH_31)) + return 0; Sorry, I missed this last time, but I think we can drop the above checks. ppc_inst_prefixed() already factors in the dependency on CONFIG_PPC64, Yeah, makes sense. I initially added CONFIG_PPC64 check because I was using ppc_inst_prefix(x, y) macro which is not available for !CONFIG_PPC64. and I don't think we need to confirm if we're running on a ISA V3.1 for the below check. Prefixed instructions would be supported only when ARCH_31 is set. (Ignoring insane scenario where user probes on prefixed instruction on p10 predecessors). So I guess I still need ARCH_31 check? Ravi
[PATCH] perf report: Fix -F for branch & mem modes
perf report fails to add valid additional fields with -F when used with branch or mem modes. Fix it. Before patch: $ ./perf record -b $ ./perf report -b -F +srcline_from --stdio Error: Invalid --fields key: `srcline_from' After patch: $ ./perf report -b -F +srcline_from --stdio # Samples: 8K of event 'cycles' # Event count (approx.): 8784 ... Reported-by: Athira Rajeev Fixes: aa6b3c99236b ("perf report: Make -F more strict like -s") Signed-off-by: Ravi Bangoria --- tools/perf/util/sort.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c index 0d5ad42812b9..552b590485bf 100644 --- a/tools/perf/util/sort.c +++ b/tools/perf/util/sort.c @@ -3140,7 +3140,7 @@ int output_field_add(struct perf_hpp_list *list, char *tok) if (strncasecmp(tok, sd->name, strlen(tok))) continue; - if (sort__mode != SORT_MODE__MEMORY) + if (sort__mode != SORT_MODE__BRANCH) return -EINVAL; return __sort_dimension__add_output(list, sd); @@ -3152,7 +3152,7 @@ int output_field_add(struct perf_hpp_list *list, char *tok) if (strncasecmp(tok, sd->name, strlen(tok))) continue; - if (sort__mode != SORT_MODE__BRANCH) + if (sort__mode != SORT_MODE__MEMORY) return -EINVAL; return __sort_dimension__add_output(list, sd); -- 2.29.2
[PATCH v3] powerpc/uprobes: Validation for prefixed instruction
As per ISA 3.1, prefixed instruction should not cross 64-byte boundary. So don't allow Uprobe on such prefixed instruction. There are two ways probed instruction is changed in mapped pages. First, when Uprobe is activated, it searches for all the relevant pages and replace instruction in them. In this case, if that probe is on the 64-byte unaligned prefixed instruction, error out directly. Second, when Uprobe is already active and user maps a relevant page via mmap(), instruction is replaced via mmap() code path. But because Uprobe is invalid, entire mmap() operation can not be stopped. In this case just print an error and continue. Signed-off-by: Ravi Bangoria --- v2: https://lore.kernel.org/r/20210204104703.273429-1-ravi.bango...@linux.ibm.com v2->v3: - Drop restriction for Uprobe on suffix of prefixed instruction. It needs lot of code change including generic code but what we get in return is not worth it. arch/powerpc/kernel/uprobes.c | 8 1 file changed, 8 insertions(+) diff --git a/arch/powerpc/kernel/uprobes.c b/arch/powerpc/kernel/uprobes.c index e8a63713e655..c400971ebe70 100644 --- a/arch/powerpc/kernel/uprobes.c +++ b/arch/powerpc/kernel/uprobes.c @@ -41,6 +41,14 @@ int arch_uprobe_analyze_insn(struct arch_uprobe *auprobe, if (addr & 0x03) return -EINVAL; + if (!IS_ENABLED(CONFIG_PPC64) || !cpu_has_feature(CPU_FTR_ARCH_31)) + return 0; + + if (ppc_inst_prefixed(auprobe->insn) && (addr & 0x3F) == 0x3C) { + pr_info_ratelimited("Cannot register a uprobe on 64 byte unaligned prefixed instruction\n"); + return -EINVAL; + } + return 0; } -- 2.27.0
Re: [PATCH v2] powerpc/uprobes: Validation for prefixed instruction
On 2/4/21 6:38 PM, Naveen N. Rao wrote: On 2021/02/04 04:17PM, Ravi Bangoria wrote: Don't allow Uprobe on 2nd word of a prefixed instruction. As per ISA 3.1, prefixed instruction should not cross 64-byte boundary. So don't allow Uprobe on such prefixed instruction as well. There are two ways probed instruction is changed in mapped pages. First, when Uprobe is activated, it searches for all the relevant pages and replace instruction in them. In this case, if we notice that probe is on the 2nd word of prefixed instruction, error out directly. Second, when Uprobe is already active and user maps a relevant page via mmap(), instruction is replaced via mmap() code path. But because Uprobe is invalid, entire mmap() operation can not be stopped. In this case just print an error and continue. Signed-off-by: Ravi Bangoria --- v1: http://lore.kernel.org/r/20210119091234.76317-1-ravi.bango...@linux.ibm.com v1->v2: - Instead of introducing new arch hook from verify_opcode(), use existing hook arch_uprobe_analyze_insn(). - Add explicit check for prefixed instruction crossing 64-byte boundary. If probe is on such instruction, throw an error. arch/powerpc/kernel/uprobes.c | 66 ++- 1 file changed, 65 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/kernel/uprobes.c b/arch/powerpc/kernel/uprobes.c index e8a63713e655..485d19a2a31f 100644 --- a/arch/powerpc/kernel/uprobes.c +++ b/arch/powerpc/kernel/uprobes.c @@ -7,6 +7,7 @@ * Adapted from the x86 port by Ananth N Mavinakayanahalli */ #include +#include #include #include #include @@ -28,6 +29,69 @@ bool is_trap_insn(uprobe_opcode_t *insn) return (is_trap(*insn)); } +#ifdef CONFIG_PPC64 +static int get_instr(struct mm_struct *mm, unsigned long addr, u32 *instr) +{ + struct page *page; + struct vm_area_struct *vma; + void *kaddr; + unsigned int gup_flags = FOLL_FORCE | FOLL_SPLIT_PMD; + + if (get_user_pages_remote(mm, addr, 1, gup_flags, , , NULL) <= 0) + return -EINVAL; + + kaddr = kmap_atomic(page); + *instr = *((u32 *)(kaddr + (addr & ~PAGE_MASK))); + kunmap_atomic(kaddr); + put_page(page); + return 0; +} + +static int validate_prefixed_instr(struct mm_struct *mm, unsigned long addr) +{ + struct ppc_inst inst; + u32 prefix, suffix; + + /* +* No need to check if addr is pointing to beginning of the +* page. Even if probe is on a suffix of page-unaligned +* prefixed instruction, hw will raise exception and kernel +* will send SIGBUS. +*/ + if (!(addr & ~PAGE_MASK)) + return 0; + + if (get_instr(mm, addr, ) < 0) + return -EINVAL; + if (get_instr(mm, addr + 4, ) < 0) + return -EINVAL; + + inst = ppc_inst_prefix(prefix, suffix); + if (ppc_inst_prefixed(inst) && (addr & 0x3F) == 0x3C) { + printk_ratelimited("Cannot register a uprobe on 64 byte " ^^ pr_info_ratelimited() It should be sufficient to check the primary opcode to determine if it is a prefixed instruction. You don't have to read the suffix. I see that we don't have a helper to do this currently, so you could do: if (ppc_inst_primary_opcode(ppc_inst(prefix)) == 1) Ok. In the future, it might be worthwhile to add IS_PREFIX() as a macro similar to IS_MTMSRD() if there are more such uses. Along with this, if you also add the below to the start of this function, you can get rid of the #ifdef: if (!IS_ENABLED(CONFIG_PPC64)) return 0; Yeah this is better. Thanks for the review, Naveen! Ravi
Re: [PATCH v2] powerpc/uprobes: Validation for prefixed instruction
On 2/4/21 9:42 PM, Naveen N. Rao wrote: On 2021/02/04 06:38PM, Naveen N. Rao wrote: On 2021/02/04 04:17PM, Ravi Bangoria wrote: Don't allow Uprobe on 2nd word of a prefixed instruction. As per ISA 3.1, prefixed instruction should not cross 64-byte boundary. So don't allow Uprobe on such prefixed instruction as well. There are two ways probed instruction is changed in mapped pages. First, when Uprobe is activated, it searches for all the relevant pages and replace instruction in them. In this case, if we notice that probe is on the 2nd word of prefixed instruction, error out directly. Second, when Uprobe is already active and user maps a relevant page via mmap(), instruction is replaced via mmap() code path. But because Uprobe is invalid, entire mmap() operation can not be stopped. In this case just print an error and continue. Signed-off-by: Ravi Bangoria --- v1: http://lore.kernel.org/r/20210119091234.76317-1-ravi.bango...@linux.ibm.com v1->v2: - Instead of introducing new arch hook from verify_opcode(), use existing hook arch_uprobe_analyze_insn(). - Add explicit check for prefixed instruction crossing 64-byte boundary. If probe is on such instruction, throw an error. arch/powerpc/kernel/uprobes.c | 66 ++- 1 file changed, 65 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/kernel/uprobes.c b/arch/powerpc/kernel/uprobes.c index e8a63713e655..485d19a2a31f 100644 --- a/arch/powerpc/kernel/uprobes.c +++ b/arch/powerpc/kernel/uprobes.c @@ -7,6 +7,7 @@ * Adapted from the x86 port by Ananth N Mavinakayanahalli */ #include +#include #include #include #include @@ -28,6 +29,69 @@ bool is_trap_insn(uprobe_opcode_t *insn) return (is_trap(*insn)); } +#ifdef CONFIG_PPC64 +static int get_instr(struct mm_struct *mm, unsigned long addr, u32 *instr) +{ + struct page *page; + struct vm_area_struct *vma; + void *kaddr; + unsigned int gup_flags = FOLL_FORCE | FOLL_SPLIT_PMD; + + if (get_user_pages_remote(mm, addr, 1, gup_flags, , , NULL) <= 0) + return -EINVAL; + + kaddr = kmap_atomic(page); + *instr = *((u32 *)(kaddr + (addr & ~PAGE_MASK))); + kunmap_atomic(kaddr); + put_page(page); + return 0; +} + +static int validate_prefixed_instr(struct mm_struct *mm, unsigned long addr) +{ + struct ppc_inst inst; + u32 prefix, suffix; + + /* +* No need to check if addr is pointing to beginning of the +* page. Even if probe is on a suffix of page-unaligned +* prefixed instruction, hw will raise exception and kernel +* will send SIGBUS. +*/ + if (!(addr & ~PAGE_MASK)) + return 0; + + if (get_instr(mm, addr, ) < 0) + return -EINVAL; + if (get_instr(mm, addr + 4, ) < 0) + return -EINVAL; + + inst = ppc_inst_prefix(prefix, suffix); + if (ppc_inst_prefixed(inst) && (addr & 0x3F) == 0x3C) { + printk_ratelimited("Cannot register a uprobe on 64 byte " ^^ pr_info_ratelimited() It should be sufficient to check the primary opcode to determine if it is a prefixed instruction. You don't have to read the suffix. I see that we don't have a helper to do this currently, so you could do: if (ppc_inst_primary_opcode(ppc_inst(prefix)) == 1) Seeing the kprobes code, I realized that we have to check for another scenario (Thanks, Jordan!). If this is the suffix of a prefix instruction for which a uprobe has already been installed, then the previous word will be a 'trap' instruction. You need to check if there is a uprobe at the previous word, and if the original instruction there was a prefix instruction. Yes, this patch will fail to detect such scenario. I think I should read the instruction directly from file, like what copy_insn() does. With that, I'll get original instruction rather that 'trap'. I'll think more along this line. Ravi
Re: [PATCH v2] powerpc/uprobes: Validation for prefixed instruction
On 2/4/21 6:45 PM, Naveen N. Rao wrote: On 2021/02/04 04:19PM, Ravi Bangoria wrote: On 2/4/21 4:17 PM, Ravi Bangoria wrote: Don't allow Uprobe on 2nd word of a prefixed instruction. As per ISA 3.1, prefixed instruction should not cross 64-byte boundary. So don't allow Uprobe on such prefixed instruction as well. There are two ways probed instruction is changed in mapped pages. First, when Uprobe is activated, it searches for all the relevant pages and replace instruction in them. In this case, if we notice that probe is on the 2nd word of prefixed instruction, error out directly. Second, when Uprobe is already active and user maps a relevant page via mmap(), instruction is replaced via mmap() code path. But because Uprobe is invalid, entire mmap() operation can not be stopped. In this case just print an error and continue. @mpe, arch_uprobe_analyze_insn() can return early if cpu_has_feature(CPU_FTR_ARCH_31) is not set. But that will miss out a rare scenario of user running binary with prefixed instruction on p10 predecessors. Please let me know if I should add cpu_has_feature(CPU_FTR_ARCH_31) or not. The check you are adding is very specific to prefixed instructions, so it makes sense to add a cpu feature check for v3.1. On older processors, those are invalid instructions like any other. The instruction emulation infrastructure will refuse to emulate it and the instruction will be single stepped. Sure will add it. Ravi
Re: [PATCH v2] powerpc/uprobes: Validation for prefixed instruction
On 2/6/21 11:36 PM, Oleg Nesterov wrote: On 02/04, Ravi Bangoria wrote: +static int get_instr(struct mm_struct *mm, unsigned long addr, u32 *instr) +{ + struct page *page; + struct vm_area_struct *vma; + void *kaddr; + unsigned int gup_flags = FOLL_FORCE | FOLL_SPLIT_PMD; + + if (get_user_pages_remote(mm, addr, 1, gup_flags, , , NULL) <= 0) + return -EINVAL; "vma" is not used, Ok. and I don't think you need FOLL_SPLIT_PMD. Isn't it needed if the target page is hugepage? Otherwise I can't really comment this ppc-specific change. To be honest, I don't even understand why do we need this fix. Sure, the breakpoint in the middle of 64-bit insn won't work, why do we care? The user should know what does he do. That's a valid point. This patch is to protract user from doing invalid thing. Though, there is one minor scenario where this patch will help. If the original prefixed instruction is 64 byte unaligned, and say user probes it, Uprobe infrastructure will emulate such instruction transparently without notifying user that the instruction is improperly aligned. Not to mention we can't really trust get_user_pages() in that this page can be modified by mm owner or debugger... As Naveen pointed out, there might be existing uprobe on the prefix and this patch will fail to detect such scenario. So I'm thinking to read the instruction directly from file backed page (like copy_insn), in which case I won't use get_user_pages(). Thanks Oleg for the review! Ravi
Re: [PATCH v2] powerpc/uprobes: Validation for prefixed instruction
On 2/4/21 4:17 PM, Ravi Bangoria wrote: Don't allow Uprobe on 2nd word of a prefixed instruction. As per ISA 3.1, prefixed instruction should not cross 64-byte boundary. So don't allow Uprobe on such prefixed instruction as well. There are two ways probed instruction is changed in mapped pages. First, when Uprobe is activated, it searches for all the relevant pages and replace instruction in them. In this case, if we notice that probe is on the 2nd word of prefixed instruction, error out directly. Second, when Uprobe is already active and user maps a relevant page via mmap(), instruction is replaced via mmap() code path. But because Uprobe is invalid, entire mmap() operation can not be stopped. In this case just print an error and continue. @mpe, arch_uprobe_analyze_insn() can return early if cpu_has_feature(CPU_FTR_ARCH_31) is not set. But that will miss out a rare scenario of user running binary with prefixed instruction on p10 predecessors. Please let me know if I should add cpu_has_feature(CPU_FTR_ARCH_31) or not. - Ravi
[PATCH v2] powerpc/uprobes: Validation for prefixed instruction
Don't allow Uprobe on 2nd word of a prefixed instruction. As per ISA 3.1, prefixed instruction should not cross 64-byte boundary. So don't allow Uprobe on such prefixed instruction as well. There are two ways probed instruction is changed in mapped pages. First, when Uprobe is activated, it searches for all the relevant pages and replace instruction in them. In this case, if we notice that probe is on the 2nd word of prefixed instruction, error out directly. Second, when Uprobe is already active and user maps a relevant page via mmap(), instruction is replaced via mmap() code path. But because Uprobe is invalid, entire mmap() operation can not be stopped. In this case just print an error and continue. Signed-off-by: Ravi Bangoria --- v1: http://lore.kernel.org/r/20210119091234.76317-1-ravi.bango...@linux.ibm.com v1->v2: - Instead of introducing new arch hook from verify_opcode(), use existing hook arch_uprobe_analyze_insn(). - Add explicit check for prefixed instruction crossing 64-byte boundary. If probe is on such instruction, throw an error. arch/powerpc/kernel/uprobes.c | 66 ++- 1 file changed, 65 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/kernel/uprobes.c b/arch/powerpc/kernel/uprobes.c index e8a63713e655..485d19a2a31f 100644 --- a/arch/powerpc/kernel/uprobes.c +++ b/arch/powerpc/kernel/uprobes.c @@ -7,6 +7,7 @@ * Adapted from the x86 port by Ananth N Mavinakayanahalli */ #include +#include #include #include #include @@ -28,6 +29,69 @@ bool is_trap_insn(uprobe_opcode_t *insn) return (is_trap(*insn)); } +#ifdef CONFIG_PPC64 +static int get_instr(struct mm_struct *mm, unsigned long addr, u32 *instr) +{ + struct page *page; + struct vm_area_struct *vma; + void *kaddr; + unsigned int gup_flags = FOLL_FORCE | FOLL_SPLIT_PMD; + + if (get_user_pages_remote(mm, addr, 1, gup_flags, , , NULL) <= 0) + return -EINVAL; + + kaddr = kmap_atomic(page); + *instr = *((u32 *)(kaddr + (addr & ~PAGE_MASK))); + kunmap_atomic(kaddr); + put_page(page); + return 0; +} + +static int validate_prefixed_instr(struct mm_struct *mm, unsigned long addr) +{ + struct ppc_inst inst; + u32 prefix, suffix; + + /* +* No need to check if addr is pointing to beginning of the +* page. Even if probe is on a suffix of page-unaligned +* prefixed instruction, hw will raise exception and kernel +* will send SIGBUS. +*/ + if (!(addr & ~PAGE_MASK)) + return 0; + + if (get_instr(mm, addr, ) < 0) + return -EINVAL; + if (get_instr(mm, addr + 4, ) < 0) + return -EINVAL; + + inst = ppc_inst_prefix(prefix, suffix); + if (ppc_inst_prefixed(inst) && (addr & 0x3F) == 0x3C) { + printk_ratelimited("Cannot register a uprobe on 64 byte " + "unaligned prefixed instruction\n"); + return -EINVAL; + } + + suffix = prefix; + if (get_instr(mm, addr - 4, ) < 0) + return -EINVAL; + + inst = ppc_inst_prefix(prefix, suffix); + if (ppc_inst_prefixed(inst)) { + printk_ratelimited("Cannot register a uprobe on the second " + "word of prefixed instruction\n"); + return -EINVAL; + } + return 0; +} +#else +static int validate_prefixed_instr(struct mm_struct *mm, unsigned long addr) +{ + return 0; +} +#endif + /** * arch_uprobe_analyze_insn * @mm: the probed address space. @@ -41,7 +105,7 @@ int arch_uprobe_analyze_insn(struct arch_uprobe *auprobe, if (addr & 0x03) return -EINVAL; - return 0; + return validate_prefixed_instr(mm, addr); } /* -- 2.26.2
Re: [PATCH] powerpc/uprobes: Don't allow probe on suffix of prefixed instruction
On 1/19/21 10:56 PM, Oleg Nesterov wrote: On 01/19, Ravi Bangoria wrote: Probe on 2nd word of a prefixed instruction is invalid scenario and should be restricted. I don't understand this ppc-specific problem, but... So far (upto Power9), instruction size was fixed - 4 bytes. But Power10 introduced a prefixed instruction which consist of 8 bytes, where first 4 bytes is prefix and remaining is suffix. This patch checks whether the Uprobe is on the 2nd word (suffix) of a prefixed instruction. If so, consider it as invalid Uprobe. +#ifdef CONFIG_PPC64 +int arch_uprobe_verify_opcode(struct page *page, unsigned long vaddr, + uprobe_opcode_t opcode) +{ + uprobe_opcode_t prefix; + void *kaddr; + struct ppc_inst inst; + + /* Don't check if vaddr is pointing to the beginning of page */ + if (!(vaddr & ~PAGE_MASK)) + return 0; So the fix is incomplete? Or insn at the start of page can't be prefixed? Prefixed instruction can not cross 64 byte boundary. If it does, kernel generates SIGBUS. Considering all powerpc supported page sizes to be multiple of 64 bytes, there will never be a scenario where prefix and suffix will be on different pages. i.e. a beginning of the page should never be a suffix. +int __weak arch_uprobe_verify_opcode(struct page *page, unsigned long vaddr, +uprobe_opcode_t opcode) +{ + return 0; +} + static int verify_opcode(struct page *page, unsigned long vaddr, uprobe_opcode_t *new_opcode) { uprobe_opcode_t old_opcode; @@ -275,6 +281,8 @@ static int verify_opcode(struct page *page, unsigned long vaddr, uprobe_opcode_t if (is_swbp_insn(new_opcode)) { if (is_swbp)/* register: already installed? */ return 0; + if (arch_uprobe_verify_opcode(page, vaddr, old_opcode)) + return -EINVAL; Well, this doesn't look good... To me it would be better to change the prepare_uprobe() path to copy the potential prefix into uprobe->arch and check ppc_inst_prefixed() in arch_uprobe_analyze_insn(). What do you think? Agreed. The only reason I was checking via verify_opcode() is to make the code more simpler. If I need to check via prepare_uprobe(), I'll need to abuse uprobe->offset by setting it to uprobe->offset - 4 to read previous 4 bytes of current instruction. Which, IMHO, is not that straightforward with current implementation of prepare_uprobe(). But while replying here, I'm thinking... I should be able to grab a page using mm and vaddr, which are already available in arch_uprobe_analyze_insn(). With that, I should be able to do all this inside arch_uprobe_analyze_insn() only. I'll try this and send v2 if that works. Thanks for the review. Ravi
[PATCH] powerpc/uprobes: Don't allow probe on suffix of prefixed instruction
Probe on 2nd word of a prefixed instruction is invalid scenario and should be restricted. There are two ways probed instruction is changed in mapped pages. First, when Uprobe is activated, it searches for all the relevant pages and replace instruction in them. In this case, if we notice that probe is on the 2nd word of prefixed instruction, error out directly. Second, when Uprobe is already active and user maps a relevant page via mmap(), instruction is replaced via mmap() code path. But because Uprobe is invalid, entire mmap() operation can not be stopped. In this case just print an error and continue. Signed-off-by: Ravi Bangoria --- arch/powerpc/kernel/uprobes.c | 28 include/linux/uprobes.h | 1 + kernel/events/uprobes.c | 8 3 files changed, 37 insertions(+) diff --git a/arch/powerpc/kernel/uprobes.c b/arch/powerpc/kernel/uprobes.c index e8a63713e655..c73d5a397164 100644 --- a/arch/powerpc/kernel/uprobes.c +++ b/arch/powerpc/kernel/uprobes.c @@ -7,6 +7,7 @@ * Adapted from the x86 port by Ananth N Mavinakayanahalli */ #include +#include #include #include #include @@ -44,6 +45,33 @@ int arch_uprobe_analyze_insn(struct arch_uprobe *auprobe, return 0; } +#ifdef CONFIG_PPC64 +int arch_uprobe_verify_opcode(struct page *page, unsigned long vaddr, + uprobe_opcode_t opcode) +{ + uprobe_opcode_t prefix; + void *kaddr; + struct ppc_inst inst; + + /* Don't check if vaddr is pointing to the beginning of page */ + if (!(vaddr & ~PAGE_MASK)) + return 0; + + kaddr = kmap_atomic(page); + memcpy(, kaddr + ((vaddr - 4) & ~PAGE_MASK), UPROBE_SWBP_INSN_SIZE); + kunmap_atomic(kaddr); + + inst = ppc_inst_prefix(prefix, opcode); + + if (ppc_inst_prefixed(inst)) { + printk_ratelimited("Cannot register a uprobe on the second " + "word of prefixed instruction\n"); + return -1; + } + return 0; +} +#endif + /* * arch_uprobe_pre_xol - prepare to execute out of line. * @auprobe: the probepoint information. diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h index f46e0ca0169c..5a3b45878e13 100644 --- a/include/linux/uprobes.h +++ b/include/linux/uprobes.h @@ -128,6 +128,7 @@ extern bool uprobe_deny_signal(void); extern bool arch_uprobe_skip_sstep(struct arch_uprobe *aup, struct pt_regs *regs); extern void uprobe_clear_state(struct mm_struct *mm); extern int arch_uprobe_analyze_insn(struct arch_uprobe *aup, struct mm_struct *mm, unsigned long addr); +int arch_uprobe_verify_opcode(struct page *page, unsigned long vaddr, uprobe_opcode_t opcode); extern int arch_uprobe_pre_xol(struct arch_uprobe *aup, struct pt_regs *regs); extern int arch_uprobe_post_xol(struct arch_uprobe *aup, struct pt_regs *regs); extern bool arch_uprobe_xol_was_trapped(struct task_struct *tsk); diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c index bf9edd8d75be..be02e6c26e3f 100644 --- a/kernel/events/uprobes.c +++ b/kernel/events/uprobes.c @@ -255,6 +255,12 @@ static void copy_to_page(struct page *page, unsigned long vaddr, const void *src kunmap_atomic(kaddr); } +int __weak arch_uprobe_verify_opcode(struct page *page, unsigned long vaddr, +uprobe_opcode_t opcode) +{ + return 0; +} + static int verify_opcode(struct page *page, unsigned long vaddr, uprobe_opcode_t *new_opcode) { uprobe_opcode_t old_opcode; @@ -275,6 +281,8 @@ static int verify_opcode(struct page *page, unsigned long vaddr, uprobe_opcode_t if (is_swbp_insn(new_opcode)) { if (is_swbp)/* register: already installed? */ return 0; + if (arch_uprobe_verify_opcode(page, vaddr, old_opcode)) + return -EINVAL; } else { if (!is_swbp) /* unregister: was it changed by us? */ return 0; -- 2.26.2
[PATCH v3 1/4] KVM: PPC: Allow nested guest creation when L0 hv_guest_state > L1
On powerpc, L1 hypervisor takes help of L0 using H_ENTER_NESTED hcall to load L2 guest state in cpu. L1 hypervisor prepares the L2 state in struct hv_guest_state and passes a pointer to it via hcall. Using that pointer, L0 reads/writes that state directly from/to L1 memory. Thus L0 must be aware of hv_guest_state layout of L1. Currently it uses version field to achieve this. i.e. If L0 hv_guest_state.version != L1 hv_guest_state.version, L0 won't allow nested kvm guest. This restriction can be loosen up a bit. L0 can be taught to understand older layout of hv_guest_state, if we restrict the new member to be added only at the end. i.e. we can allow nested guest even when L0 hv_guest_state.version > L1 hv_guest_state.version. Though, the other way around is not possible. Signed-off-by: Ravi Bangoria Reviewed-by: Fabiano Rosas --- arch/powerpc/include/asm/hvcall.h | 17 +++-- arch/powerpc/kvm/book3s_hv_nested.c | 55 +++-- 2 files changed, 60 insertions(+), 12 deletions(-) diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h index c1fbccb04390..ca6840239f90 100644 --- a/arch/powerpc/include/asm/hvcall.h +++ b/arch/powerpc/include/asm/hvcall.h @@ -526,9 +526,12 @@ struct h_cpu_char_result { u64 behaviour; }; -/* Register state for entering a nested guest with H_ENTER_NESTED */ +/* + * Register state for entering a nested guest with H_ENTER_NESTED. + * New member must be added at the end. + */ struct hv_guest_state { - u64 version;/* version of this structure layout */ + u64 version;/* version of this structure layout, must be first */ u32 lpid; u32 vcpu_token; /* These registers are hypervisor privileged (at least for writing) */ @@ -562,6 +565,16 @@ struct hv_guest_state { /* Latest version of hv_guest_state structure */ #define HV_GUEST_STATE_VERSION 1 +static inline int hv_guest_state_size(unsigned int version) +{ + switch (version) { + case 1: + return offsetofend(struct hv_guest_state, ppr); + default: + return -1; + } +} + /* * From the document "H_GetPerformanceCounterInfo Interface" v1.07 * diff --git a/arch/powerpc/kvm/book3s_hv_nested.c b/arch/powerpc/kvm/book3s_hv_nested.c index 33b58549a9aa..937dd5114300 100644 --- a/arch/powerpc/kvm/book3s_hv_nested.c +++ b/arch/powerpc/kvm/book3s_hv_nested.c @@ -215,12 +215,51 @@ static void kvmhv_nested_mmio_needed(struct kvm_vcpu *vcpu, u64 regs_ptr) } } +static int kvmhv_read_guest_state_and_regs(struct kvm_vcpu *vcpu, + struct hv_guest_state *l2_hv, + struct pt_regs *l2_regs, + u64 hv_ptr, u64 regs_ptr) +{ + int size; + + if (kvm_vcpu_read_guest(vcpu, hv_ptr, _hv->version, + sizeof(l2_hv->version))) + return -1; + + if (kvmppc_need_byteswap(vcpu)) + l2_hv->version = swab64(l2_hv->version); + + size = hv_guest_state_size(l2_hv->version); + if (size < 0) + return -1; + + return kvm_vcpu_read_guest(vcpu, hv_ptr, l2_hv, size) || + kvm_vcpu_read_guest(vcpu, regs_ptr, l2_regs, + sizeof(struct pt_regs)); +} + +static int kvmhv_write_guest_state_and_regs(struct kvm_vcpu *vcpu, + struct hv_guest_state *l2_hv, + struct pt_regs *l2_regs, + u64 hv_ptr, u64 regs_ptr) +{ + int size; + + size = hv_guest_state_size(l2_hv->version); + if (size < 0) + return -1; + + return kvm_vcpu_write_guest(vcpu, hv_ptr, l2_hv, size) || + kvm_vcpu_write_guest(vcpu, regs_ptr, l2_regs, +sizeof(struct pt_regs)); +} + long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu) { long int err, r; struct kvm_nested_guest *l2; struct pt_regs l2_regs, saved_l1_regs; - struct hv_guest_state l2_hv, saved_l1_hv; + struct hv_guest_state l2_hv = {0}, saved_l1_hv; struct kvmppc_vcore *vc = vcpu->arch.vcore; u64 hv_ptr, regs_ptr; u64 hdec_exp; @@ -235,17 +274,15 @@ long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu) hv_ptr = kvmppc_get_gpr(vcpu, 4); regs_ptr = kvmppc_get_gpr(vcpu, 5); vcpu->srcu_idx = srcu_read_lock(>kvm->srcu); - err = kvm_vcpu_read_guest(vcpu, hv_ptr, _hv, - sizeof(struct hv_guest_state)) || - kvm_vcpu_read_guest(vcpu, regs_ptr, _regs, - sizeof(struct pt_regs)); + err = kvmhv_read_guest_state_and_regs(vcpu, _hv, _regs, + hv_ptr, reg
[PATCH v3 4/4] KVM: PPC: Introduce new capability for 2nd DAWR
Introduce KVM_CAP_PPC_DAWR1 which can be used by Qemu to query whether kvm supports 2nd DAWR or not. The capability is by default disabled even when the underlying CPU supports 2nd DAWR. Qemu needs to check and enable it manually to use the feature. Signed-off-by: Ravi Bangoria --- Documentation/virt/kvm/api.rst | 10 ++ arch/powerpc/include/asm/kvm_ppc.h | 1 + arch/powerpc/kvm/book3s_hv.c | 12 arch/powerpc/kvm/powerpc.c | 10 ++ include/uapi/linux/kvm.h | 1 + tools/include/uapi/linux/kvm.h | 1 + 6 files changed, 35 insertions(+) diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst index abb24575bdf9..049f07ebf197 100644 --- a/Documentation/virt/kvm/api.rst +++ b/Documentation/virt/kvm/api.rst @@ -6016,6 +6016,16 @@ KVM_EXIT_X86_RDMSR and KVM_EXIT_X86_WRMSR exit notifications which user space can then handle to implement model specific MSR handling and/or user notifications to inform a user that an MSR was not handled. +7.22 KVM_CAP_PPC_DAWR1 +-- + +:Architectures: ppc +:Parameters: none +:Returns: 0 on success, -EINVAL when CPU doesn't support 2nd DAWR + +This capability can be used to check / enable 2nd DAWR feature provided +by POWER10 processor. + 8. Other capabilities. == diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h index 0a056c64c317..13c39d24dda5 100644 --- a/arch/powerpc/include/asm/kvm_ppc.h +++ b/arch/powerpc/include/asm/kvm_ppc.h @@ -314,6 +314,7 @@ struct kvmppc_ops { int size); int (*enable_svm)(struct kvm *kvm); int (*svm_off)(struct kvm *kvm); + int (*enable_dawr1)(struct kvm *kvm); }; extern struct kvmppc_ops *kvmppc_hv_ops; diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index b7a30c0692a7..04c02344bd3f 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -5625,6 +5625,17 @@ static int kvmhv_svm_off(struct kvm *kvm) return ret; } +static int kvmhv_enable_dawr1(struct kvm *kvm) +{ + if (!cpu_has_feature(CPU_FTR_DAWR1)) + return -ENODEV; + + /* kvm == NULL means the caller is testing if the capability exists */ + if (kvm) + kvm->arch.dawr1_enabled = true; + return 0; +} + static struct kvmppc_ops kvm_ops_hv = { .get_sregs = kvm_arch_vcpu_ioctl_get_sregs_hv, .set_sregs = kvm_arch_vcpu_ioctl_set_sregs_hv, @@ -5668,6 +5679,7 @@ static struct kvmppc_ops kvm_ops_hv = { .store_to_eaddr = kvmhv_store_to_eaddr, .enable_svm = kvmhv_enable_svm, .svm_off = kvmhv_svm_off, + .enable_dawr1 = kvmhv_enable_dawr1, }; static int kvm_init_subcore_bitmap(void) diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c index 13999123b735..380656528b5b 100644 --- a/arch/powerpc/kvm/powerpc.c +++ b/arch/powerpc/kvm/powerpc.c @@ -678,6 +678,10 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) r = hv_enabled && kvmppc_hv_ops->enable_svm && !kvmppc_hv_ops->enable_svm(NULL); break; + case KVM_CAP_PPC_DAWR1: + r = !!(hv_enabled && kvmppc_hv_ops->enable_dawr1 && + !kvmppc_hv_ops->enable_dawr1(NULL)); + break; #endif default: r = 0; @@ -2187,6 +2191,12 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm, break; r = kvm->arch.kvm_ops->enable_svm(kvm); break; + case KVM_CAP_PPC_DAWR1: + r = -EINVAL; + if (!is_kvmppc_hv_enabled(kvm) || !kvm->arch.kvm_ops->enable_dawr1) + break; + r = kvm->arch.kvm_ops->enable_dawr1(kvm); + break; #endif default: r = -EINVAL; diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index ca41220b40b8..f1210f99a52d 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -1053,6 +1053,7 @@ struct kvm_ppc_resize_hpt { #define KVM_CAP_X86_USER_SPACE_MSR 188 #define KVM_CAP_X86_MSR_FILTER 189 #define KVM_CAP_ENFORCE_PV_FEATURE_CPUID 190 +#define KVM_CAP_PPC_DAWR1 191 #ifdef KVM_CAP_IRQ_ROUTING diff --git a/tools/include/uapi/linux/kvm.h b/tools/include/uapi/linux/kvm.h index ca41220b40b8..f1210f99a52d 100644 --- a/tools/include/uapi/linux/kvm.h +++ b/tools/include/uapi/linux/kvm.h @@ -1053,6 +1053,7 @@ struct kvm_ppc_resize_hpt { #define KVM_CAP_X86_USER_SPACE_MSR 188 #define KVM_CAP_X86_MSR_FILTER 189 #define KVM_CAP_ENFORCE_PV_FEATURE_CPUID 190 +#define KVM_CAP_PPC_DAWR1 191 #ifdef KVM_CAP_IRQ_ROUTING -- 2.26.2
[PATCH v3 2/4] KVM: PPC: Rename current DAWR macros and variables
Power10 is introducing second DAWR. Use real register names (with suffix 0) from ISA for current macros and variables used by kvm. One exception is KVM_REG_PPC_DAWR. Keep it as it is because it's uapi so changing it will break userspace. Signed-off-by: Ravi Bangoria --- arch/powerpc/include/asm/kvm_host.h | 4 ++-- arch/powerpc/kernel/asm-offsets.c | 4 ++-- arch/powerpc/kvm/book3s_hv.c| 24 arch/powerpc/kvm/book3s_hv_nested.c | 8 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 20 ++-- 5 files changed, 30 insertions(+), 30 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index d67a470e95a3..62cadf1a596e 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -584,8 +584,8 @@ struct kvm_vcpu_arch { u32 ctrl; u32 dabrx; ulong dabr; - ulong dawr; - ulong dawrx; + ulong dawr0; + ulong dawrx0; ulong ciabr; ulong cfar; ulong ppr; diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c index c2722ff36e98..5a77aac516ba 100644 --- a/arch/powerpc/kernel/asm-offsets.c +++ b/arch/powerpc/kernel/asm-offsets.c @@ -548,8 +548,8 @@ int main(void) OFFSET(VCPU_CTRL, kvm_vcpu, arch.ctrl); OFFSET(VCPU_DABR, kvm_vcpu, arch.dabr); OFFSET(VCPU_DABRX, kvm_vcpu, arch.dabrx); - OFFSET(VCPU_DAWR, kvm_vcpu, arch.dawr); - OFFSET(VCPU_DAWRX, kvm_vcpu, arch.dawrx); + OFFSET(VCPU_DAWR0, kvm_vcpu, arch.dawr0); + OFFSET(VCPU_DAWRX0, kvm_vcpu, arch.dawrx0); OFFSET(VCPU_CIABR, kvm_vcpu, arch.ciabr); OFFSET(VCPU_HFLAGS, kvm_vcpu, arch.hflags); OFFSET(VCPU_DEC, kvm_vcpu, arch.dec); diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index e3b1839fc251..bcbad8daa974 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -782,8 +782,8 @@ static int kvmppc_h_set_mode(struct kvm_vcpu *vcpu, unsigned long mflags, return H_UNSUPPORTED_FLAG_START; if (value2 & DABRX_HYP) return H_P4; - vcpu->arch.dawr = value1; - vcpu->arch.dawrx = value2; + vcpu->arch.dawr0 = value1; + vcpu->arch.dawrx0 = value2; return H_SUCCESS; case H_SET_MODE_RESOURCE_ADDR_TRANS_MODE: /* KVM does not support mflags=2 (AIL=2) */ @@ -1747,10 +1747,10 @@ static int kvmppc_get_one_reg_hv(struct kvm_vcpu *vcpu, u64 id, *val = get_reg_val(id, vcpu->arch.vcore->vtb); break; case KVM_REG_PPC_DAWR: - *val = get_reg_val(id, vcpu->arch.dawr); + *val = get_reg_val(id, vcpu->arch.dawr0); break; case KVM_REG_PPC_DAWRX: - *val = get_reg_val(id, vcpu->arch.dawrx); + *val = get_reg_val(id, vcpu->arch.dawrx0); break; case KVM_REG_PPC_CIABR: *val = get_reg_val(id, vcpu->arch.ciabr); @@ -1979,10 +1979,10 @@ static int kvmppc_set_one_reg_hv(struct kvm_vcpu *vcpu, u64 id, vcpu->arch.vcore->vtb = set_reg_val(id, *val); break; case KVM_REG_PPC_DAWR: - vcpu->arch.dawr = set_reg_val(id, *val); + vcpu->arch.dawr0 = set_reg_val(id, *val); break; case KVM_REG_PPC_DAWRX: - vcpu->arch.dawrx = set_reg_val(id, *val) & ~DAWRX_HYP; + vcpu->arch.dawrx0 = set_reg_val(id, *val) & ~DAWRX_HYP; break; case KVM_REG_PPC_CIABR: vcpu->arch.ciabr = set_reg_val(id, *val); @@ -3437,8 +3437,8 @@ static int kvmhv_load_hv_regs_and_go(struct kvm_vcpu *vcpu, u64 time_limit, int trap; unsigned long host_hfscr = mfspr(SPRN_HFSCR); unsigned long host_ciabr = mfspr(SPRN_CIABR); - unsigned long host_dawr = mfspr(SPRN_DAWR0); - unsigned long host_dawrx = mfspr(SPRN_DAWRX0); + unsigned long host_dawr0 = mfspr(SPRN_DAWR0); + unsigned long host_dawrx0 = mfspr(SPRN_DAWRX0); unsigned long host_psscr = mfspr(SPRN_PSSCR); unsigned long host_pidr = mfspr(SPRN_PID); @@ -3477,8 +3477,8 @@ static int kvmhv_load_hv_regs_and_go(struct kvm_vcpu *vcpu, u64 time_limit, mtspr(SPRN_SPURR, vcpu->arch.spurr); if (dawr_enabled()) { - mtspr(SPRN_DAWR0, vcpu->arch.dawr); - mtspr(SPRN_DAWRX0, vcpu->arch.dawrx); + mtspr(SPRN_DAWR0, vcpu->arch.dawr0); + mtspr(SPRN_DAWRX0, vcpu->arch.dawrx0); } mtspr(SPRN_CIABR, vcpu->arch.ciabr); mtspr(SPRN_IC, vcpu->arch.ic); @@ -3530,8 +3530,8 @@ static int kvmhv_load_hv_regs_and_go(struct kvm_vcpu *vcpu, u64
[PATCH v3 3/4] KVM: PPC: Add infrastructure to support 2nd DAWR
kvm code assumes single DAWR everywhere. Add code to support 2nd DAWR. DAWR is a hypervisor resource and thus H_SET_MODE hcall is used to set/ unset it. Introduce new case H_SET_MODE_RESOURCE_SET_DAWR1 for 2nd DAWR. Also, kvm will support 2nd DAWR only if CPU_FTR_DAWR1 is set. Signed-off-by: Ravi Bangoria --- Documentation/virt/kvm/api.rst| 2 ++ arch/powerpc/include/asm/hvcall.h | 8 - arch/powerpc/include/asm/kvm_host.h | 3 ++ arch/powerpc/include/uapi/asm/kvm.h | 2 ++ arch/powerpc/kernel/asm-offsets.c | 2 ++ arch/powerpc/kvm/book3s_hv.c | 43 +++ arch/powerpc/kvm/book3s_hv_nested.c | 7 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 23 tools/arch/powerpc/include/uapi/asm/kvm.h | 2 ++ 9 files changed, 91 insertions(+), 1 deletion(-) diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst index 36d5f1f3c6dd..abb24575bdf9 100644 --- a/Documentation/virt/kvm/api.rst +++ b/Documentation/virt/kvm/api.rst @@ -2249,6 +2249,8 @@ registers, find a list below: PPC KVM_REG_PPC_PSSCR 64 PPC KVM_REG_PPC_DEC_EXPIRY 64 PPC KVM_REG_PPC_PTCR64 + PPC KVM_REG_PPC_DAWR1 64 + PPC KVM_REG_PPC_DAWRX1 64 PPC KVM_REG_PPC_TM_GPR0 64 ... PPC KVM_REG_PPC_TM_GPR3164 diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h index ca6840239f90..98afa58b619a 100644 --- a/arch/powerpc/include/asm/hvcall.h +++ b/arch/powerpc/include/asm/hvcall.h @@ -560,16 +560,22 @@ struct hv_guest_state { u64 pidr; u64 cfar; u64 ppr; + /* Version 1 ends here */ + u64 dawr1; + u64 dawrx1; + /* Version 2 ends here */ }; /* Latest version of hv_guest_state structure */ -#define HV_GUEST_STATE_VERSION 1 +#define HV_GUEST_STATE_VERSION 2 static inline int hv_guest_state_size(unsigned int version) { switch (version) { case 1: return offsetofend(struct hv_guest_state, ppr); + case 2: + return offsetofend(struct hv_guest_state, dawrx1); default: return -1; } diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 62cadf1a596e..a93cfb672421 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -307,6 +307,7 @@ struct kvm_arch { u8 svm_enabled; bool threads_indep; bool nested_enable; + bool dawr1_enabled; pgd_t *pgtable; u64 process_table; struct dentry *debugfs_dir; @@ -586,6 +587,8 @@ struct kvm_vcpu_arch { ulong dabr; ulong dawr0; ulong dawrx0; + ulong dawr1; + ulong dawrx1; ulong ciabr; ulong cfar; ulong ppr; diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h index c3af3f324c5a..9f18fa090f1f 100644 --- a/arch/powerpc/include/uapi/asm/kvm.h +++ b/arch/powerpc/include/uapi/asm/kvm.h @@ -644,6 +644,8 @@ struct kvm_ppc_cpu_char { #define KVM_REG_PPC_MMCR3 (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xc1) #define KVM_REG_PPC_SIER2 (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xc2) #define KVM_REG_PPC_SIER3 (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xc3) +#define KVM_REG_PPC_DAWR1 (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xc4) +#define KVM_REG_PPC_DAWRX1 (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xc5) /* Transactional Memory checkpointed state: * This is all GPRs, all VSX regs and a subset of SPRs diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c index 5a77aac516ba..a35ea4e19360 100644 --- a/arch/powerpc/kernel/asm-offsets.c +++ b/arch/powerpc/kernel/asm-offsets.c @@ -550,6 +550,8 @@ int main(void) OFFSET(VCPU_DABRX, kvm_vcpu, arch.dabrx); OFFSET(VCPU_DAWR0, kvm_vcpu, arch.dawr0); OFFSET(VCPU_DAWRX0, kvm_vcpu, arch.dawrx0); + OFFSET(VCPU_DAWR1, kvm_vcpu, arch.dawr1); + OFFSET(VCPU_DAWRX1, kvm_vcpu, arch.dawrx1); OFFSET(VCPU_CIABR, kvm_vcpu, arch.ciabr); OFFSET(VCPU_HFLAGS, kvm_vcpu, arch.hflags); OFFSET(VCPU_DEC, kvm_vcpu, arch.dec); diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index bcbad8daa974..b7a30c0692a7 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -785,6 +785,22 @@ static int kvmppc_h_set_mode(struct kvm_vcpu *vcpu, unsigned long mflags, vcpu->arch.dawr0 = value1; vcpu->arch.dawrx0 = value2; return H_SUCCESS; + case H_SET_MODE_RESOURCE_SET_DAWR1: + if (!kvmppc_power8_compatible(vcpu)) + return H_P2; + if (!ppc_breakpoint_available()) + return H_P2; + if (!cpu_has_feature(CPU_FTR_DAWR1)) +
[PATCH v3 0/4] KVM: PPC: Power10 2nd DAWR enablement
Enable p10 2nd DAWR feature for Book3S kvm guest. DAWR is a hypervisor resource and thus H_SET_MODE hcall is used to set/unset it. A new case H_SET_MODE_RESOURCE_SET_DAWR1 is introduced in H_SET_MODE hcall for setting/unsetting 2nd DAWR. Also, new capability KVM_CAP_PPC_DAWR1 has been added to query 2nd DAWR support via kvm ioctl. This feature also needs to be enabled in Qemu to really use it. I'll post Qemu patches once kvm patches get accepted. v2: https://lore.kernel.org/kvm/20201124105953.39325-1-ravi.bango...@linux.ibm.com v2->v3: - Patch #1. If L0 version > L1, L0 hv_guest_state will contain some additional fields which won't be filled while reading from L1 memory and thus they can contain garbage. Initialize l2_hv with 0s to avoid such situations. - Patch #3. Introduce per vm flag dawr1_enabled. - Patch #4. Instead of auto enabling KVM_CAP_PPC_DAWR1, let user check and enable it manually. Also move KVM_CAP_PPC_DAWR1 check / enable logic inside #if defined(CONFIG_KVM_BOOK3S_HV_POSSIBLE). - Explain KVM_CAP_PPC_DAWR1 in Documentation/virt/kvm/api.rst - Rebased on top of 5.10-rc3. v1->v2: - patch #1: New patch - patch #2: Don't rename KVM_REG_PPC_DAWR, it's an uapi macro - patch #3: Increment HV_GUEST_STATE_VERSION - Split kvm and selftests patches into different series - Patches rebased to paulus/kvm-ppc-next (cf59eb13e151) + few other watchpoint patches which are yet to be merged in paulus/kvm-ppc-next. Ravi Bangoria (4): KVM: PPC: Allow nested guest creation when L0 hv_guest_state > L1 KVM: PPC: Rename current DAWR macros and variables KVM: PPC: Add infrastructure to support 2nd DAWR KVM: PPC: Introduce new capability for 2nd DAWR Documentation/virt/kvm/api.rst| 12 arch/powerpc/include/asm/hvcall.h | 25 ++- arch/powerpc/include/asm/kvm_host.h | 7 +- arch/powerpc/include/asm/kvm_ppc.h| 1 + arch/powerpc/include/uapi/asm/kvm.h | 2 + arch/powerpc/kernel/asm-offsets.c | 6 +- arch/powerpc/kvm/book3s_hv.c | 79 +++ arch/powerpc/kvm/book3s_hv_nested.c | 70 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 43 +--- arch/powerpc/kvm/powerpc.c| 10 +++ include/uapi/linux/kvm.h | 1 + tools/arch/powerpc/include/uapi/asm/kvm.h | 2 + tools/include/uapi/linux/kvm.h| 1 + 13 files changed, 216 insertions(+), 43 deletions(-) -- 2.26.2
Re: [PATCH] perf test: Skip test 68 for Powerpc
On 12/9/20 11:19 PM, Arnaldo Carvalho de Melo wrote: Em Tue, Dec 08, 2020 at 10:32:33PM +0530, Ravi Bangoria escreveu: On 12/8/20 8:13 PM, Thomas Richter wrote: On 12/7/20 5:35 PM, Arnaldo Carvalho de Melo wrote: Em Tue, Nov 24, 2020 at 03:04:53PM +0530, Ravi Bangoria escreveu: On 11/19/20 7:20 PM, Kajol Jain wrote: Commit ed21d6d7c48e6e ("perf tests: Add test for PE binary format support") adds a WINDOWS EXE file named tests/pe-file.exe, which is examined by the test case 'PE file support'. As powerpc doesn't support it, we are skipping this test. Result in power9 platform before this patach: [command]# ./perf test -F 68 68: PE file support : Failed! Result in power9 platform after this patch: [command]# ./perf test -F 68 68: PE file support : Skip Signed-off-by: Kajol Jain Reviewed-by: Ravi Bangoria But why is it failing? I.e. what is that perf test -v -F 68 outputs? Using 'perf report' on a perf.data file containing samples in such binaries, collected on x86 should work on whatever workstation a developer uses. Say, on a MacBook aarch64 one can look at a perf.data file collected on a x86_64 system where Wine running a PE binary was present. What is the distro you are using? I observed the same issue on s390 but this was fixed for fedora33 somehow. The error just went away after a dnf update [root@m35lp76 perf]# cat /etc/fedora-release Fedora release 33 (Thirty Three) [root@m35lp76 perf]# ./perf test -F 68 68: PE file support : Ok [root@m35lp76 perf]# However on my fedora32 machine it still fails: [root@t35lp46 perf]# cat /etc/fedora-release Fedora release 32 (Thirty Two) [root@t35lp46 perf]# ./perf test -F 68 68: PE file support : FAILED! [root@t35lp46 perf]# Note that I am running the same kernel on both machines: linux 5.10.0rc7 downloaded this morning. Ok that's interesting. I don't see that on powerpc. Fedora 32 with 5.10.0-rc2+ kernel: $ ./perf test -vv -F 68 68: PE file support : --- start --- filename__read_build_id: cannot read ./tests/pe-file.exe bfd file. FAILED tests/pe-file-parsing.c:40 Failed to read build_id end PE file support: FAILED! Fedora 33 with 5.10.0-rc3 kernel: $ ./perf test -vv -F 68 68: PE file support : --- start --- filename__read_build_id: cannot read ./tests/pe-file.exe bfd file. FAILED tests/pe-file-parsing.c:40 Failed to read build_id end PE file support: FAILED! Ubuntu 18.04.5 with 4.15.0-126-generic kernel: $ ./perf test -vv -F 68 68: PE file support : --- start --- filename__read_build_id: cannot read ./tests/pe-file.exe bfd file. FAILED tests/pe-file-parsing.c:41 Failed to read build_id end PE file support: FAILED! I assumed bfd is not capable to parse PE files on powerpc. Though, I didn't check it in more detail. I'll look into it tomorrow. Humm, so this is something related to installation? I.e. that pe-file.exe isn't being found... It first assumes that the developers are in the tools/perf/ directory, can you please add the patch below and see if it helps? I'm using upstream perf from tools/perf/ I checked bfd code and it's bfd_check_format() who is returning error "bfd_error_file_not_recognized". I cross verified with objdump as well: On x86: $ objdump -d ./tests/pe-file.exe ./tests/pe-file.exe: file format pei-x86-64 Disassembly of section .text: 00401000 <__mingw_invalidParameterHandler>: 401000: c3 retq 401001: 66 66 2e 0f 1f 84 00data16 nopw %cs:0x0(%rax,%rax,1) 401008: 00 00 00 00 40100c: 0f 1f 40 00 nopl 0x0(%rax) On powerpc: $ objdump -d ./tests/pe-file.exe objdump: ./tests/pe-file.exe: file format not recognized Objdump is also returning *same* error. I dig more into bfd logs and found that Powerpc PE support was removed recently (Jul 2020) with this commit: https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=fe49679d5193f6ff7cfd333e30883d293112a3d1 Ravi
Re: [PATCH] perf test: Skip test 68 for Powerpc
On 12/8/20 8:13 PM, Thomas Richter wrote: On 12/7/20 5:35 PM, Arnaldo Carvalho de Melo wrote: Em Tue, Nov 24, 2020 at 03:04:53PM +0530, Ravi Bangoria escreveu: On 11/19/20 7:20 PM, Kajol Jain wrote: Commit ed21d6d7c48e6e ("perf tests: Add test for PE binary format support") adds a WINDOWS EXE file named tests/pe-file.exe, which is examined by the test case 'PE file support'. As powerpc doesn't support it, we are skipping this test. Result in power9 platform before this patach: [command]# ./perf test -F 68 68: PE file support : Failed! Result in power9 platform after this patch: [command]# ./perf test -F 68 68: PE file support : Skip Signed-off-by: Kajol Jain Reviewed-by: Ravi Bangoria But why is it failing? I.e. what is that perf test -v -F 68 outputs? Using 'perf report' on a perf.data file containing samples in such binaries, collected on x86 should work on whatever workstation a developer uses. Say, on a MacBook aarch64 one can look at a perf.data file collected on a x86_64 system where Wine running a PE binary was present. - Arnaldo Hi What is the distro you are using? I observed the same issue on s390 but this was fixed for fedora33 somehow. The error just went away after a dnf update [root@m35lp76 perf]# cat /etc/fedora-release Fedora release 33 (Thirty Three) [root@m35lp76 perf]# ./perf test -F 68 68: PE file support : Ok [root@m35lp76 perf]# However on my fedora32 machine it still fails: [root@t35lp46 perf]# cat /etc/fedora-release Fedora release 32 (Thirty Two) [root@t35lp46 perf]# ./perf test -F 68 68: PE file support : FAILED! [root@t35lp46 perf]# Note that I am running the same kernel on both machines: linux 5.10.0rc7 downloaded this morning. Ok that's interesting. I don't see that on powerpc. Fedora 32 with 5.10.0-rc2+ kernel: $ ./perf test -vv -F 68 68: PE file support : --- start --- filename__read_build_id: cannot read ./tests/pe-file.exe bfd file. FAILED tests/pe-file-parsing.c:40 Failed to read build_id end PE file support: FAILED! Fedora 33 with 5.10.0-rc3 kernel: $ ./perf test -vv -F 68 68: PE file support : --- start --- filename__read_build_id: cannot read ./tests/pe-file.exe bfd file. FAILED tests/pe-file-parsing.c:40 Failed to read build_id end PE file support: FAILED! Ubuntu 18.04.5 with 4.15.0-126-generic kernel: $ ./perf test -vv -F 68 68: PE file support : --- start --- filename__read_build_id: cannot read ./tests/pe-file.exe bfd file. FAILED tests/pe-file-parsing.c:41 Failed to read build_id end PE file support: FAILED! I assumed bfd is not capable to parse PE files on powerpc. Though, I didn't check it in more detail. I'll look into it tomorrow. Ravi
[PATCH v2 4/4] KVM: PPC: Introduce new capability for 2nd DAWR
Introduce KVM_CAP_PPC_DAWR1 which can be used by Qemu to query whether kvm supports 2nd DAWR or not. Signed-off-by: Ravi Bangoria --- arch/powerpc/kvm/powerpc.c | 3 +++ include/uapi/linux/kvm.h | 1 + 2 files changed, 4 insertions(+) diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c index 13999123b735..48763fe59fc5 100644 --- a/arch/powerpc/kvm/powerpc.c +++ b/arch/powerpc/kvm/powerpc.c @@ -679,6 +679,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) !kvmppc_hv_ops->enable_svm(NULL); break; #endif + case KVM_CAP_PPC_DAWR1: + r = cpu_has_feature(CPU_FTR_DAWR1); + break; default: r = 0; break; diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index f6d86033c4fa..0f32d6cbabc2 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -1035,6 +1035,7 @@ struct kvm_ppc_resize_hpt { #define KVM_CAP_LAST_CPU 184 #define KVM_CAP_SMALLER_MAXPHYADDR 185 #define KVM_CAP_S390_DIAG318 186 +#define KVM_CAP_PPC_DAWR1 187 #ifdef KVM_CAP_IRQ_ROUTING -- 2.26.2
[PATCH v2 1/4] KVM: PPC: Allow nested guest creation when L0 hv_guest_state > L1
On powerpc, L1 hypervisor takes help of L0 using H_ENTER_NESTED hcall to load L2 guest state in cpu. L1 hypervisor prepares the L2 state in struct hv_guest_state and passes a pointer to it via hcall. Using that pointer, L0 reads/writes that state directly from/to L1 memory. Thus L0 must be aware of hv_guest_state layout of L1. Currently it uses version field to achieve this. i.e. If L0 hv_guest_state.version != L1 hv_guest_state.version, L0 won't allow nested kvm guest. This restriction can be loosen up a bit. L0 can be taught to understand older layout of hv_guest_state, if we restrict the new member to be added only at the end. i.e. we can allow nested guest even when L0 hv_guest_state.version > L1 hv_guest_state.version. Though, the other way around is not possible. Signed-off-by: Ravi Bangoria --- arch/powerpc/include/asm/hvcall.h | 17 +++-- arch/powerpc/kvm/book3s_hv_nested.c | 53 - 2 files changed, 59 insertions(+), 11 deletions(-) diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h index fbb377055471..a7073fddb657 100644 --- a/arch/powerpc/include/asm/hvcall.h +++ b/arch/powerpc/include/asm/hvcall.h @@ -524,9 +524,12 @@ struct h_cpu_char_result { u64 behaviour; }; -/* Register state for entering a nested guest with H_ENTER_NESTED */ +/* + * Register state for entering a nested guest with H_ENTER_NESTED. + * New member must be added at the end. + */ struct hv_guest_state { - u64 version;/* version of this structure layout */ + u64 version;/* version of this structure layout, must be first */ u32 lpid; u32 vcpu_token; /* These registers are hypervisor privileged (at least for writing) */ @@ -560,6 +563,16 @@ struct hv_guest_state { /* Latest version of hv_guest_state structure */ #define HV_GUEST_STATE_VERSION 1 +static inline int hv_guest_state_size(unsigned int version) +{ + switch (version) { + case 1: + return offsetofend(struct hv_guest_state, ppr); + default: + return -1; + } +} + #endif /* __ASSEMBLY__ */ #endif /* __KERNEL__ */ #endif /* _ASM_POWERPC_HVCALL_H */ diff --git a/arch/powerpc/kvm/book3s_hv_nested.c b/arch/powerpc/kvm/book3s_hv_nested.c index 33b58549a9aa..2b433c3bacea 100644 --- a/arch/powerpc/kvm/book3s_hv_nested.c +++ b/arch/powerpc/kvm/book3s_hv_nested.c @@ -215,6 +215,45 @@ static void kvmhv_nested_mmio_needed(struct kvm_vcpu *vcpu, u64 regs_ptr) } } +static int kvmhv_read_guest_state_and_regs(struct kvm_vcpu *vcpu, + struct hv_guest_state *l2_hv, + struct pt_regs *l2_regs, + u64 hv_ptr, u64 regs_ptr) +{ + int size; + + if (kvm_vcpu_read_guest(vcpu, hv_ptr, &(l2_hv->version), + sizeof(l2_hv->version))) + return -1; + + if (kvmppc_need_byteswap(vcpu)) + l2_hv->version = swab64(l2_hv->version); + + size = hv_guest_state_size(l2_hv->version); + if (size < 0) + return -1; + + return kvm_vcpu_read_guest(vcpu, hv_ptr, l2_hv, size) || + kvm_vcpu_read_guest(vcpu, regs_ptr, l2_regs, + sizeof(struct pt_regs)); +} + +static int kvmhv_write_guest_state_and_regs(struct kvm_vcpu *vcpu, + struct hv_guest_state *l2_hv, + struct pt_regs *l2_regs, + u64 hv_ptr, u64 regs_ptr) +{ + int size; + + size = hv_guest_state_size(l2_hv->version); + if (size < 0) + return -1; + + return kvm_vcpu_write_guest(vcpu, hv_ptr, l2_hv, size) || + kvm_vcpu_write_guest(vcpu, regs_ptr, l2_regs, +sizeof(struct pt_regs)); +} + long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu) { long int err, r; @@ -235,17 +274,15 @@ long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu) hv_ptr = kvmppc_get_gpr(vcpu, 4); regs_ptr = kvmppc_get_gpr(vcpu, 5); vcpu->srcu_idx = srcu_read_lock(>kvm->srcu); - err = kvm_vcpu_read_guest(vcpu, hv_ptr, _hv, - sizeof(struct hv_guest_state)) || - kvm_vcpu_read_guest(vcpu, regs_ptr, _regs, - sizeof(struct pt_regs)); + err = kvmhv_read_guest_state_and_regs(vcpu, _hv, _regs, + hv_ptr, regs_ptr); srcu_read_unlock(>kvm->srcu, vcpu->srcu_idx); if (err) return H_PARAMETER; if (kvmppc_need_byteswap(vcpu)) byteswap_hv_regs(_hv); - if (l2_hv.version != HV_GUEST_STATE_VERSION) + if (l2_hv.version >
[PATCH v2 0/4] KVM: PPC: Power10 2nd DAWR enablement
Enable p10 2nd DAWR feature for Book3S kvm guest. DAWR is a hypervisor resource and thus H_SET_MODE hcall is used to set/unset it. A new case H_SET_MODE_RESOURCE_SET_DAWR1 is introduced in H_SET_MODE hcall for setting/unsetting 2nd DAWR. Also, new capability KVM_CAP_PPC_DAWR1 has been added to query 2nd DAWR support via kvm ioctl. This feature also needs to be enabled in Qemu to really use it. I'll post Qemu patches once kvm patches get accepted. v1: https://lore.kernel.org/r/20200723102058.312282-1-ravi.bango...@linux.ibm.com v1->v2: - patch #1: New patch - patch #2: Don't rename KVM_REG_PPC_DAWR, it's an uapi macro - patch #3: Increment HV_GUEST_STATE_VERSION - Split kvm and selftests patches into different series - Patches rebased to paulus/kvm-ppc-next (cf59eb13e151) + few other watchpoint patches which are yet to be merged in paulus/kvm-ppc-next. Ravi Bangoria (4): KVM: PPC: Allow nested guest creation when L0 hv_guest_state > L1 KVM: PPC: Rename current DAWR macros and variables KVM: PPC: Add infrastructure to support 2nd DAWR KVM: PPC: Introduce new capability for 2nd DAWR Documentation/virt/kvm/api.rst| 2 + arch/powerpc/include/asm/hvcall.h | 25 - arch/powerpc/include/asm/kvm_host.h | 6 +- arch/powerpc/include/uapi/asm/kvm.h | 2 + arch/powerpc/kernel/asm-offsets.c | 6 +- arch/powerpc/kvm/book3s_hv.c | 65 ++ arch/powerpc/kvm/book3s_hv_nested.c | 68 ++- arch/powerpc/kvm/book3s_hv_rmhandlers.S | 43 ++ arch/powerpc/kvm/powerpc.c| 3 + include/uapi/linux/kvm.h | 1 + tools/arch/powerpc/include/uapi/asm/kvm.h | 2 + 11 files changed, 181 insertions(+), 42 deletions(-) -- 2.26.2
[PATCH v2 2/4] KVM: PPC: Rename current DAWR macros and variables
Power10 is introducing second DAWR. Use real register names (with suffix 0) from ISA for current macros and variables used by kvm. One exception is KVM_REG_PPC_DAWR. Keep it as it is because it's uapi so changing it will break userspace. Signed-off-by: Ravi Bangoria --- arch/powerpc/include/asm/kvm_host.h | 4 ++-- arch/powerpc/kernel/asm-offsets.c | 4 ++-- arch/powerpc/kvm/book3s_hv.c| 24 arch/powerpc/kvm/book3s_hv_nested.c | 8 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 20 ++-- 5 files changed, 30 insertions(+), 30 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index d67a470e95a3..62cadf1a596e 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -584,8 +584,8 @@ struct kvm_vcpu_arch { u32 ctrl; u32 dabrx; ulong dabr; - ulong dawr; - ulong dawrx; + ulong dawr0; + ulong dawrx0; ulong ciabr; ulong cfar; ulong ppr; diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c index 8711c2164b45..e4256f5b4602 100644 --- a/arch/powerpc/kernel/asm-offsets.c +++ b/arch/powerpc/kernel/asm-offsets.c @@ -547,8 +547,8 @@ int main(void) OFFSET(VCPU_CTRL, kvm_vcpu, arch.ctrl); OFFSET(VCPU_DABR, kvm_vcpu, arch.dabr); OFFSET(VCPU_DABRX, kvm_vcpu, arch.dabrx); - OFFSET(VCPU_DAWR, kvm_vcpu, arch.dawr); - OFFSET(VCPU_DAWRX, kvm_vcpu, arch.dawrx); + OFFSET(VCPU_DAWR0, kvm_vcpu, arch.dawr0); + OFFSET(VCPU_DAWRX0, kvm_vcpu, arch.dawrx0); OFFSET(VCPU_CIABR, kvm_vcpu, arch.ciabr); OFFSET(VCPU_HFLAGS, kvm_vcpu, arch.hflags); OFFSET(VCPU_DEC, kvm_vcpu, arch.dec); diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index 490a0f6a7285..d5c6efc8a76e 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -782,8 +782,8 @@ static int kvmppc_h_set_mode(struct kvm_vcpu *vcpu, unsigned long mflags, return H_UNSUPPORTED_FLAG_START; if (value2 & DABRX_HYP) return H_P4; - vcpu->arch.dawr = value1; - vcpu->arch.dawrx = value2; + vcpu->arch.dawr0 = value1; + vcpu->arch.dawrx0 = value2; return H_SUCCESS; case H_SET_MODE_RESOURCE_ADDR_TRANS_MODE: /* KVM does not support mflags=2 (AIL=2) */ @@ -1747,10 +1747,10 @@ static int kvmppc_get_one_reg_hv(struct kvm_vcpu *vcpu, u64 id, *val = get_reg_val(id, vcpu->arch.vcore->vtb); break; case KVM_REG_PPC_DAWR: - *val = get_reg_val(id, vcpu->arch.dawr); + *val = get_reg_val(id, vcpu->arch.dawr0); break; case KVM_REG_PPC_DAWRX: - *val = get_reg_val(id, vcpu->arch.dawrx); + *val = get_reg_val(id, vcpu->arch.dawrx0); break; case KVM_REG_PPC_CIABR: *val = get_reg_val(id, vcpu->arch.ciabr); @@ -1979,10 +1979,10 @@ static int kvmppc_set_one_reg_hv(struct kvm_vcpu *vcpu, u64 id, vcpu->arch.vcore->vtb = set_reg_val(id, *val); break; case KVM_REG_PPC_DAWR: - vcpu->arch.dawr = set_reg_val(id, *val); + vcpu->arch.dawr0 = set_reg_val(id, *val); break; case KVM_REG_PPC_DAWRX: - vcpu->arch.dawrx = set_reg_val(id, *val) & ~DAWRX_HYP; + vcpu->arch.dawrx0 = set_reg_val(id, *val) & ~DAWRX_HYP; break; case KVM_REG_PPC_CIABR: vcpu->arch.ciabr = set_reg_val(id, *val); @@ -3437,8 +3437,8 @@ static int kvmhv_load_hv_regs_and_go(struct kvm_vcpu *vcpu, u64 time_limit, int trap; unsigned long host_hfscr = mfspr(SPRN_HFSCR); unsigned long host_ciabr = mfspr(SPRN_CIABR); - unsigned long host_dawr = mfspr(SPRN_DAWR0); - unsigned long host_dawrx = mfspr(SPRN_DAWRX0); + unsigned long host_dawr0 = mfspr(SPRN_DAWR0); + unsigned long host_dawrx0 = mfspr(SPRN_DAWRX0); unsigned long host_psscr = mfspr(SPRN_PSSCR); unsigned long host_pidr = mfspr(SPRN_PID); @@ -3477,8 +3477,8 @@ static int kvmhv_load_hv_regs_and_go(struct kvm_vcpu *vcpu, u64 time_limit, mtspr(SPRN_SPURR, vcpu->arch.spurr); if (dawr_enabled()) { - mtspr(SPRN_DAWR0, vcpu->arch.dawr); - mtspr(SPRN_DAWRX0, vcpu->arch.dawrx); + mtspr(SPRN_DAWR0, vcpu->arch.dawr0); + mtspr(SPRN_DAWRX0, vcpu->arch.dawrx0); } mtspr(SPRN_CIABR, vcpu->arch.ciabr); mtspr(SPRN_IC, vcpu->arch.ic); @@ -3530,8 +3530,8 @@ static int kvmhv_load_hv_regs_and_go(struct kvm_vcpu *vcpu, u64
[PATCH v2 3/4] KVM: PPC: Add infrastructure to support 2nd DAWR
kvm code assumes single DAWR everywhere. Add code to support 2nd DAWR. DAWR is a hypervisor resource and thus H_SET_MODE hcall is used to set/ unset it. Introduce new case H_SET_MODE_RESOURCE_SET_DAWR1 for 2nd DAWR. Also, kvm will support 2nd DAWR only if CPU_FTR_DAWR1 is set. Signed-off-by: Ravi Bangoria --- Documentation/virt/kvm/api.rst| 2 ++ arch/powerpc/include/asm/hvcall.h | 8 - arch/powerpc/include/asm/kvm_host.h | 2 ++ arch/powerpc/include/uapi/asm/kvm.h | 2 ++ arch/powerpc/kernel/asm-offsets.c | 2 ++ arch/powerpc/kvm/book3s_hv.c | 41 +++ arch/powerpc/kvm/book3s_hv_nested.c | 7 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 23 + tools/arch/powerpc/include/uapi/asm/kvm.h | 2 ++ 9 files changed, 88 insertions(+), 1 deletion(-) diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst index eb3a1316f03e..72c98735aa52 100644 --- a/Documentation/virt/kvm/api.rst +++ b/Documentation/virt/kvm/api.rst @@ -2249,6 +2249,8 @@ registers, find a list below: PPC KVM_REG_PPC_PSSCR 64 PPC KVM_REG_PPC_DEC_EXPIRY 64 PPC KVM_REG_PPC_PTCR64 + PPC KVM_REG_PPC_DAWR1 64 + PPC KVM_REG_PPC_DAWRX1 64 PPC KVM_REG_PPC_TM_GPR0 64 ... PPC KVM_REG_PPC_TM_GPR3164 diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h index a7073fddb657..4bacd27a348b 100644 --- a/arch/powerpc/include/asm/hvcall.h +++ b/arch/powerpc/include/asm/hvcall.h @@ -558,16 +558,22 @@ struct hv_guest_state { u64 pidr; u64 cfar; u64 ppr; + /* Version 1 ends here */ + u64 dawr1; + u64 dawrx1; + /* Version 2 ends here */ }; /* Latest version of hv_guest_state structure */ -#define HV_GUEST_STATE_VERSION 1 +#define HV_GUEST_STATE_VERSION 2 static inline int hv_guest_state_size(unsigned int version) { switch (version) { case 1: return offsetofend(struct hv_guest_state, ppr); + case 2: + return offsetofend(struct hv_guest_state, dawrx1); default: return -1; } diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 62cadf1a596e..9804afdf8578 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -586,6 +586,8 @@ struct kvm_vcpu_arch { ulong dabr; ulong dawr0; ulong dawrx0; + ulong dawr1; + ulong dawrx1; ulong ciabr; ulong cfar; ulong ppr; diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h index c3af3f324c5a..9f18fa090f1f 100644 --- a/arch/powerpc/include/uapi/asm/kvm.h +++ b/arch/powerpc/include/uapi/asm/kvm.h @@ -644,6 +644,8 @@ struct kvm_ppc_cpu_char { #define KVM_REG_PPC_MMCR3 (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xc1) #define KVM_REG_PPC_SIER2 (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xc2) #define KVM_REG_PPC_SIER3 (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xc3) +#define KVM_REG_PPC_DAWR1 (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xc4) +#define KVM_REG_PPC_DAWRX1 (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xc5) /* Transactional Memory checkpointed state: * This is all GPRs, all VSX regs and a subset of SPRs diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c index e4256f5b4602..26d4fa8fe51e 100644 --- a/arch/powerpc/kernel/asm-offsets.c +++ b/arch/powerpc/kernel/asm-offsets.c @@ -549,6 +549,8 @@ int main(void) OFFSET(VCPU_DABRX, kvm_vcpu, arch.dabrx); OFFSET(VCPU_DAWR0, kvm_vcpu, arch.dawr0); OFFSET(VCPU_DAWRX0, kvm_vcpu, arch.dawrx0); + OFFSET(VCPU_DAWR1, kvm_vcpu, arch.dawr1); + OFFSET(VCPU_DAWRX1, kvm_vcpu, arch.dawrx1); OFFSET(VCPU_CIABR, kvm_vcpu, arch.ciabr); OFFSET(VCPU_HFLAGS, kvm_vcpu, arch.hflags); OFFSET(VCPU_DEC, kvm_vcpu, arch.dec); diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index d5c6efc8a76e..2ff645789e9e 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -785,6 +785,20 @@ static int kvmppc_h_set_mode(struct kvm_vcpu *vcpu, unsigned long mflags, vcpu->arch.dawr0 = value1; vcpu->arch.dawrx0 = value2; return H_SUCCESS; + case H_SET_MODE_RESOURCE_SET_DAWR1: + if (!kvmppc_power8_compatible(vcpu)) + return H_P2; + if (!ppc_breakpoint_available()) + return H_P2; + if (!cpu_has_feature(CPU_FTR_DAWR1)) + return H_P2; + if (mflags) + return H_UNSUPPORTED_FLAG_START; + if (value2 & DABRX_HYP) + return H_P4; + vcpu->arch.dawr1 = value1; +
Re: [PATCH] perf test: Skip test 68 for Powerpc
On 11/19/20 7:20 PM, Kajol Jain wrote: Commit ed21d6d7c48e6e ("perf tests: Add test for PE binary format support") adds a WINDOWS EXE file named tests/pe-file.exe, which is examined by the test case 'PE file support'. As powerpc doesn't support it, we are skipping this test. Result in power9 platform before this patach: [command]# ./perf test -F 68 68: PE file support : Failed! Result in power9 platform after this patch: [command]# ./perf test -F 68 68: PE file support : Skip Signed-off-by: Kajol Jain Reviewed-by: Ravi Bangoria
Re: [PATCH v6 0/8] powerpc/watchpoint: Bug fixes plus new feature flag
On 9/17/20 6:54 PM, Rogerio Alves wrote: On 9/2/20 1:29 AM, Ravi Bangoria wrote: Patch #1 fixes issue for quardword instruction on p10 predecessors. Patch #2 fixes issue for vector instructions. Patch #3 fixes a bug about watchpoint not firing when created with ptrace PPC_PTRACE_SETHWDEBUG and CONFIG_HAVE_HW_BREAKPOINT=N. The fix uses HW_BRK_TYPE_PRIV_ALL for ptrace user which, I guess, should be fine because we don't leak any kernel addresses and PRIV_ALL will also help to cover scenarios when kernel accesses user memory. Patch #4,#5 fixes infinite exception bug, again the bug happens only with CONFIG_HAVE_HW_BREAKPOINT=N. Patch #6 fixes two places where we are missing to set hw_len. Patch #7 introduce new feature bit PPC_DEBUG_FEATURE_DATA_BP_ARCH_31 which will be set when running on ISA 3.1 compliant machine. Patch #8 finally adds selftest to test scenarios fixed by patch#2,#3 and also moves MODE_EXACT tests outside of BP_RANGE condition. [...] Tested this patch set for: - SETHWDEBUG when CONFIG_HAVE_HW_BREAKPOINT=N = OK - Fix exception handling for CONFIG_HAVE_HW_BREAKPOINT=N = OK - Check for PPC_DEBUG_FEATURE_DATA_BP_ARCH_31 = OK - Fix quarword instruction handling on p10 predecessors = OK - Fix handling of vector instructions = OK Also tested for: - Set second watchpoint (P10 Mambo) = OK - Infinity loop on sc instruction = OK Thanks Rogerio! Ravi
Re: [PATCH 0/7] powerpc/watchpoint: 2nd DAWR kvm enablement + selftests
Hi Paul, On 9/2/20 8:02 AM, Paul Mackerras wrote: On Thu, Jul 23, 2020 at 03:50:51PM +0530, Ravi Bangoria wrote: Patch #1, #2 and #3 enables p10 2nd DAWR feature for Book3S kvm guest. DAWR is a hypervisor resource and thus H_SET_MODE hcall is used to set/unset it. A new case H_SET_MODE_RESOURCE_SET_DAWR1 is introduced in H_SET_MODE hcall for setting/unsetting 2nd DAWR. Also, new capability KVM_CAP_PPC_DAWR1 has been added to query 2nd DAWR support via kvm ioctl. This feature also needs to be enabled in Qemu to really use it. I'll reply link to qemu patches once I post them in qemu-devel mailing list. Patch #4, #5, #6 and #7 adds selftests to test 2nd DAWR. If/when you resubmit these patches, please split the KVM patches into a separate series, since the KVM patches would go via my tree whereas I expect the selftests/powerpc patches would go through Michael Ellerman's tree. Sure. Will split it. Thanks, Ravi
Re: [PATCH 2/7] powerpc/watchpoint/kvm: Add infrastructure to support 2nd DAWR
Hi Paul, diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h index 33793444144c..03f401d7be41 100644 --- a/arch/powerpc/include/asm/hvcall.h +++ b/arch/powerpc/include/asm/hvcall.h @@ -538,6 +538,8 @@ struct hv_guest_state { s64 tb_offset; u64 dawr0; u64 dawrx0; + u64 dawr1; + u64 dawrx1; u64 ciabr; u64 hdec_expiry; u64 purr; After this struct, there is a macro HV_GUEST_STATE_VERSION, I guess that also needs to be incremented because I'm adding new members in the struct? Thanks, Ravi
Re: [PATCH 2/7] powerpc/watchpoint/kvm: Add infrastructure to support 2nd DAWR
Hi Paul, On 9/2/20 7:31 AM, Paul Mackerras wrote: On Thu, Jul 23, 2020 at 03:50:53PM +0530, Ravi Bangoria wrote: kvm code assumes single DAWR everywhere. Add code to support 2nd DAWR. DAWR is a hypervisor resource and thus H_SET_MODE hcall is used to set/ unset it. Introduce new case H_SET_MODE_RESOURCE_SET_DAWR1 for 2nd DAWR. Is this the same interface as will be defined in PAPR and available under PowerVM, or is it a new/different interface for KVM? Yes, kvm hcall interface for 2nd DAWR is same as PowerVM, as defined in PAPR. Also, kvm will support 2nd DAWR only if CPU_FTR_DAWR1 is set. In general QEMU wants to be able to control all aspects of the virtual machine presented to the guest, meaning that just because a host has a particular hardware capability does not mean we should automatically present that capability to the guest. In this case, QEMU will want a way to control whether the guest sees the availability of the second DAWR/X registers or not, i.e. whether a H_SET_MODE to set DAWR[X]1 will succeed or fail. Patch #3 adds new kvm capability KVM_CAP_PPC_DAWR1 that can be checked by Qemu. Also, as suggested by David in Qemu patch[1], I'm planning to add new machine capability in Qemu: -machine cap-dawr1=ON/OFF cap-dawr1 will be default ON when PPC_FEATURE2_ARCH_3_10 is set and OFF otherwise. Is this correct approach? [1]: https://lore.kernel.org/kvm/20200724045613.ga8...@umbus.fritz.box Thanks, Ravi
Re: [PATCH 1/7] powerpc/watchpoint/kvm: Rename current DAWR macros and variables
Hi Paul, On 9/2/20 7:19 AM, Paul Mackerras wrote: On Thu, Jul 23, 2020 at 03:50:52PM +0530, Ravi Bangoria wrote: Power10 is introducing second DAWR. Use real register names (with suffix 0) from ISA for current macros and variables used by kvm. Most of this looks fine, but I think we should not change the existing names in arch/powerpc/include/uapi/asm/kvm.h (and therefore also Documentation/virt/kvm/api.rst). Missed that I'm changing uapi. I'll rename only those macros/variables which are not uapi. Thanks, Ravi
[PATCH v6 1/8] powerpc/watchpoint: Fix quarword instruction handling on p10 predecessors
On p10 predecessors, watchpoint with quarword access is compared at quardword length. If the watch range is doubleword or less than that in a first half of quarword aligned 16 bytes, and if there is any unaligned quadword access which will access only the 2nd half, the handler should consider it as extraneous and emulate/single-step it before continuing. Reported-by: Pedro Miraglia Franco de Carvalho Fixes: 74c6881019b7 ("powerpc/watchpoint: Prepare handler to handle more than one watchpoint") Signed-off-by: Ravi Bangoria --- arch/powerpc/include/asm/hw_breakpoint.h | 1 + arch/powerpc/kernel/hw_breakpoint.c | 12 ++-- 2 files changed, 11 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/include/asm/hw_breakpoint.h b/arch/powerpc/include/asm/hw_breakpoint.h index db206a7f38e2..9b68eafebf43 100644 --- a/arch/powerpc/include/asm/hw_breakpoint.h +++ b/arch/powerpc/include/asm/hw_breakpoint.h @@ -42,6 +42,7 @@ struct arch_hw_breakpoint { #else #define HW_BREAKPOINT_SIZE 0x8 #endif +#define HW_BREAKPOINT_SIZE_QUADWORD0x10 #define DABR_MAX_LEN 8 #define DAWR_MAX_LEN 512 diff --git a/arch/powerpc/kernel/hw_breakpoint.c b/arch/powerpc/kernel/hw_breakpoint.c index 1f4a1efa0074..9f7df1c37233 100644 --- a/arch/powerpc/kernel/hw_breakpoint.c +++ b/arch/powerpc/kernel/hw_breakpoint.c @@ -520,9 +520,17 @@ static bool ea_hw_range_overlaps(unsigned long ea, int size, struct arch_hw_breakpoint *info) { unsigned long hw_start_addr, hw_end_addr; + unsigned long align_size = HW_BREAKPOINT_SIZE; - hw_start_addr = ALIGN_DOWN(info->address, HW_BREAKPOINT_SIZE); - hw_end_addr = ALIGN(info->address + info->len, HW_BREAKPOINT_SIZE); + /* +* On p10 predecessors, quadword is handle differently then +* other instructions. +*/ + if (!cpu_has_feature(CPU_FTR_ARCH_31) && size == 16) + align_size = HW_BREAKPOINT_SIZE_QUADWORD; + + hw_start_addr = ALIGN_DOWN(info->address, align_size); + hw_end_addr = ALIGN(info->address + info->len, align_size); return ((ea < hw_end_addr) && (ea + size > hw_start_addr)); } -- 2.26.2
[PATCH v6 8/8] powerpc/watchpoint/selftests: Tests for kernel accessing user memory
Introduce tests to cover simple scenarios where user is watching memory which can be accessed by kernel as well. We also support _MODE_EXACT with _SETHWDEBUG interface. Move those testcases out- side of _BP_RANGE condition. This will help to test _MODE_EXACT scenarios when CONFIG_HAVE_HW_BREAKPOINT is not set, eg: $ ./ptrace-hwbreak ... PTRACE_SET_DEBUGREG, Kernel Access Userspace, len: 8: Ok PPC_PTRACE_SETHWDEBUG, MODE_EXACT, WO, len: 1: Ok PPC_PTRACE_SETHWDEBUG, MODE_EXACT, RO, len: 1: Ok PPC_PTRACE_SETHWDEBUG, MODE_EXACT, RW, len: 1: Ok PPC_PTRACE_SETHWDEBUG, MODE_EXACT, Kernel Access Userspace, len: 1: Ok success: ptrace-hwbreak Suggested-by: Pedro Miraglia Franco de Carvalho Signed-off-by: Ravi Bangoria --- .../selftests/powerpc/ptrace/ptrace-hwbreak.c | 48 ++- 1 file changed, 46 insertions(+), 2 deletions(-) diff --git a/tools/testing/selftests/powerpc/ptrace/ptrace-hwbreak.c b/tools/testing/selftests/powerpc/ptrace/ptrace-hwbreak.c index fc477dfe86a2..2e0d86e0687e 100644 --- a/tools/testing/selftests/powerpc/ptrace/ptrace-hwbreak.c +++ b/tools/testing/selftests/powerpc/ptrace/ptrace-hwbreak.c @@ -20,6 +20,8 @@ #include #include #include +#include +#include #include "ptrace.h" #define SPRN_PVR 0x11F @@ -44,6 +46,7 @@ struct gstruct { }; static volatile struct gstruct gstruct __attribute__((aligned(512))); +static volatile char cwd[PATH_MAX] __attribute__((aligned(8))); static void get_dbginfo(pid_t child_pid, struct ppc_debug_info *dbginfo) { @@ -138,6 +141,9 @@ static void test_workload(void) write_var(len); } + /* PTRACE_SET_DEBUGREG, Kernel Access Userspace test */ + syscall(__NR_getcwd, , PATH_MAX); + /* PPC_PTRACE_SETHWDEBUG, MODE_EXACT, WO test */ write_var(1); @@ -150,6 +156,9 @@ static void test_workload(void) else read_var(1); + /* PPC_PTRACE_SETHWDEBUG, MODE_EXACT, Kernel Access Userspace test */ + syscall(__NR_getcwd, , PATH_MAX); + /* PPC_PTRACE_SETHWDEBUG, MODE_RANGE, DW ALIGNED, WO test */ gstruct.a[rand() % A_LEN] = 'a'; @@ -293,6 +302,24 @@ static int test_set_debugreg(pid_t child_pid) return 0; } +static int test_set_debugreg_kernel_userspace(pid_t child_pid) +{ + unsigned long wp_addr = (unsigned long)cwd; + char *name = "PTRACE_SET_DEBUGREG"; + + /* PTRACE_SET_DEBUGREG, Kernel Access Userspace test */ + wp_addr &= ~0x7UL; + wp_addr |= (1Ul << DABR_READ_SHIFT); + wp_addr |= (1UL << DABR_WRITE_SHIFT); + wp_addr |= (1UL << DABR_TRANSLATION_SHIFT); + ptrace_set_debugreg(child_pid, wp_addr); + ptrace(PTRACE_CONT, child_pid, NULL, 0); + check_success(child_pid, name, "Kernel Access Userspace", wp_addr, 8); + + ptrace_set_debugreg(child_pid, 0); + return 0; +} + static void get_ppc_hw_breakpoint(struct ppc_hw_breakpoint *info, int type, unsigned long addr, int len) { @@ -338,6 +365,22 @@ static void test_sethwdebug_exact(pid_t child_pid) ptrace_delhwdebug(child_pid, wh); } +static void test_sethwdebug_exact_kernel_userspace(pid_t child_pid) +{ + struct ppc_hw_breakpoint info; + unsigned long wp_addr = (unsigned long) + char *name = "PPC_PTRACE_SETHWDEBUG, MODE_EXACT"; + int len = 1; /* hardcoded in kernel */ + int wh; + + /* PPC_PTRACE_SETHWDEBUG, MODE_EXACT, Kernel Access Userspace test */ + get_ppc_hw_breakpoint(, PPC_BREAKPOINT_TRIGGER_WRITE, wp_addr, 0); + wh = ptrace_sethwdebug(child_pid, ); + ptrace(PTRACE_CONT, child_pid, NULL, 0); + check_success(child_pid, name, "Kernel Access Userspace", wp_addr, len); + ptrace_delhwdebug(child_pid, wh); +} + static void test_sethwdebug_range_aligned(pid_t child_pid) { struct ppc_hw_breakpoint info; @@ -452,9 +495,10 @@ static void run_tests(pid_t child_pid, struct ppc_debug_info *dbginfo, bool dawr) { test_set_debugreg(child_pid); + test_set_debugreg_kernel_userspace(child_pid); + test_sethwdebug_exact(child_pid); + test_sethwdebug_exact_kernel_userspace(child_pid); if (dbginfo->features & PPC_DEBUG_FEATURE_DATA_BP_RANGE) { - test_sethwdebug_exact(child_pid); - test_sethwdebug_range_aligned(child_pid); if (dawr || is_8xx) { test_sethwdebug_range_unaligned(child_pid); -- 2.26.2
[PATCH v6 2/8] powerpc/watchpoint: Fix handling of vector instructions
Vector load/store instructions are special because they are always aligned. Thus unaligned EA needs to be aligned down before comparing it with watch ranges. Otherwise we might consider valid event as invalid. Fixes: 74c6881019b7 ("powerpc/watchpoint: Prepare handler to handle more than one watchpoint") Signed-off-by: Ravi Bangoria --- arch/powerpc/kernel/hw_breakpoint.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/arch/powerpc/kernel/hw_breakpoint.c b/arch/powerpc/kernel/hw_breakpoint.c index 9f7df1c37233..f6b24838ca3c 100644 --- a/arch/powerpc/kernel/hw_breakpoint.c +++ b/arch/powerpc/kernel/hw_breakpoint.c @@ -644,6 +644,8 @@ static void get_instr_detail(struct pt_regs *regs, struct ppc_inst *instr, if (*type == CACHEOP) { *size = cache_op_size(); *ea &= ~(*size - 1); + } else if (*type == LOAD_VMX || *type == STORE_VMX) { + *ea &= ~(*size - 1); } } -- 2.26.2
[PATCH v6 5/8] powerpc/watchpoint: Fix exception handling for CONFIG_HAVE_HW_BREAKPOINT=N
On powerpc, ptrace watchpoint works in one-shot mode. i.e. kernel disables event every time it fires and user has to re-enable it. Also, in case of ptrace watchpoint, kernel notifies ptrace user before executing instruction. With CONFIG_HAVE_HW_BREAKPOINT=N, kernel is missing to disable ptrace event and thus it's causing infinite loop of exceptions. This is especially harmful when user watches on a data which is also read/written by kernel, eg syscall parameters. In such case, infinite exceptions happens in kernel mode which causes soft-lockup. Fixes: 9422de3e953d ("powerpc: Hardware breakpoints rewrite to handle non DABR breakpoint registers") Reported-by: Pedro Miraglia Franco de Carvalho Signed-off-by: Ravi Bangoria --- arch/powerpc/include/asm/hw_breakpoint.h | 3 ++ arch/powerpc/kernel/process.c | 48 +++ arch/powerpc/kernel/ptrace/ptrace-noadv.c | 4 +- 3 files changed, 54 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/include/asm/hw_breakpoint.h b/arch/powerpc/include/asm/hw_breakpoint.h index 81872c420476..abebfbee5b1c 100644 --- a/arch/powerpc/include/asm/hw_breakpoint.h +++ b/arch/powerpc/include/asm/hw_breakpoint.h @@ -18,6 +18,7 @@ struct arch_hw_breakpoint { u16 type; u16 len; /* length of the target data symbol */ u16 hw_len; /* length programmed in hw */ + u8 flags; }; /* Note: Don't change the first 6 bits below as they are in the same order @@ -37,6 +38,8 @@ struct arch_hw_breakpoint { #define HW_BRK_TYPE_PRIV_ALL (HW_BRK_TYPE_USER | HW_BRK_TYPE_KERNEL | \ HW_BRK_TYPE_HYP) +#define HW_BRK_FLAG_DISABLED 0x1 + /* Minimum granularity */ #ifdef CONFIG_PPC_8xx #define HW_BREAKPOINT_SIZE 0x4 diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c index 016bd831908e..160fbbf41d40 100644 --- a/arch/powerpc/kernel/process.c +++ b/arch/powerpc/kernel/process.c @@ -636,6 +636,44 @@ void do_send_trap(struct pt_regs *regs, unsigned long address, (void __user *)address); } #else /* !CONFIG_PPC_ADV_DEBUG_REGS */ + +static void do_break_handler(struct pt_regs *regs) +{ + struct arch_hw_breakpoint null_brk = {0}; + struct arch_hw_breakpoint *info; + struct ppc_inst instr = ppc_inst(0); + int type = 0; + int size = 0; + unsigned long ea; + int i; + + /* +* If underneath hw supports only one watchpoint, we know it +* caused exception. 8xx also falls into this category. +*/ + if (nr_wp_slots() == 1) { + __set_breakpoint(0, _brk); + current->thread.hw_brk[0] = null_brk; + current->thread.hw_brk[0].flags |= HW_BRK_FLAG_DISABLED; + return; + } + + /* Otherwise findout which DAWR caused exception and disable it. */ + wp_get_instr_detail(regs, , , , ); + + for (i = 0; i < nr_wp_slots(); i++) { + info = >thread.hw_brk[i]; + if (!info->address) + continue; + + if (wp_check_constraints(regs, instr, ea, type, size, info)) { + __set_breakpoint(i, _brk); + current->thread.hw_brk[i] = null_brk; + current->thread.hw_brk[i].flags |= HW_BRK_FLAG_DISABLED; + } + } +} + void do_break (struct pt_regs *regs, unsigned long address, unsigned long error_code) { @@ -647,6 +685,16 @@ void do_break (struct pt_regs *regs, unsigned long address, if (debugger_break_match(regs)) return; + /* +* We reach here only when watchpoint exception is generated by ptrace +* event (or hw is buggy!). Now if CONFIG_HAVE_HW_BREAKPOINT is set, +* watchpoint is already handled by hw_breakpoint_handler() so we don't +* have to do anything. But when CONFIG_HAVE_HW_BREAKPOINT is not set, +* we need to manually handle the watchpoint here. +*/ + if (!IS_ENABLED(CONFIG_HAVE_HW_BREAKPOINT)) + do_break_handler(regs); + /* Deliver the signal to userspace */ force_sig_fault(SIGTRAP, TRAP_HWBKPT, (void __user *)address); } diff --git a/arch/powerpc/kernel/ptrace/ptrace-noadv.c b/arch/powerpc/kernel/ptrace/ptrace-noadv.c index 57a0ab822334..c9122ed91340 100644 --- a/arch/powerpc/kernel/ptrace/ptrace-noadv.c +++ b/arch/powerpc/kernel/ptrace/ptrace-noadv.c @@ -286,11 +286,13 @@ long ppc_del_hwdebug(struct task_struct *child, long data) } return ret; #else /* CONFIG_HAVE_HW_BREAKPOINT */ - if (child->thread.hw_brk[data - 1].address == 0) + if (!(child->thread.hw_brk[data - 1].flags & HW_BRK_FLAG_DISABLED) && + child->thread.hw_brk[data - 1].address == 0) return -ENOENT; child
[PATCH v6 7/8] powerpc/watchpoint/ptrace: Introduce PPC_DEBUG_FEATURE_DATA_BP_ARCH_31
PPC_DEBUG_FEATURE_DATA_BP_ARCH_31 can be used to determine whether we are running on an ISA 3.1 compliant machine. Which is needed to determine DAR behaviour, 512 byte boundary limit etc. This was requested by Pedro Miraglia Franco de Carvalho for extending watchpoint features in gdb. Note that availability of 2nd DAWR is independent of this flag and should be checked using ppc_debug_info->num_data_bps. Signed-off-by: Ravi Bangoria --- Documentation/powerpc/ptrace.rst | 1 + arch/powerpc/include/uapi/asm/ptrace.h| 1 + arch/powerpc/kernel/ptrace/ptrace-noadv.c | 2 ++ 3 files changed, 4 insertions(+) diff --git a/Documentation/powerpc/ptrace.rst b/Documentation/powerpc/ptrace.rst index 864d4b61..77725d69eb4a 100644 --- a/Documentation/powerpc/ptrace.rst +++ b/Documentation/powerpc/ptrace.rst @@ -46,6 +46,7 @@ features will have bits indicating whether there is support for:: #define PPC_DEBUG_FEATURE_DATA_BP_RANGE 0x4 #define PPC_DEBUG_FEATURE_DATA_BP_MASK 0x8 #define PPC_DEBUG_FEATURE_DATA_BP_DAWR 0x10 + #define PPC_DEBUG_FEATURE_DATA_BP_ARCH_310x20 2. PTRACE_SETHWDEBUG diff --git a/arch/powerpc/include/uapi/asm/ptrace.h b/arch/powerpc/include/uapi/asm/ptrace.h index f5f1ccc740fc..7004cfea3f5f 100644 --- a/arch/powerpc/include/uapi/asm/ptrace.h +++ b/arch/powerpc/include/uapi/asm/ptrace.h @@ -222,6 +222,7 @@ struct ppc_debug_info { #define PPC_DEBUG_FEATURE_DATA_BP_RANGE0x0004 #define PPC_DEBUG_FEATURE_DATA_BP_MASK 0x0008 #define PPC_DEBUG_FEATURE_DATA_BP_DAWR 0x0010 +#define PPC_DEBUG_FEATURE_DATA_BP_ARCH_31 0x0020 #ifndef __ASSEMBLY__ diff --git a/arch/powerpc/kernel/ptrace/ptrace-noadv.c b/arch/powerpc/kernel/ptrace/ptrace-noadv.c index 48c52426af80..aa36fcad36cd 100644 --- a/arch/powerpc/kernel/ptrace/ptrace-noadv.c +++ b/arch/powerpc/kernel/ptrace/ptrace-noadv.c @@ -57,6 +57,8 @@ void ppc_gethwdinfo(struct ppc_debug_info *dbginfo) } else { dbginfo->features = 0; } + if (cpu_has_feature(CPU_FTR_ARCH_31)) + dbginfo->features |= PPC_DEBUG_FEATURE_DATA_BP_ARCH_31; } int ptrace_get_debugreg(struct task_struct *child, unsigned long addr, -- 2.26.2
[PATCH v6 3/8] powerpc/watchpoint/ptrace: Fix SETHWDEBUG when CONFIG_HAVE_HW_BREAKPOINT=N
When kernel is compiled with CONFIG_HAVE_HW_BREAKPOINT=N, user can still create watchpoint using PPC_PTRACE_SETHWDEBUG, with limited functionalities. But, such watchpoints are never firing because of the missing privilege settings. Fix that. It's safe to set HW_BRK_TYPE_PRIV_ALL because we don't really leak any kernel address in signal info. Setting HW_BRK_TYPE_PRIV_ALL will also help to find scenarios when kernel accesses user memory. Reported-by: Pedro Miraglia Franco de Carvalho Suggested-by: Pedro Miraglia Franco de Carvalho Signed-off-by: Ravi Bangoria --- arch/powerpc/kernel/ptrace/ptrace-noadv.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/powerpc/kernel/ptrace/ptrace-noadv.c b/arch/powerpc/kernel/ptrace/ptrace-noadv.c index 697c7e4b5877..57a0ab822334 100644 --- a/arch/powerpc/kernel/ptrace/ptrace-noadv.c +++ b/arch/powerpc/kernel/ptrace/ptrace-noadv.c @@ -217,7 +217,7 @@ long ppc_set_hwdebug(struct task_struct *child, struct ppc_hw_breakpoint *bp_inf return -EIO; brk.address = ALIGN_DOWN(bp_info->addr, HW_BREAKPOINT_SIZE); - brk.type = HW_BRK_TYPE_TRANSLATE; + brk.type = HW_BRK_TYPE_TRANSLATE | HW_BRK_TYPE_PRIV_ALL; brk.len = DABR_MAX_LEN; if (bp_info->trigger_type & PPC_BREAKPOINT_TRIGGER_READ) brk.type |= HW_BRK_TYPE_READ; -- 2.26.2
[PATCH v6 0/8] powerpc/watchpoint: Bug fixes plus new feature flag
Patch #1 fixes issue for quardword instruction on p10 predecessors. Patch #2 fixes issue for vector instructions. Patch #3 fixes a bug about watchpoint not firing when created with ptrace PPC_PTRACE_SETHWDEBUG and CONFIG_HAVE_HW_BREAKPOINT=N. The fix uses HW_BRK_TYPE_PRIV_ALL for ptrace user which, I guess, should be fine because we don't leak any kernel addresses and PRIV_ALL will also help to cover scenarios when kernel accesses user memory. Patch #4,#5 fixes infinite exception bug, again the bug happens only with CONFIG_HAVE_HW_BREAKPOINT=N. Patch #6 fixes two places where we are missing to set hw_len. Patch #7 introduce new feature bit PPC_DEBUG_FEATURE_DATA_BP_ARCH_31 which will be set when running on ISA 3.1 compliant machine. Patch #8 finally adds selftest to test scenarios fixed by patch#2,#3 and also moves MODE_EXACT tests outside of BP_RANGE condition. Christophe, let me know if this series breaks something for 8xx. v5: https://lore.kernel.org/r/20200825043617.1073634-1-ravi.bango...@linux.ibm.com v5->v6: - Fix build faulure reported by kernel test robot - patch #5. Use more compact if condition, suggested by Christophe Ravi Bangoria (8): powerpc/watchpoint: Fix quarword instruction handling on p10 predecessors powerpc/watchpoint: Fix handling of vector instructions powerpc/watchpoint/ptrace: Fix SETHWDEBUG when CONFIG_HAVE_HW_BREAKPOINT=N powerpc/watchpoint: Move DAWR detection logic outside of hw_breakpoint.c powerpc/watchpoint: Fix exception handling for CONFIG_HAVE_HW_BREAKPOINT=N powerpc/watchpoint: Add hw_len wherever missing powerpc/watchpoint/ptrace: Introduce PPC_DEBUG_FEATURE_DATA_BP_ARCH_31 powerpc/watchpoint/selftests: Tests for kernel accessing user memory Documentation/powerpc/ptrace.rst | 1 + arch/powerpc/include/asm/hw_breakpoint.h | 12 ++ arch/powerpc/include/uapi/asm/ptrace.h| 1 + arch/powerpc/kernel/Makefile | 3 +- arch/powerpc/kernel/hw_breakpoint.c | 149 +--- .../kernel/hw_breakpoint_constraints.c| 162 ++ arch/powerpc/kernel/process.c | 48 ++ arch/powerpc/kernel/ptrace/ptrace-noadv.c | 9 +- arch/powerpc/xmon/xmon.c | 1 + .../selftests/powerpc/ptrace/ptrace-hwbreak.c | 48 +- 10 files changed, 282 insertions(+), 152 deletions(-) create mode 100644 arch/powerpc/kernel/hw_breakpoint_constraints.c -- 2.26.2
[PATCH v6 4/8] powerpc/watchpoint: Move DAWR detection logic outside of hw_breakpoint.c
Power10 hw has multiple DAWRs but hw doesn't tell which DAWR caused the exception. So we have a sw logic to detect that in hw_breakpoint.c. But hw_breakpoint.c gets compiled only with CONFIG_HAVE_HW_BREAKPOINT=Y. Move DAWR detection logic outside of hw_breakpoint.c so that it can be reused when CONFIG_HAVE_HW_BREAKPOINT is not set. Signed-off-by: Ravi Bangoria --- arch/powerpc/include/asm/hw_breakpoint.h | 8 + arch/powerpc/kernel/Makefile | 3 +- arch/powerpc/kernel/hw_breakpoint.c | 159 + .../kernel/hw_breakpoint_constraints.c| 162 ++ 4 files changed, 174 insertions(+), 158 deletions(-) create mode 100644 arch/powerpc/kernel/hw_breakpoint_constraints.c diff --git a/arch/powerpc/include/asm/hw_breakpoint.h b/arch/powerpc/include/asm/hw_breakpoint.h index 9b68eafebf43..81872c420476 100644 --- a/arch/powerpc/include/asm/hw_breakpoint.h +++ b/arch/powerpc/include/asm/hw_breakpoint.h @@ -10,6 +10,7 @@ #define _PPC_BOOK3S_64_HW_BREAKPOINT_H #include +#include #ifdef __KERNEL__ struct arch_hw_breakpoint { @@ -52,6 +53,13 @@ static inline int nr_wp_slots(void) return cpu_has_feature(CPU_FTR_DAWR1) ? 2 : 1; } +bool wp_check_constraints(struct pt_regs *regs, struct ppc_inst instr, + unsigned long ea, int type, int size, + struct arch_hw_breakpoint *info); + +void wp_get_instr_detail(struct pt_regs *regs, struct ppc_inst *instr, +int *type, int *size, unsigned long *ea); + #ifdef CONFIG_HAVE_HW_BREAKPOINT #include #include diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile index cbf41fb4ee89..a5550c2b24c4 100644 --- a/arch/powerpc/kernel/Makefile +++ b/arch/powerpc/kernel/Makefile @@ -45,7 +45,8 @@ obj-y := cputable.o syscalls.o \ signal.o sysfs.o cacheinfo.o time.o \ prom.o traps.o setup-common.o \ udbg.o misc.o io.o misc_$(BITS).o \ - of_platform.o prom_parse.o firmware.o + of_platform.o prom_parse.o firmware.o \ + hw_breakpoint_constraints.o obj-y += ptrace/ obj-$(CONFIG_PPC64)+= setup_64.o \ paca.o nvram_64.o note.o syscall_64.o diff --git a/arch/powerpc/kernel/hw_breakpoint.c b/arch/powerpc/kernel/hw_breakpoint.c index f6b24838ca3c..f4e8f21046f5 100644 --- a/arch/powerpc/kernel/hw_breakpoint.c +++ b/arch/powerpc/kernel/hw_breakpoint.c @@ -494,161 +494,6 @@ void thread_change_pc(struct task_struct *tsk, struct pt_regs *regs) } } -static bool dar_in_user_range(unsigned long dar, struct arch_hw_breakpoint *info) -{ - return ((info->address <= dar) && (dar - info->address < info->len)); -} - -static bool ea_user_range_overlaps(unsigned long ea, int size, - struct arch_hw_breakpoint *info) -{ - return ((ea < info->address + info->len) && - (ea + size > info->address)); -} - -static bool dar_in_hw_range(unsigned long dar, struct arch_hw_breakpoint *info) -{ - unsigned long hw_start_addr, hw_end_addr; - - hw_start_addr = ALIGN_DOWN(info->address, HW_BREAKPOINT_SIZE); - hw_end_addr = ALIGN(info->address + info->len, HW_BREAKPOINT_SIZE); - - return ((hw_start_addr <= dar) && (hw_end_addr > dar)); -} - -static bool ea_hw_range_overlaps(unsigned long ea, int size, -struct arch_hw_breakpoint *info) -{ - unsigned long hw_start_addr, hw_end_addr; - unsigned long align_size = HW_BREAKPOINT_SIZE; - - /* -* On p10 predecessors, quadword is handle differently then -* other instructions. -*/ - if (!cpu_has_feature(CPU_FTR_ARCH_31) && size == 16) - align_size = HW_BREAKPOINT_SIZE_QUADWORD; - - hw_start_addr = ALIGN_DOWN(info->address, align_size); - hw_end_addr = ALIGN(info->address + info->len, align_size); - - return ((ea < hw_end_addr) && (ea + size > hw_start_addr)); -} - -/* - * If hw has multiple DAWR registers, we also need to check all - * dawrx constraint bits to confirm this is _really_ a valid event. - * If type is UNKNOWN, but privilege level matches, consider it as - * a positive match. - */ -static bool check_dawrx_constraints(struct pt_regs *regs, int type, - struct arch_hw_breakpoint *info) -{ - if (OP_IS_LOAD(type) && !(info->type & HW_BRK_TYPE_READ)) - return false; - - /* -* The Cache Management instructions other than dcbz never -* cause a match. i.e. if type is CACHEOP, the instruction -* is dcbz, and dcbz is tr
[PATCH v6 6/8] powerpc/watchpoint: Add hw_len wherever missing
There are couple of places where we set len but not hw_len. For ptrace/perf watchpoints, when CONFIG_HAVE_HW_BREAKPOINT=Y, hw_len will be calculated and set internally while parsing watchpoint. But when CONFIG_HAVE_HW_BREAKPOINT=N, we need to manually set 'hw_len'. Similarly for xmon as well, hw_len needs to be set directly. Fixes: b57aeab811db ("powerpc/watchpoint: Fix length calculation for unaligned target") Signed-off-by: Ravi Bangoria --- arch/powerpc/kernel/ptrace/ptrace-noadv.c | 1 + arch/powerpc/xmon/xmon.c | 1 + 2 files changed, 2 insertions(+) diff --git a/arch/powerpc/kernel/ptrace/ptrace-noadv.c b/arch/powerpc/kernel/ptrace/ptrace-noadv.c index c9122ed91340..48c52426af80 100644 --- a/arch/powerpc/kernel/ptrace/ptrace-noadv.c +++ b/arch/powerpc/kernel/ptrace/ptrace-noadv.c @@ -219,6 +219,7 @@ long ppc_set_hwdebug(struct task_struct *child, struct ppc_hw_breakpoint *bp_inf brk.address = ALIGN_DOWN(bp_info->addr, HW_BREAKPOINT_SIZE); brk.type = HW_BRK_TYPE_TRANSLATE | HW_BRK_TYPE_PRIV_ALL; brk.len = DABR_MAX_LEN; + brk.hw_len = DABR_MAX_LEN; if (bp_info->trigger_type & PPC_BREAKPOINT_TRIGGER_READ) brk.type |= HW_BRK_TYPE_READ; if (bp_info->trigger_type & PPC_BREAKPOINT_TRIGGER_WRITE) diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c index df7bca00f5ec..55c43a6c9111 100644 --- a/arch/powerpc/xmon/xmon.c +++ b/arch/powerpc/xmon/xmon.c @@ -969,6 +969,7 @@ static void insert_cpu_bpts(void) brk.address = dabr[i].address; brk.type = (dabr[i].enabled & HW_BRK_TYPE_DABR) | HW_BRK_TYPE_PRIV_ALL; brk.len = 8; + brk.hw_len = 8; __set_breakpoint(i, ); } } -- 2.26.2
Re: [PATCH v5 4/8] powerpc/watchpoint: Move DAWR detection logic outside of hw_breakpoint.c
Hi Christophe, +static int cache_op_size(void) +{ +#ifdef __powerpc64__ + return ppc64_caches.l1d.block_size; +#else + return L1_CACHE_BYTES; +#endif +} You've got l1_dcache_bytes() in arch/powerpc/include/asm/cache.h to do that. + +void wp_get_instr_detail(struct pt_regs *regs, struct ppc_inst *instr, + int *type, int *size, unsigned long *ea) +{ + struct instruction_op op; + + if (__get_user_instr_inatomic(*instr, (void __user *)regs->nip)) + return; + + analyse_instr(, regs, *instr); + *type = GETTYPE(op.type); + *ea = op.ea; +#ifdef __powerpc64__ + if (!(regs->msr & MSR_64BIT)) + *ea &= 0xUL; +#endif This #ifdef is unneeded, it should build fine on a 32 bits too. This patch is just a code movement from one file to another. I don't really change the logic. Would you mind if I do a separate patch for these changes (not a part of this series)? Thanks for review, Ravi
Re: [PATCH v5 5/8] powerpc/watchpoint: Fix exception handling for CONFIG_HAVE_HW_BREAKPOINT=N
Hi Christophe, diff --git a/arch/powerpc/kernel/ptrace/ptrace-noadv.c b/arch/powerpc/kernel/ptrace/ptrace-noadv.c index 57a0ab822334..866597b407bc 100644 --- a/arch/powerpc/kernel/ptrace/ptrace-noadv.c +++ b/arch/powerpc/kernel/ptrace/ptrace-noadv.c @@ -286,11 +286,16 @@ long ppc_del_hwdebug(struct task_struct *child, long data) } return ret; #else /* CONFIG_HAVE_HW_BREAKPOINT */ + if (child->thread.hw_brk[data - 1].flags & HW_BRK_FLAG_DISABLED) I think child->thread.hw_brk[data - 1].flags & HW_BRK_FLAG_DISABLED should go around additionnal () Not sure I follow. + goto del; + if (child->thread.hw_brk[data - 1].address == 0) return -ENOENT; What about replacing the above if by: if (!(child->thread.hw_brk[data - 1].flags) & HW_BRK_FLAG_DISABLED) && child->thread.hw_brk[data - 1].address == 0) return -ENOENT; okay.. that's more compact. But more importantly, what I wanted to know is whether CONFIG_HAVE_HW_BREAKPOINT is set or not in production/distro builds for 8xx. Because I see it's not set in 8xx defconfigs. Thanks, Ravi
[PATCH v5 7/8] powerpc/watchpoint/ptrace: Introduce PPC_DEBUG_FEATURE_DATA_BP_ARCH_31
PPC_DEBUG_FEATURE_DATA_BP_ARCH_31 can be used to determine whether we are running on an ISA 3.1 compliant machine. Which is needed to determine DAR behaviour, 512 byte boundary limit etc. This was requested by Pedro Miraglia Franco de Carvalho for extending watchpoint features in gdb. Note that availability of 2nd DAWR is independent of this flag and should be checked using ppc_debug_info->num_data_bps. Signed-off-by: Ravi Bangoria --- Documentation/powerpc/ptrace.rst | 1 + arch/powerpc/include/uapi/asm/ptrace.h| 1 + arch/powerpc/kernel/ptrace/ptrace-noadv.c | 2 ++ 3 files changed, 4 insertions(+) diff --git a/Documentation/powerpc/ptrace.rst b/Documentation/powerpc/ptrace.rst index 864d4b61..77725d69eb4a 100644 --- a/Documentation/powerpc/ptrace.rst +++ b/Documentation/powerpc/ptrace.rst @@ -46,6 +46,7 @@ features will have bits indicating whether there is support for:: #define PPC_DEBUG_FEATURE_DATA_BP_RANGE 0x4 #define PPC_DEBUG_FEATURE_DATA_BP_MASK 0x8 #define PPC_DEBUG_FEATURE_DATA_BP_DAWR 0x10 + #define PPC_DEBUG_FEATURE_DATA_BP_ARCH_310x20 2. PTRACE_SETHWDEBUG diff --git a/arch/powerpc/include/uapi/asm/ptrace.h b/arch/powerpc/include/uapi/asm/ptrace.h index f5f1ccc740fc..7004cfea3f5f 100644 --- a/arch/powerpc/include/uapi/asm/ptrace.h +++ b/arch/powerpc/include/uapi/asm/ptrace.h @@ -222,6 +222,7 @@ struct ppc_debug_info { #define PPC_DEBUG_FEATURE_DATA_BP_RANGE0x0004 #define PPC_DEBUG_FEATURE_DATA_BP_MASK 0x0008 #define PPC_DEBUG_FEATURE_DATA_BP_DAWR 0x0010 +#define PPC_DEBUG_FEATURE_DATA_BP_ARCH_31 0x0020 #ifndef __ASSEMBLY__ diff --git a/arch/powerpc/kernel/ptrace/ptrace-noadv.c b/arch/powerpc/kernel/ptrace/ptrace-noadv.c index 081c39842d84..1d0235db3c1b 100644 --- a/arch/powerpc/kernel/ptrace/ptrace-noadv.c +++ b/arch/powerpc/kernel/ptrace/ptrace-noadv.c @@ -57,6 +57,8 @@ void ppc_gethwdinfo(struct ppc_debug_info *dbginfo) } else { dbginfo->features = 0; } + if (cpu_has_feature(CPU_FTR_ARCH_31)) + dbginfo->features |= PPC_DEBUG_FEATURE_DATA_BP_ARCH_31; } int ptrace_get_debugreg(struct task_struct *child, unsigned long addr, -- 2.26.2
[PATCH v5 3/8] powerpc/watchpoint/ptrace: Fix SETHWDEBUG when CONFIG_HAVE_HW_BREAKPOINT=N
When kernel is compiled with CONFIG_HAVE_HW_BREAKPOINT=N, user can still create watchpoint using PPC_PTRACE_SETHWDEBUG, with limited functionalities. But, such watchpoints are never firing because of the missing privilege settings. Fix that. It's safe to set HW_BRK_TYPE_PRIV_ALL because we don't really leak any kernel address in signal info. Setting HW_BRK_TYPE_PRIV_ALL will also help to find scenarios when kernel corrupts user memory. Reported-by: Pedro Miraglia Franco de Carvalho Suggested-by: Pedro Miraglia Franco de Carvalho Signed-off-by: Ravi Bangoria --- arch/powerpc/kernel/ptrace/ptrace-noadv.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/powerpc/kernel/ptrace/ptrace-noadv.c b/arch/powerpc/kernel/ptrace/ptrace-noadv.c index 697c7e4b5877..57a0ab822334 100644 --- a/arch/powerpc/kernel/ptrace/ptrace-noadv.c +++ b/arch/powerpc/kernel/ptrace/ptrace-noadv.c @@ -217,7 +217,7 @@ long ppc_set_hwdebug(struct task_struct *child, struct ppc_hw_breakpoint *bp_inf return -EIO; brk.address = ALIGN_DOWN(bp_info->addr, HW_BREAKPOINT_SIZE); - brk.type = HW_BRK_TYPE_TRANSLATE; + brk.type = HW_BRK_TYPE_TRANSLATE | HW_BRK_TYPE_PRIV_ALL; brk.len = DABR_MAX_LEN; if (bp_info->trigger_type & PPC_BREAKPOINT_TRIGGER_READ) brk.type |= HW_BRK_TYPE_READ; -- 2.26.2
[PATCH v5 4/8] powerpc/watchpoint: Move DAWR detection logic outside of hw_breakpoint.c
Power10 hw has multiple DAWRs but hw doesn't tell which DAWR caused the exception. So we have a sw logic to detect that in hw_breakpoint.c. But hw_breakpoint.c gets compiled only with CONFIG_HAVE_HW_BREAKPOINT=Y. Move DAWR detection logic outside of hw_breakpoint.c so that it can be reused when CONFIG_HAVE_HW_BREAKPOINT is not set. Signed-off-by: Ravi Bangoria --- arch/powerpc/include/asm/hw_breakpoint.h | 8 + arch/powerpc/kernel/Makefile | 3 +- arch/powerpc/kernel/hw_breakpoint.c | 159 + .../kernel/hw_breakpoint_constraints.c| 162 ++ 4 files changed, 174 insertions(+), 158 deletions(-) create mode 100644 arch/powerpc/kernel/hw_breakpoint_constraints.c diff --git a/arch/powerpc/include/asm/hw_breakpoint.h b/arch/powerpc/include/asm/hw_breakpoint.h index da38e05e04d9..2eca3dd54b55 100644 --- a/arch/powerpc/include/asm/hw_breakpoint.h +++ b/arch/powerpc/include/asm/hw_breakpoint.h @@ -10,6 +10,7 @@ #define _PPC_BOOK3S_64_HW_BREAKPOINT_H #include +#include #ifdef __KERNEL__ struct arch_hw_breakpoint { @@ -52,6 +53,13 @@ static inline int nr_wp_slots(void) return cpu_has_feature(CPU_FTR_DAWR1) ? 2 : 1; } +bool wp_check_constraints(struct pt_regs *regs, struct ppc_inst instr, + unsigned long ea, int type, int size, + struct arch_hw_breakpoint *info); + +void wp_get_instr_detail(struct pt_regs *regs, struct ppc_inst *instr, +int *type, int *size, unsigned long *ea); + #ifdef CONFIG_HAVE_HW_BREAKPOINT #include #include diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile index cbf41fb4ee89..a5550c2b24c4 100644 --- a/arch/powerpc/kernel/Makefile +++ b/arch/powerpc/kernel/Makefile @@ -45,7 +45,8 @@ obj-y := cputable.o syscalls.o \ signal.o sysfs.o cacheinfo.o time.o \ prom.o traps.o setup-common.o \ udbg.o misc.o io.o misc_$(BITS).o \ - of_platform.o prom_parse.o firmware.o + of_platform.o prom_parse.o firmware.o \ + hw_breakpoint_constraints.o obj-y += ptrace/ obj-$(CONFIG_PPC64)+= setup_64.o \ paca.o nvram_64.o note.o syscall_64.o diff --git a/arch/powerpc/kernel/hw_breakpoint.c b/arch/powerpc/kernel/hw_breakpoint.c index f6b24838ca3c..f4e8f21046f5 100644 --- a/arch/powerpc/kernel/hw_breakpoint.c +++ b/arch/powerpc/kernel/hw_breakpoint.c @@ -494,161 +494,6 @@ void thread_change_pc(struct task_struct *tsk, struct pt_regs *regs) } } -static bool dar_in_user_range(unsigned long dar, struct arch_hw_breakpoint *info) -{ - return ((info->address <= dar) && (dar - info->address < info->len)); -} - -static bool ea_user_range_overlaps(unsigned long ea, int size, - struct arch_hw_breakpoint *info) -{ - return ((ea < info->address + info->len) && - (ea + size > info->address)); -} - -static bool dar_in_hw_range(unsigned long dar, struct arch_hw_breakpoint *info) -{ - unsigned long hw_start_addr, hw_end_addr; - - hw_start_addr = ALIGN_DOWN(info->address, HW_BREAKPOINT_SIZE); - hw_end_addr = ALIGN(info->address + info->len, HW_BREAKPOINT_SIZE); - - return ((hw_start_addr <= dar) && (hw_end_addr > dar)); -} - -static bool ea_hw_range_overlaps(unsigned long ea, int size, -struct arch_hw_breakpoint *info) -{ - unsigned long hw_start_addr, hw_end_addr; - unsigned long align_size = HW_BREAKPOINT_SIZE; - - /* -* On p10 predecessors, quadword is handle differently then -* other instructions. -*/ - if (!cpu_has_feature(CPU_FTR_ARCH_31) && size == 16) - align_size = HW_BREAKPOINT_SIZE_QUADWORD; - - hw_start_addr = ALIGN_DOWN(info->address, align_size); - hw_end_addr = ALIGN(info->address + info->len, align_size); - - return ((ea < hw_end_addr) && (ea + size > hw_start_addr)); -} - -/* - * If hw has multiple DAWR registers, we also need to check all - * dawrx constraint bits to confirm this is _really_ a valid event. - * If type is UNKNOWN, but privilege level matches, consider it as - * a positive match. - */ -static bool check_dawrx_constraints(struct pt_regs *regs, int type, - struct arch_hw_breakpoint *info) -{ - if (OP_IS_LOAD(type) && !(info->type & HW_BRK_TYPE_READ)) - return false; - - /* -* The Cache Management instructions other than dcbz never -* cause a match. i.e. if type is CACHEOP, the instruction -* is dcbz, and dcbz is tr
[PATCH v5 5/8] powerpc/watchpoint: Fix exception handling for CONFIG_HAVE_HW_BREAKPOINT=N
On powerpc, ptrace watchpoint works in one-shot mode. i.e. kernel disables event every time it fires and user has to re-enable it. Also, in case of ptrace watchpoint, kernel notifies ptrace user before executing instruction. With CONFIG_HAVE_HW_BREAKPOINT=N, kernel is missing to disable ptrace event and thus it's causing infinite loop of exceptions. This is especially harmful when user watches on a data which is also read/written by kernel, eg syscall parameters. In such case, infinite exceptions happens in kernel mode which causes soft-lockup. Fixes: 9422de3e953d ("powerpc: Hardware breakpoints rewrite to handle non DABR breakpoint registers") Reported-by: Pedro Miraglia Franco de Carvalho Signed-off-by: Ravi Bangoria --- arch/powerpc/include/asm/hw_breakpoint.h | 3 ++ arch/powerpc/kernel/process.c | 48 +++ arch/powerpc/kernel/ptrace/ptrace-noadv.c | 5 +++ 3 files changed, 56 insertions(+) diff --git a/arch/powerpc/include/asm/hw_breakpoint.h b/arch/powerpc/include/asm/hw_breakpoint.h index 2eca3dd54b55..c72263214d3f 100644 --- a/arch/powerpc/include/asm/hw_breakpoint.h +++ b/arch/powerpc/include/asm/hw_breakpoint.h @@ -18,6 +18,7 @@ struct arch_hw_breakpoint { u16 type; u16 len; /* length of the target data symbol */ u16 hw_len; /* length programmed in hw */ + u8 flags; }; /* Note: Don't change the first 6 bits below as they are in the same order @@ -37,6 +38,8 @@ struct arch_hw_breakpoint { #define HW_BRK_TYPE_PRIV_ALL (HW_BRK_TYPE_USER | HW_BRK_TYPE_KERNEL | \ HW_BRK_TYPE_HYP) +#define HW_BRK_FLAG_DISABLED 0x1 + /* Minimum granularity */ #ifdef CONFIG_PPC_8xx #define HW_BREAKPOINT_SIZE 0x4 diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c index 016bd831908e..160fbbf41d40 100644 --- a/arch/powerpc/kernel/process.c +++ b/arch/powerpc/kernel/process.c @@ -636,6 +636,44 @@ void do_send_trap(struct pt_regs *regs, unsigned long address, (void __user *)address); } #else /* !CONFIG_PPC_ADV_DEBUG_REGS */ + +static void do_break_handler(struct pt_regs *regs) +{ + struct arch_hw_breakpoint null_brk = {0}; + struct arch_hw_breakpoint *info; + struct ppc_inst instr = ppc_inst(0); + int type = 0; + int size = 0; + unsigned long ea; + int i; + + /* +* If underneath hw supports only one watchpoint, we know it +* caused exception. 8xx also falls into this category. +*/ + if (nr_wp_slots() == 1) { + __set_breakpoint(0, _brk); + current->thread.hw_brk[0] = null_brk; + current->thread.hw_brk[0].flags |= HW_BRK_FLAG_DISABLED; + return; + } + + /* Otherwise findout which DAWR caused exception and disable it. */ + wp_get_instr_detail(regs, , , , ); + + for (i = 0; i < nr_wp_slots(); i++) { + info = >thread.hw_brk[i]; + if (!info->address) + continue; + + if (wp_check_constraints(regs, instr, ea, type, size, info)) { + __set_breakpoint(i, _brk); + current->thread.hw_brk[i] = null_brk; + current->thread.hw_brk[i].flags |= HW_BRK_FLAG_DISABLED; + } + } +} + void do_break (struct pt_regs *regs, unsigned long address, unsigned long error_code) { @@ -647,6 +685,16 @@ void do_break (struct pt_regs *regs, unsigned long address, if (debugger_break_match(regs)) return; + /* +* We reach here only when watchpoint exception is generated by ptrace +* event (or hw is buggy!). Now if CONFIG_HAVE_HW_BREAKPOINT is set, +* watchpoint is already handled by hw_breakpoint_handler() so we don't +* have to do anything. But when CONFIG_HAVE_HW_BREAKPOINT is not set, +* we need to manually handle the watchpoint here. +*/ + if (!IS_ENABLED(CONFIG_HAVE_HW_BREAKPOINT)) + do_break_handler(regs); + /* Deliver the signal to userspace */ force_sig_fault(SIGTRAP, TRAP_HWBKPT, (void __user *)address); } diff --git a/arch/powerpc/kernel/ptrace/ptrace-noadv.c b/arch/powerpc/kernel/ptrace/ptrace-noadv.c index 57a0ab822334..866597b407bc 100644 --- a/arch/powerpc/kernel/ptrace/ptrace-noadv.c +++ b/arch/powerpc/kernel/ptrace/ptrace-noadv.c @@ -286,11 +286,16 @@ long ppc_del_hwdebug(struct task_struct *child, long data) } return ret; #else /* CONFIG_HAVE_HW_BREAKPOINT */ + if (child->thread.hw_brk[data - 1].flags & HW_BRK_FLAG_DISABLED) + goto del; + if (child->thread.hw_brk[data - 1].address == 0) return -ENOENT; +del: child->thread.hw_brk[data - 1].address = 0;
[PATCH v5 6/8] powerpc/watchpoint: Add hw_len wherever missing
There are couple of places where we set len but not hw_len. For ptrace/perf watchpoints, when CONFIG_HAVE_HW_BREAKPOINT=Y, hw_len will be calculated and set internally while parsing watchpoint. But when CONFIG_HAVE_HW_BREAKPOINT=N, we need to manually set 'hw_len'. Similarly for xmon as well, hw_len needs to be set directly. Fixes: b57aeab811db ("powerpc/watchpoint: Fix length calculation for unaligned target") Signed-off-by: Ravi Bangoria --- arch/powerpc/kernel/ptrace/ptrace-noadv.c | 1 + arch/powerpc/xmon/xmon.c | 1 + 2 files changed, 2 insertions(+) diff --git a/arch/powerpc/kernel/ptrace/ptrace-noadv.c b/arch/powerpc/kernel/ptrace/ptrace-noadv.c index 866597b407bc..081c39842d84 100644 --- a/arch/powerpc/kernel/ptrace/ptrace-noadv.c +++ b/arch/powerpc/kernel/ptrace/ptrace-noadv.c @@ -219,6 +219,7 @@ long ppc_set_hwdebug(struct task_struct *child, struct ppc_hw_breakpoint *bp_inf brk.address = ALIGN_DOWN(bp_info->addr, HW_BREAKPOINT_SIZE); brk.type = HW_BRK_TYPE_TRANSLATE | HW_BRK_TYPE_PRIV_ALL; brk.len = DABR_MAX_LEN; + brk.hw_len = DABR_MAX_LEN; if (bp_info->trigger_type & PPC_BREAKPOINT_TRIGGER_READ) brk.type |= HW_BRK_TYPE_READ; if (bp_info->trigger_type & PPC_BREAKPOINT_TRIGGER_WRITE) diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c index df7bca00f5ec..55c43a6c9111 100644 --- a/arch/powerpc/xmon/xmon.c +++ b/arch/powerpc/xmon/xmon.c @@ -969,6 +969,7 @@ static void insert_cpu_bpts(void) brk.address = dabr[i].address; brk.type = (dabr[i].enabled & HW_BRK_TYPE_DABR) | HW_BRK_TYPE_PRIV_ALL; brk.len = 8; + brk.hw_len = 8; __set_breakpoint(i, ); } } -- 2.26.2
[PATCH v5 8/8] powerpc/watchpoint/selftests: Tests for kernel accessing user memory
Introduce tests to cover simple scenarios where user is watching memory which can be accessed by kernel as well. We also support _MODE_EXACT with _SETHWDEBUG interface. Move those testcases out- side of _BP_RANGE condition. This will help to test _MODE_EXACT scenarios when CONFIG_HAVE_HW_BREAKPOINT is not set, eg: $ ./ptrace-hwbreak ... PTRACE_SET_DEBUGREG, Kernel Access Userspace, len: 8: Ok PPC_PTRACE_SETHWDEBUG, MODE_EXACT, WO, len: 1: Ok PPC_PTRACE_SETHWDEBUG, MODE_EXACT, RO, len: 1: Ok PPC_PTRACE_SETHWDEBUG, MODE_EXACT, RW, len: 1: Ok PPC_PTRACE_SETHWDEBUG, MODE_EXACT, Kernel Access Userspace, len: 1: Ok success: ptrace-hwbreak Suggested-by: Pedro Miraglia Franco de Carvalho Signed-off-by: Ravi Bangoria --- .../selftests/powerpc/ptrace/ptrace-hwbreak.c | 48 ++- 1 file changed, 46 insertions(+), 2 deletions(-) diff --git a/tools/testing/selftests/powerpc/ptrace/ptrace-hwbreak.c b/tools/testing/selftests/powerpc/ptrace/ptrace-hwbreak.c index fc477dfe86a2..2e0d86e0687e 100644 --- a/tools/testing/selftests/powerpc/ptrace/ptrace-hwbreak.c +++ b/tools/testing/selftests/powerpc/ptrace/ptrace-hwbreak.c @@ -20,6 +20,8 @@ #include #include #include +#include +#include #include "ptrace.h" #define SPRN_PVR 0x11F @@ -44,6 +46,7 @@ struct gstruct { }; static volatile struct gstruct gstruct __attribute__((aligned(512))); +static volatile char cwd[PATH_MAX] __attribute__((aligned(8))); static void get_dbginfo(pid_t child_pid, struct ppc_debug_info *dbginfo) { @@ -138,6 +141,9 @@ static void test_workload(void) write_var(len); } + /* PTRACE_SET_DEBUGREG, Kernel Access Userspace test */ + syscall(__NR_getcwd, , PATH_MAX); + /* PPC_PTRACE_SETHWDEBUG, MODE_EXACT, WO test */ write_var(1); @@ -150,6 +156,9 @@ static void test_workload(void) else read_var(1); + /* PPC_PTRACE_SETHWDEBUG, MODE_EXACT, Kernel Access Userspace test */ + syscall(__NR_getcwd, , PATH_MAX); + /* PPC_PTRACE_SETHWDEBUG, MODE_RANGE, DW ALIGNED, WO test */ gstruct.a[rand() % A_LEN] = 'a'; @@ -293,6 +302,24 @@ static int test_set_debugreg(pid_t child_pid) return 0; } +static int test_set_debugreg_kernel_userspace(pid_t child_pid) +{ + unsigned long wp_addr = (unsigned long)cwd; + char *name = "PTRACE_SET_DEBUGREG"; + + /* PTRACE_SET_DEBUGREG, Kernel Access Userspace test */ + wp_addr &= ~0x7UL; + wp_addr |= (1Ul << DABR_READ_SHIFT); + wp_addr |= (1UL << DABR_WRITE_SHIFT); + wp_addr |= (1UL << DABR_TRANSLATION_SHIFT); + ptrace_set_debugreg(child_pid, wp_addr); + ptrace(PTRACE_CONT, child_pid, NULL, 0); + check_success(child_pid, name, "Kernel Access Userspace", wp_addr, 8); + + ptrace_set_debugreg(child_pid, 0); + return 0; +} + static void get_ppc_hw_breakpoint(struct ppc_hw_breakpoint *info, int type, unsigned long addr, int len) { @@ -338,6 +365,22 @@ static void test_sethwdebug_exact(pid_t child_pid) ptrace_delhwdebug(child_pid, wh); } +static void test_sethwdebug_exact_kernel_userspace(pid_t child_pid) +{ + struct ppc_hw_breakpoint info; + unsigned long wp_addr = (unsigned long) + char *name = "PPC_PTRACE_SETHWDEBUG, MODE_EXACT"; + int len = 1; /* hardcoded in kernel */ + int wh; + + /* PPC_PTRACE_SETHWDEBUG, MODE_EXACT, Kernel Access Userspace test */ + get_ppc_hw_breakpoint(, PPC_BREAKPOINT_TRIGGER_WRITE, wp_addr, 0); + wh = ptrace_sethwdebug(child_pid, ); + ptrace(PTRACE_CONT, child_pid, NULL, 0); + check_success(child_pid, name, "Kernel Access Userspace", wp_addr, len); + ptrace_delhwdebug(child_pid, wh); +} + static void test_sethwdebug_range_aligned(pid_t child_pid) { struct ppc_hw_breakpoint info; @@ -452,9 +495,10 @@ static void run_tests(pid_t child_pid, struct ppc_debug_info *dbginfo, bool dawr) { test_set_debugreg(child_pid); + test_set_debugreg_kernel_userspace(child_pid); + test_sethwdebug_exact(child_pid); + test_sethwdebug_exact_kernel_userspace(child_pid); if (dbginfo->features & PPC_DEBUG_FEATURE_DATA_BP_RANGE) { - test_sethwdebug_exact(child_pid); - test_sethwdebug_range_aligned(child_pid); if (dawr || is_8xx) { test_sethwdebug_range_unaligned(child_pid); -- 2.26.2
[PATCH v5 2/8] powerpc/watchpoint: Fix handling of vector instructions
Vector instructions are special because they are always aligned. Thus unaligned EA needs to be aligned down before comparing it with watch ranges. Otherwise we might consider valid event as invalid. Fixes: 74c6881019b7 ("powerpc/watchpoint: Prepare handler to handle more than one watchpoint") Signed-off-by: Ravi Bangoria --- arch/powerpc/kernel/hw_breakpoint.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/arch/powerpc/kernel/hw_breakpoint.c b/arch/powerpc/kernel/hw_breakpoint.c index 9f7df1c37233..f6b24838ca3c 100644 --- a/arch/powerpc/kernel/hw_breakpoint.c +++ b/arch/powerpc/kernel/hw_breakpoint.c @@ -644,6 +644,8 @@ static void get_instr_detail(struct pt_regs *regs, struct ppc_inst *instr, if (*type == CACHEOP) { *size = cache_op_size(); *ea &= ~(*size - 1); + } else if (*type == LOAD_VMX || *type == STORE_VMX) { + *ea &= ~(*size - 1); } } -- 2.26.2
[PATCH v5 1/8] powerpc/watchpoint: Fix quarword instruction handling on p10 predecessors
On p10 predecessors, watchpoint with quarword access is compared at quardword length. If the watch range is doubleword or less than that in a first half of quarword aligned 16 bytes, and if there is any unaligned quadword access which will access only the 2nd half, the handler should consider it as extraneous and emulate/single-step it before continuing. Reported-by: Pedro Miraglia Franco de Carvalho Fixes: 74c6881019b7 ("powerpc/watchpoint: Prepare handler to handle more than one watchpoint") Signed-off-by: Ravi Bangoria --- arch/powerpc/include/asm/hw_breakpoint.h | 3 ++- arch/powerpc/kernel/hw_breakpoint.c | 12 ++-- 2 files changed, 12 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/include/asm/hw_breakpoint.h b/arch/powerpc/include/asm/hw_breakpoint.h index db206a7f38e2..da38e05e04d9 100644 --- a/arch/powerpc/include/asm/hw_breakpoint.h +++ b/arch/powerpc/include/asm/hw_breakpoint.h @@ -40,7 +40,8 @@ struct arch_hw_breakpoint { #ifdef CONFIG_PPC_8xx #define HW_BREAKPOINT_SIZE 0x4 #else -#define HW_BREAKPOINT_SIZE 0x8 +#define HW_BREAKPOINT_SIZE 0x8 +#define HW_BREAKPOINT_SIZE_QUADWORD0x10 #endif #define DABR_MAX_LEN 8 diff --git a/arch/powerpc/kernel/hw_breakpoint.c b/arch/powerpc/kernel/hw_breakpoint.c index 1f4a1efa0074..9f7df1c37233 100644 --- a/arch/powerpc/kernel/hw_breakpoint.c +++ b/arch/powerpc/kernel/hw_breakpoint.c @@ -520,9 +520,17 @@ static bool ea_hw_range_overlaps(unsigned long ea, int size, struct arch_hw_breakpoint *info) { unsigned long hw_start_addr, hw_end_addr; + unsigned long align_size = HW_BREAKPOINT_SIZE; - hw_start_addr = ALIGN_DOWN(info->address, HW_BREAKPOINT_SIZE); - hw_end_addr = ALIGN(info->address + info->len, HW_BREAKPOINT_SIZE); + /* +* On p10 predecessors, quadword is handle differently then +* other instructions. +*/ + if (!cpu_has_feature(CPU_FTR_ARCH_31) && size == 16) + align_size = HW_BREAKPOINT_SIZE_QUADWORD; + + hw_start_addr = ALIGN_DOWN(info->address, align_size); + hw_end_addr = ALIGN(info->address + info->len, align_size); return ((ea < hw_end_addr) && (ea + size > hw_start_addr)); } -- 2.26.2
[PATCH v5 0/8] powerpc/watchpoint: Bug fixes plus new feature flag
Patch #1 fixes issue for quardword instruction on p10 predecessors. Patch #2 fixes issue for vector instructions. Patch #3 fixes a bug about watchpoint not firing when created with ptrace PPC_PTRACE_SETHWDEBUG and CONFIG_HAVE_HW_BREAKPOINT=N. The fix uses HW_BRK_TYPE_PRIV_ALL for ptrace user which, I guess, should be fine because we don't leak any kernel addresses and PRIV_ALL will also help to cover scenarios when kernel accesses user memory. Patch #4,#5 fixes infinite exception bug, again the bug happens only with CONFIG_HAVE_HW_BREAKPOINT=N. Patch #6 fixes two places where we are missing to set hw_len. Patch #7 introduce new feature bit PPC_DEBUG_FEATURE_DATA_BP_ARCH_31 which will be set when running on ISA 3.1 compliant machine. Patch #8 finally adds selftest to test scenarios fixed by patch#2,#3 and also moves MODE_EXACT tests outside of BP_RANGE condition. Christophe, let me know if this series breaks something for 8xx. v4: https://lore.kernel.org/r/20200817102330.777537-1-ravi.bango...@linux.ibm.com/ v4->v5: - Patch #1 and #2 are new. These bug happen irrespective of CONFIG_HAVE_HW_BREAKPOINT. - Patch #3 to #8 are carry forwarded from v4 - Rebased to powerpc/next Ravi Bangoria (8): powerpc/watchpoint: Fix quarword instruction handling on p10 predecessors powerpc/watchpoint: Fix handling of vector instructions powerpc/watchpoint/ptrace: Fix SETHWDEBUG when CONFIG_HAVE_HW_BREAKPOINT=N powerpc/watchpoint: Move DAWR detection logic outside of hw_breakpoint.c powerpc/watchpoint: Fix exception handling for CONFIG_HAVE_HW_BREAKPOINT=N powerpc/watchpoint: Add hw_len wherever missing powerpc/watchpoint/ptrace: Introduce PPC_DEBUG_FEATURE_DATA_BP_ARCH_31 powerpc/watchpoint/selftests: Tests for kernel accessing user memory Documentation/powerpc/ptrace.rst | 1 + arch/powerpc/include/asm/hw_breakpoint.h | 14 +- arch/powerpc/include/uapi/asm/ptrace.h| 1 + arch/powerpc/kernel/Makefile | 3 +- arch/powerpc/kernel/hw_breakpoint.c | 149 +--- .../kernel/hw_breakpoint_constraints.c| 162 ++ arch/powerpc/kernel/process.c | 48 ++ arch/powerpc/kernel/ptrace/ptrace-noadv.c | 10 +- arch/powerpc/xmon/xmon.c | 1 + .../selftests/powerpc/ptrace/ptrace-hwbreak.c | 48 +- 10 files changed, 285 insertions(+), 152 deletions(-) create mode 100644 arch/powerpc/kernel/hw_breakpoint_constraints.c -- 2.26.2
[PATCH v4 5/6] powerpc/watchpoint/ptrace: Introduce PPC_DEBUG_FEATURE_DATA_BP_ARCH_31
PPC_DEBUG_FEATURE_DATA_BP_ARCH_31 can be used to determine whether we are running on an ISA 3.1 compliant machine. Which is needed to determine DAR behaviour, 512 byte boundary limit etc. This was requested by Pedro Miraglia Franco de Carvalho for extending watchpoint features in gdb. Note that availability of 2nd DAWR is independent of this flag and should be checked using ppc_debug_info->num_data_bps. Signed-off-by: Ravi Bangoria --- Documentation/powerpc/ptrace.rst | 1 + arch/powerpc/include/uapi/asm/ptrace.h| 1 + arch/powerpc/kernel/ptrace/ptrace-noadv.c | 2 ++ 3 files changed, 4 insertions(+) diff --git a/Documentation/powerpc/ptrace.rst b/Documentation/powerpc/ptrace.rst index 864d4b61..77725d69eb4a 100644 --- a/Documentation/powerpc/ptrace.rst +++ b/Documentation/powerpc/ptrace.rst @@ -46,6 +46,7 @@ features will have bits indicating whether there is support for:: #define PPC_DEBUG_FEATURE_DATA_BP_RANGE 0x4 #define PPC_DEBUG_FEATURE_DATA_BP_MASK 0x8 #define PPC_DEBUG_FEATURE_DATA_BP_DAWR 0x10 + #define PPC_DEBUG_FEATURE_DATA_BP_ARCH_310x20 2. PTRACE_SETHWDEBUG diff --git a/arch/powerpc/include/uapi/asm/ptrace.h b/arch/powerpc/include/uapi/asm/ptrace.h index f5f1ccc740fc..7004cfea3f5f 100644 --- a/arch/powerpc/include/uapi/asm/ptrace.h +++ b/arch/powerpc/include/uapi/asm/ptrace.h @@ -222,6 +222,7 @@ struct ppc_debug_info { #define PPC_DEBUG_FEATURE_DATA_BP_RANGE0x0004 #define PPC_DEBUG_FEATURE_DATA_BP_MASK 0x0008 #define PPC_DEBUG_FEATURE_DATA_BP_DAWR 0x0010 +#define PPC_DEBUG_FEATURE_DATA_BP_ARCH_31 0x0020 #ifndef __ASSEMBLY__ diff --git a/arch/powerpc/kernel/ptrace/ptrace-noadv.c b/arch/powerpc/kernel/ptrace/ptrace-noadv.c index 081c39842d84..1d0235db3c1b 100644 --- a/arch/powerpc/kernel/ptrace/ptrace-noadv.c +++ b/arch/powerpc/kernel/ptrace/ptrace-noadv.c @@ -57,6 +57,8 @@ void ppc_gethwdinfo(struct ppc_debug_info *dbginfo) } else { dbginfo->features = 0; } + if (cpu_has_feature(CPU_FTR_ARCH_31)) + dbginfo->features |= PPC_DEBUG_FEATURE_DATA_BP_ARCH_31; } int ptrace_get_debugreg(struct task_struct *child, unsigned long addr, -- 2.26.2
[PATCH v4 3/6] powerpc/watchpoint: Fix exception handling for CONFIG_HAVE_HW_BREAKPOINT=N
On powerpc, ptrace watchpoint works in one-shot mode. i.e. kernel disables event every time it fires and user has to re-enable it. Also, in case of ptrace watchpoint, kernel notifies ptrace user before executing instruction. With CONFIG_HAVE_HW_BREAKPOINT=N, kernel is missing to disable ptrace event and thus it's causing infinite loop of exceptions. This is especially harmful when user watches on a data which is also read/written by kernel, eg syscall parameters. In such case, infinite exceptions happens in kernel mode which causes soft-lockup. Fixes: 9422de3e953d ("powerpc: Hardware breakpoints rewrite to handle non DABR breakpoint registers") Reported-by: Pedro Miraglia Franco de Carvalho Signed-off-by: Ravi Bangoria --- arch/powerpc/include/asm/hw_breakpoint.h | 3 ++ arch/powerpc/kernel/process.c | 48 +++ arch/powerpc/kernel/ptrace/ptrace-noadv.c | 5 +++ 3 files changed, 56 insertions(+) diff --git a/arch/powerpc/include/asm/hw_breakpoint.h b/arch/powerpc/include/asm/hw_breakpoint.h index f71f08a7e2e0..90d5b3a9f433 100644 --- a/arch/powerpc/include/asm/hw_breakpoint.h +++ b/arch/powerpc/include/asm/hw_breakpoint.h @@ -18,6 +18,7 @@ struct arch_hw_breakpoint { u16 type; u16 len; /* length of the target data symbol */ u16 hw_len; /* length programmed in hw */ + u8 flags; }; /* Note: Don't change the first 6 bits below as they are in the same order @@ -37,6 +38,8 @@ struct arch_hw_breakpoint { #define HW_BRK_TYPE_PRIV_ALL (HW_BRK_TYPE_USER | HW_BRK_TYPE_KERNEL | \ HW_BRK_TYPE_HYP) +#define HW_BRK_FLAG_DISABLED 0x1 + /* Minimum granularity */ #ifdef CONFIG_PPC_8xx #define HW_BREAKPOINT_SIZE 0x4 diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c index e4fcc817c46c..cab6febe6eb6 100644 --- a/arch/powerpc/kernel/process.c +++ b/arch/powerpc/kernel/process.c @@ -636,6 +636,44 @@ void do_send_trap(struct pt_regs *regs, unsigned long address, (void __user *)address); } #else /* !CONFIG_PPC_ADV_DEBUG_REGS */ + +static void do_break_handler(struct pt_regs *regs) +{ + struct arch_hw_breakpoint null_brk = {0}; + struct arch_hw_breakpoint *info; + struct ppc_inst instr = ppc_inst(0); + int type = 0; + int size = 0; + unsigned long ea; + int i; + + /* +* If underneath hw supports only one watchpoint, we know it +* caused exception. 8xx also falls into this category. +*/ + if (nr_wp_slots() == 1) { + __set_breakpoint(0, _brk); + current->thread.hw_brk[0] = null_brk; + current->thread.hw_brk[0].flags |= HW_BRK_FLAG_DISABLED; + return; + } + + /* Otherwise findout which DAWR caused exception and disable it. */ + wp_get_instr_detail(regs, , , , ); + + for (i = 0; i < nr_wp_slots(); i++) { + info = >thread.hw_brk[i]; + if (!info->address) + continue; + + if (wp_check_constraints(regs, instr, ea, type, size, info)) { + __set_breakpoint(i, _brk); + current->thread.hw_brk[i] = null_brk; + current->thread.hw_brk[i].flags |= HW_BRK_FLAG_DISABLED; + } + } +} + void do_break (struct pt_regs *regs, unsigned long address, unsigned long error_code) { @@ -647,6 +685,16 @@ void do_break (struct pt_regs *regs, unsigned long address, if (debugger_break_match(regs)) return; + /* +* We reach here only when watchpoint exception is generated by ptrace +* event (or hw is buggy!). Now if CONFIG_HAVE_HW_BREAKPOINT is set, +* watchpoint is already handled by hw_breakpoint_handler() so we don't +* have to do anything. But when CONFIG_HAVE_HW_BREAKPOINT is not set, +* we need to manually handle the watchpoint here. +*/ + if (!IS_ENABLED(CONFIG_HAVE_HW_BREAKPOINT)) + do_break_handler(regs); + /* Deliver the signal to userspace */ force_sig_fault(SIGTRAP, TRAP_HWBKPT, (void __user *)address); } diff --git a/arch/powerpc/kernel/ptrace/ptrace-noadv.c b/arch/powerpc/kernel/ptrace/ptrace-noadv.c index 57a0ab822334..866597b407bc 100644 --- a/arch/powerpc/kernel/ptrace/ptrace-noadv.c +++ b/arch/powerpc/kernel/ptrace/ptrace-noadv.c @@ -286,11 +286,16 @@ long ppc_del_hwdebug(struct task_struct *child, long data) } return ret; #else /* CONFIG_HAVE_HW_BREAKPOINT */ + if (child->thread.hw_brk[data - 1].flags & HW_BRK_FLAG_DISABLED) + goto del; + if (child->thread.hw_brk[data - 1].address == 0) return -ENOENT; +del: child->thread.hw_brk[data - 1].address = 0;
[PATCH v4 2/6] powerpc/watchpoint: Move DAWR detection logic outside of hw_breakpoint.c
Power10 hw has multiple DAWRs but hw doesn't tell which DAWR caused the exception. So we have a sw logic to detect that in hw_breakpoint.c. But hw_breakpoint.c gets compiled only with CONFIG_HAVE_HW_BREAKPOINT=Y. Move DAWR detection logic outside of hw_breakpoint.c so that it can be reused when CONFIG_HAVE_HW_BREAKPOINT is not set. Signed-off-by: Ravi Bangoria --- arch/powerpc/include/asm/hw_breakpoint.h | 8 + arch/powerpc/kernel/Makefile | 3 +- arch/powerpc/kernel/hw_breakpoint.c | 149 + .../kernel/hw_breakpoint_constraints.c| 152 ++ 4 files changed, 164 insertions(+), 148 deletions(-) create mode 100644 arch/powerpc/kernel/hw_breakpoint_constraints.c diff --git a/arch/powerpc/include/asm/hw_breakpoint.h b/arch/powerpc/include/asm/hw_breakpoint.h index db206a7f38e2..f71f08a7e2e0 100644 --- a/arch/powerpc/include/asm/hw_breakpoint.h +++ b/arch/powerpc/include/asm/hw_breakpoint.h @@ -10,6 +10,7 @@ #define _PPC_BOOK3S_64_HW_BREAKPOINT_H #include +#include #ifdef __KERNEL__ struct arch_hw_breakpoint { @@ -51,6 +52,13 @@ static inline int nr_wp_slots(void) return cpu_has_feature(CPU_FTR_DAWR1) ? 2 : 1; } +bool wp_check_constraints(struct pt_regs *regs, struct ppc_inst instr, + unsigned long ea, int type, int size, + struct arch_hw_breakpoint *info); + +void wp_get_instr_detail(struct pt_regs *regs, struct ppc_inst *instr, +int *type, int *size, unsigned long *ea); + #ifdef CONFIG_HAVE_HW_BREAKPOINT #include #include diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile index d4d5946224f8..286f85a103de 100644 --- a/arch/powerpc/kernel/Makefile +++ b/arch/powerpc/kernel/Makefile @@ -45,7 +45,8 @@ obj-y := cputable.o syscalls.o \ signal.o sysfs.o cacheinfo.o time.o \ prom.o traps.o setup-common.o \ udbg.o misc.o io.o misc_$(BITS).o \ - of_platform.o prom_parse.o firmware.o + of_platform.o prom_parse.o firmware.o \ + hw_breakpoint_constraints.o obj-y += ptrace/ obj-$(CONFIG_PPC64)+= setup_64.o \ paca.o nvram_64.o note.o syscall_64.o diff --git a/arch/powerpc/kernel/hw_breakpoint.c b/arch/powerpc/kernel/hw_breakpoint.c index 1f4a1efa0074..f4e8f21046f5 100644 --- a/arch/powerpc/kernel/hw_breakpoint.c +++ b/arch/powerpc/kernel/hw_breakpoint.c @@ -494,151 +494,6 @@ void thread_change_pc(struct task_struct *tsk, struct pt_regs *regs) } } -static bool dar_in_user_range(unsigned long dar, struct arch_hw_breakpoint *info) -{ - return ((info->address <= dar) && (dar - info->address < info->len)); -} - -static bool ea_user_range_overlaps(unsigned long ea, int size, - struct arch_hw_breakpoint *info) -{ - return ((ea < info->address + info->len) && - (ea + size > info->address)); -} - -static bool dar_in_hw_range(unsigned long dar, struct arch_hw_breakpoint *info) -{ - unsigned long hw_start_addr, hw_end_addr; - - hw_start_addr = ALIGN_DOWN(info->address, HW_BREAKPOINT_SIZE); - hw_end_addr = ALIGN(info->address + info->len, HW_BREAKPOINT_SIZE); - - return ((hw_start_addr <= dar) && (hw_end_addr > dar)); -} - -static bool ea_hw_range_overlaps(unsigned long ea, int size, -struct arch_hw_breakpoint *info) -{ - unsigned long hw_start_addr, hw_end_addr; - - hw_start_addr = ALIGN_DOWN(info->address, HW_BREAKPOINT_SIZE); - hw_end_addr = ALIGN(info->address + info->len, HW_BREAKPOINT_SIZE); - - return ((ea < hw_end_addr) && (ea + size > hw_start_addr)); -} - -/* - * If hw has multiple DAWR registers, we also need to check all - * dawrx constraint bits to confirm this is _really_ a valid event. - * If type is UNKNOWN, but privilege level matches, consider it as - * a positive match. - */ -static bool check_dawrx_constraints(struct pt_regs *regs, int type, - struct arch_hw_breakpoint *info) -{ - if (OP_IS_LOAD(type) && !(info->type & HW_BRK_TYPE_READ)) - return false; - - /* -* The Cache Management instructions other than dcbz never -* cause a match. i.e. if type is CACHEOP, the instruction -* is dcbz, and dcbz is treated as Store. -*/ - if ((OP_IS_STORE(type) || type == CACHEOP) && !(info->type & HW_BRK_TYPE_WRITE)) - return false; - - if (is_kernel_addr(regs->nip) && !(info->type & HW_BRK_TYPE_KERNEL)) - return false;
[PATCH v4 1/6] powerpc/watchpoint/ptrace: Fix SETHWDEBUG when CONFIG_HAVE_HW_BREAKPOINT=N
When kernel is compiled with CONFIG_HAVE_HW_BREAKPOINT=N, user can still create watchpoint using PPC_PTRACE_SETHWDEBUG, with limited functionalities. But, such watchpoints are never firing because of the missing privilege settings. Fix that. It's safe to set HW_BRK_TYPE_PRIV_ALL because we don't really leak any kernel address in signal info. Setting HW_BRK_TYPE_PRIV_ALL will also help to find scenarios when kernel corrupts user memory. Reported-by: Pedro Miraglia Franco de Carvalho Suggested-by: Pedro Miraglia Franco de Carvalho Signed-off-by: Ravi Bangoria --- arch/powerpc/kernel/ptrace/ptrace-noadv.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/powerpc/kernel/ptrace/ptrace-noadv.c b/arch/powerpc/kernel/ptrace/ptrace-noadv.c index 697c7e4b5877..57a0ab822334 100644 --- a/arch/powerpc/kernel/ptrace/ptrace-noadv.c +++ b/arch/powerpc/kernel/ptrace/ptrace-noadv.c @@ -217,7 +217,7 @@ long ppc_set_hwdebug(struct task_struct *child, struct ppc_hw_breakpoint *bp_inf return -EIO; brk.address = ALIGN_DOWN(bp_info->addr, HW_BREAKPOINT_SIZE); - brk.type = HW_BRK_TYPE_TRANSLATE; + brk.type = HW_BRK_TYPE_TRANSLATE | HW_BRK_TYPE_PRIV_ALL; brk.len = DABR_MAX_LEN; if (bp_info->trigger_type & PPC_BREAKPOINT_TRIGGER_READ) brk.type |= HW_BRK_TYPE_READ; -- 2.26.2
[PATCH v4 0/6] powerpc/watchpoint: Bug fixes plus new feature flag
Patch #1 fixes a bug about watchpoint not firing when created with ptrace PPC_PTRACE_SETHWDEBUG and CONFIG_HAVE_HW_BREAKPOINT=N. The fix uses HW_BRK_TYPE_PRIV_ALL for ptrace user which, I guess, should be fine because we don't leak any kernel addresses and PRIV_ALL will also help to cover scenarios when kernel accesses user memory. Patch #2,#3 fixes infinite exception bug, again the bug happens only with CONFIG_HAVE_HW_BREAKPOINT=N. Patch #4 fixes two places where we are missing to set hw_len. Patch #5 introduce new feature bit PPC_DEBUG_FEATURE_DATA_BP_ARCH_31 which will be set when running on ISA 3.1 compliant machine. Patch #6 finally adds selftest to test scenarios fixed by patch#2,#3 and also moves MODE_EXACT tests outside of BP_RANGE condition. Christophe, let me know if this series breaks something for 8xx. v3: https://lore.kernel.org/r/20200805062750.290289-1-ravi.bango...@linux.ibm.com v3->v4: - Patch #1. Allow HW_BRK_TYPE_PRIV_ALL instead of HW_BRK_TYPE_USER - Patch #2, #3, #4 and #6 are new. - Rebased to powerpc/next. Ravi Bangoria (6): powerpc/watchpoint/ptrace: Fix SETHWDEBUG when CONFIG_HAVE_HW_BREAKPOINT=N powerpc/watchpoint: Move DAWR detection logic outside of hw_breakpoint.c powerpc/watchpoint: Fix exception handling for CONFIG_HAVE_HW_BREAKPOINT=N powerpc/watchpoint: Add hw_len wherever missing powerpc/watchpoint/ptrace: Introduce PPC_DEBUG_FEATURE_DATA_BP_ARCH_31 powerpc/watchpoint/selftests: Tests for kernel accessing user memory Documentation/powerpc/ptrace.rst | 1 + arch/powerpc/include/asm/hw_breakpoint.h | 11 ++ arch/powerpc/include/uapi/asm/ptrace.h| 1 + arch/powerpc/kernel/Makefile | 3 +- arch/powerpc/kernel/hw_breakpoint.c | 149 + .../kernel/hw_breakpoint_constraints.c| 152 ++ arch/powerpc/kernel/process.c | 48 ++ arch/powerpc/kernel/ptrace/ptrace-noadv.c | 10 +- arch/powerpc/xmon/xmon.c | 1 + .../selftests/powerpc/ptrace/ptrace-hwbreak.c | 48 +- 10 files changed, 273 insertions(+), 151 deletions(-) create mode 100644 arch/powerpc/kernel/hw_breakpoint_constraints.c -- 2.26.2
[PATCH v4 6/6] powerpc/watchpoint/selftests: Tests for kernel accessing user memory
Introduce tests to cover simple scenarios where user is watching memory which can be accessed by kernel as well. We also support _MODE_EXACT with _SETHWDEBUG interface. Move those testcases out- side of _BP_RANGE condition. This will help to test _MODE_EXACT scenarios when CONFIG_HAVE_HW_BREAKPOINT is not set, eg: $ ./ptrace-hwbreak ... PTRACE_SET_DEBUGREG, Kernel Access Userspace, len: 8: Ok PPC_PTRACE_SETHWDEBUG, MODE_EXACT, WO, len: 1: Ok PPC_PTRACE_SETHWDEBUG, MODE_EXACT, RO, len: 1: Ok PPC_PTRACE_SETHWDEBUG, MODE_EXACT, RW, len: 1: Ok PPC_PTRACE_SETHWDEBUG, MODE_EXACT, Kernel Access Userspace, len: 1: Ok success: ptrace-hwbreak Suggested-by: Pedro Miraglia Franco de Carvalho Signed-off-by: Ravi Bangoria --- .../selftests/powerpc/ptrace/ptrace-hwbreak.c | 48 ++- 1 file changed, 46 insertions(+), 2 deletions(-) diff --git a/tools/testing/selftests/powerpc/ptrace/ptrace-hwbreak.c b/tools/testing/selftests/powerpc/ptrace/ptrace-hwbreak.c index fc477dfe86a2..2e0d86e0687e 100644 --- a/tools/testing/selftests/powerpc/ptrace/ptrace-hwbreak.c +++ b/tools/testing/selftests/powerpc/ptrace/ptrace-hwbreak.c @@ -20,6 +20,8 @@ #include #include #include +#include +#include #include "ptrace.h" #define SPRN_PVR 0x11F @@ -44,6 +46,7 @@ struct gstruct { }; static volatile struct gstruct gstruct __attribute__((aligned(512))); +static volatile char cwd[PATH_MAX] __attribute__((aligned(8))); static void get_dbginfo(pid_t child_pid, struct ppc_debug_info *dbginfo) { @@ -138,6 +141,9 @@ static void test_workload(void) write_var(len); } + /* PTRACE_SET_DEBUGREG, Kernel Access Userspace test */ + syscall(__NR_getcwd, , PATH_MAX); + /* PPC_PTRACE_SETHWDEBUG, MODE_EXACT, WO test */ write_var(1); @@ -150,6 +156,9 @@ static void test_workload(void) else read_var(1); + /* PPC_PTRACE_SETHWDEBUG, MODE_EXACT, Kernel Access Userspace test */ + syscall(__NR_getcwd, , PATH_MAX); + /* PPC_PTRACE_SETHWDEBUG, MODE_RANGE, DW ALIGNED, WO test */ gstruct.a[rand() % A_LEN] = 'a'; @@ -293,6 +302,24 @@ static int test_set_debugreg(pid_t child_pid) return 0; } +static int test_set_debugreg_kernel_userspace(pid_t child_pid) +{ + unsigned long wp_addr = (unsigned long)cwd; + char *name = "PTRACE_SET_DEBUGREG"; + + /* PTRACE_SET_DEBUGREG, Kernel Access Userspace test */ + wp_addr &= ~0x7UL; + wp_addr |= (1Ul << DABR_READ_SHIFT); + wp_addr |= (1UL << DABR_WRITE_SHIFT); + wp_addr |= (1UL << DABR_TRANSLATION_SHIFT); + ptrace_set_debugreg(child_pid, wp_addr); + ptrace(PTRACE_CONT, child_pid, NULL, 0); + check_success(child_pid, name, "Kernel Access Userspace", wp_addr, 8); + + ptrace_set_debugreg(child_pid, 0); + return 0; +} + static void get_ppc_hw_breakpoint(struct ppc_hw_breakpoint *info, int type, unsigned long addr, int len) { @@ -338,6 +365,22 @@ static void test_sethwdebug_exact(pid_t child_pid) ptrace_delhwdebug(child_pid, wh); } +static void test_sethwdebug_exact_kernel_userspace(pid_t child_pid) +{ + struct ppc_hw_breakpoint info; + unsigned long wp_addr = (unsigned long) + char *name = "PPC_PTRACE_SETHWDEBUG, MODE_EXACT"; + int len = 1; /* hardcoded in kernel */ + int wh; + + /* PPC_PTRACE_SETHWDEBUG, MODE_EXACT, Kernel Access Userspace test */ + get_ppc_hw_breakpoint(, PPC_BREAKPOINT_TRIGGER_WRITE, wp_addr, 0); + wh = ptrace_sethwdebug(child_pid, ); + ptrace(PTRACE_CONT, child_pid, NULL, 0); + check_success(child_pid, name, "Kernel Access Userspace", wp_addr, len); + ptrace_delhwdebug(child_pid, wh); +} + static void test_sethwdebug_range_aligned(pid_t child_pid) { struct ppc_hw_breakpoint info; @@ -452,9 +495,10 @@ static void run_tests(pid_t child_pid, struct ppc_debug_info *dbginfo, bool dawr) { test_set_debugreg(child_pid); + test_set_debugreg_kernel_userspace(child_pid); + test_sethwdebug_exact(child_pid); + test_sethwdebug_exact_kernel_userspace(child_pid); if (dbginfo->features & PPC_DEBUG_FEATURE_DATA_BP_RANGE) { - test_sethwdebug_exact(child_pid); - test_sethwdebug_range_aligned(child_pid); if (dawr || is_8xx) { test_sethwdebug_range_unaligned(child_pid); -- 2.26.2
[PATCH v4 4/6] powerpc/watchpoint: Add hw_len wherever missing
There are couple of places where we set len but not hw_len. For ptrace/perf watchpoints, when CONFIG_HAVE_HW_BREAKPOINT=Y, hw_len will be calculated and set internally while parsing watchpoint. But when CONFIG_HAVE_HW_BREAKPOINT=N, we need to manually set 'hw_len'. Similarly for xmon as well, hw_len needs to be set directly. Fixes: b57aeab811db ("powerpc/watchpoint: Fix length calculation for unaligned target") Signed-off-by: Ravi Bangoria --- arch/powerpc/kernel/ptrace/ptrace-noadv.c | 1 + arch/powerpc/xmon/xmon.c | 1 + 2 files changed, 2 insertions(+) diff --git a/arch/powerpc/kernel/ptrace/ptrace-noadv.c b/arch/powerpc/kernel/ptrace/ptrace-noadv.c index 866597b407bc..081c39842d84 100644 --- a/arch/powerpc/kernel/ptrace/ptrace-noadv.c +++ b/arch/powerpc/kernel/ptrace/ptrace-noadv.c @@ -219,6 +219,7 @@ long ppc_set_hwdebug(struct task_struct *child, struct ppc_hw_breakpoint *bp_inf brk.address = ALIGN_DOWN(bp_info->addr, HW_BREAKPOINT_SIZE); brk.type = HW_BRK_TYPE_TRANSLATE | HW_BRK_TYPE_PRIV_ALL; brk.len = DABR_MAX_LEN; + brk.hw_len = DABR_MAX_LEN; if (bp_info->trigger_type & PPC_BREAKPOINT_TRIGGER_READ) brk.type |= HW_BRK_TYPE_READ; if (bp_info->trigger_type & PPC_BREAKPOINT_TRIGGER_WRITE) diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c index df7bca00f5ec..55c43a6c9111 100644 --- a/arch/powerpc/xmon/xmon.c +++ b/arch/powerpc/xmon/xmon.c @@ -969,6 +969,7 @@ static void insert_cpu_bpts(void) brk.address = dabr[i].address; brk.type = (dabr[i].enabled & HW_BRK_TYPE_DABR) | HW_BRK_TYPE_PRIV_ALL; brk.len = 8; + brk.hw_len = 8; __set_breakpoint(i, ); } } -- 2.26.2
[PATCH v3 0/2] powerpc/watchpoint/ptrace: Buf fix plus new feature flag
Patch #1 fixes a bug when watchpoint is created with ptrace PPC_PTRACE_SETHWDEBUG and CONFIG_HAVE_HW_BREAKPOINT=N. patch #2 introduce new feature bit PPC_DEBUG_FEATURE_DATA_BP_ARCH_31 which will be set when running on ISA 3.1 compliant machine. v2: https://lore.kernel.org/r/20200723093330.306341-1-ravi.bango...@linux.ibm.com v2->v3: - #1: is new - #2: Set PPC_DEBUG_FEATURE_DATA_BP_ARCH_31 unconditionally while running on p10. - Rebased to powerpc/next (cf1ae052e073) Ravi Bangoria (2): powerpc/watchpoint/ptrace: Fix SETHWDEBUG when CONFIG_HAVE_HW_BREAKPOINT=N powerpc/watchpoint/ptrace: Introduce PPC_DEBUG_FEATURE_DATA_BP_ARCH_31 Documentation/powerpc/ptrace.rst | 1 + arch/powerpc/include/uapi/asm/ptrace.h| 1 + arch/powerpc/kernel/ptrace/ptrace-noadv.c | 4 +++- 3 files changed, 5 insertions(+), 1 deletion(-) -- 2.26.2
[PATCH v3 2/2] powerpc/watchpoint/ptrace: Introduce PPC_DEBUG_FEATURE_DATA_BP_ARCH_31
PPC_DEBUG_FEATURE_DATA_BP_ARCH_31 can be used to determine whether we are running on an ISA 3.1 compliant machine. Which is needed to determine DAR behaviour, 512 byte boundary limit etc. This was requested by Pedro Miraglia Franco de Carvalho for extending watchpoint features in gdb. Note that availability of 2nd DAWR is independent of this flag and should be checked using ppc_debug_info->num_data_bps. Signed-off-by: Ravi Bangoria --- Documentation/powerpc/ptrace.rst | 1 + arch/powerpc/include/uapi/asm/ptrace.h| 1 + arch/powerpc/kernel/ptrace/ptrace-noadv.c | 2 ++ 3 files changed, 4 insertions(+) diff --git a/Documentation/powerpc/ptrace.rst b/Documentation/powerpc/ptrace.rst index 864d4b61..77725d69eb4a 100644 --- a/Documentation/powerpc/ptrace.rst +++ b/Documentation/powerpc/ptrace.rst @@ -46,6 +46,7 @@ features will have bits indicating whether there is support for:: #define PPC_DEBUG_FEATURE_DATA_BP_RANGE 0x4 #define PPC_DEBUG_FEATURE_DATA_BP_MASK 0x8 #define PPC_DEBUG_FEATURE_DATA_BP_DAWR 0x10 + #define PPC_DEBUG_FEATURE_DATA_BP_ARCH_310x20 2. PTRACE_SETHWDEBUG diff --git a/arch/powerpc/include/uapi/asm/ptrace.h b/arch/powerpc/include/uapi/asm/ptrace.h index f5f1ccc740fc..7004cfea3f5f 100644 --- a/arch/powerpc/include/uapi/asm/ptrace.h +++ b/arch/powerpc/include/uapi/asm/ptrace.h @@ -222,6 +222,7 @@ struct ppc_debug_info { #define PPC_DEBUG_FEATURE_DATA_BP_RANGE0x0004 #define PPC_DEBUG_FEATURE_DATA_BP_MASK 0x0008 #define PPC_DEBUG_FEATURE_DATA_BP_DAWR 0x0010 +#define PPC_DEBUG_FEATURE_DATA_BP_ARCH_31 0x0020 #ifndef __ASSEMBLY__ diff --git a/arch/powerpc/kernel/ptrace/ptrace-noadv.c b/arch/powerpc/kernel/ptrace/ptrace-noadv.c index bf94ab837735..d5be73ab9e74 100644 --- a/arch/powerpc/kernel/ptrace/ptrace-noadv.c +++ b/arch/powerpc/kernel/ptrace/ptrace-noadv.c @@ -57,6 +57,8 @@ void ppc_gethwdinfo(struct ppc_debug_info *dbginfo) } else { dbginfo->features = 0; } + if (cpu_has_feature(CPU_FTR_ARCH_31)) + dbginfo->features |= PPC_DEBUG_FEATURE_DATA_BP_ARCH_31; } int ptrace_get_debugreg(struct task_struct *child, unsigned long addr, -- 2.26.2
[PATCH v3 1/2] powerpc/watchpoint/ptrace: Fix SETHWDEBUG when CONFIG_HAVE_HW_BREAKPOINT=N
When kernel is compiled with CONFIG_HAVE_HW_BREAKPOINT=N, user can still create watchpoint using PPC_PTRACE_SETHWDEBUG, with limited functionalities. But, such watchpoints are never firing because of the missing privilege settings. Fix that. Reported-by: Pedro Miraglia Franco de Carvalho Suggested-by: Pedro Miraglia Franco de Carvalho Signed-off-by: Ravi Bangoria --- arch/powerpc/kernel/ptrace/ptrace-noadv.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/powerpc/kernel/ptrace/ptrace-noadv.c b/arch/powerpc/kernel/ptrace/ptrace-noadv.c index 697c7e4b5877..bf94ab837735 100644 --- a/arch/powerpc/kernel/ptrace/ptrace-noadv.c +++ b/arch/powerpc/kernel/ptrace/ptrace-noadv.c @@ -217,7 +217,7 @@ long ppc_set_hwdebug(struct task_struct *child, struct ppc_hw_breakpoint *bp_inf return -EIO; brk.address = ALIGN_DOWN(bp_info->addr, HW_BREAKPOINT_SIZE); - brk.type = HW_BRK_TYPE_TRANSLATE; + brk.type = HW_BRK_TYPE_TRANSLATE | HW_BRK_TYPE_USER; brk.len = DABR_MAX_LEN; if (bp_info->trigger_type & PPC_BREAKPOINT_TRIGGER_READ) brk.type |= HW_BRK_TYPE_READ; -- 2.26.2
Re: [PATCH 0/7] powerpc/watchpoint: 2nd DAWR kvm enablement + selftests
On 7/23/20 3:50 PM, Ravi Bangoria wrote: Patch #1, #2 and #3 enables p10 2nd DAWR feature for Book3S kvm guest. DAWR is a hypervisor resource and thus H_SET_MODE hcall is used to set/unset it. A new case H_SET_MODE_RESOURCE_SET_DAWR1 is introduced in H_SET_MODE hcall for setting/unsetting 2nd DAWR. Also, new capability KVM_CAP_PPC_DAWR1 has been added to query 2nd DAWR support via kvm ioctl. This feature also needs to be enabled in Qemu to really use it. I'll reply link to qemu patches once I post them in qemu-devel mailing list. Qemu patches: https://lore.kernel.org/kvm/20200723104220.314671-1-ravi.bango...@linux.ibm.com
[PATCH 2/7] powerpc/watchpoint/kvm: Add infrastructure to support 2nd DAWR
kvm code assumes single DAWR everywhere. Add code to support 2nd DAWR. DAWR is a hypervisor resource and thus H_SET_MODE hcall is used to set/ unset it. Introduce new case H_SET_MODE_RESOURCE_SET_DAWR1 for 2nd DAWR. Also, kvm will support 2nd DAWR only if CPU_FTR_DAWR1 is set. Signed-off-by: Ravi Bangoria --- Documentation/virt/kvm/api.rst| 2 ++ arch/powerpc/include/asm/hvcall.h | 2 ++ arch/powerpc/include/asm/kvm_host.h | 2 ++ arch/powerpc/include/uapi/asm/kvm.h | 4 +++ arch/powerpc/kernel/asm-offsets.c | 2 ++ arch/powerpc/kvm/book3s_hv.c | 41 +++ arch/powerpc/kvm/book3s_hv_nested.c | 7 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 23 + tools/arch/powerpc/include/uapi/asm/kvm.h | 4 +++ 9 files changed, 87 insertions(+) diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst index 4dc18fe6a2bf..7b1d16c2ad24 100644 --- a/Documentation/virt/kvm/api.rst +++ b/Documentation/virt/kvm/api.rst @@ -2242,6 +2242,8 @@ registers, find a list below: PPC KVM_REG_PPC_PSSCR 64 PPC KVM_REG_PPC_DEC_EXPIRY 64 PPC KVM_REG_PPC_PTCR64 + PPC KVM_REG_PPC_DAWR1 64 + PPC KVM_REG_PPC_DAWRX1 64 PPC KVM_REG_PPC_TM_GPR0 64 ... PPC KVM_REG_PPC_TM_GPR3164 diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h index 33793444144c..03f401d7be41 100644 --- a/arch/powerpc/include/asm/hvcall.h +++ b/arch/powerpc/include/asm/hvcall.h @@ -538,6 +538,8 @@ struct hv_guest_state { s64 tb_offset; u64 dawr0; u64 dawrx0; + u64 dawr1; + u64 dawrx1; u64 ciabr; u64 hdec_expiry; u64 purr; diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 9aa3854f0e1e..bda839edd5fe 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -584,6 +584,8 @@ struct kvm_vcpu_arch { ulong dabr; ulong dawr0; ulong dawrx0; + ulong dawr1; + ulong dawrx1; ulong ciabr; ulong cfar; ulong ppr; diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h index 38d61b73f5ed..c5c0f128b46f 100644 --- a/arch/powerpc/include/uapi/asm/kvm.h +++ b/arch/powerpc/include/uapi/asm/kvm.h @@ -640,6 +640,10 @@ struct kvm_ppc_cpu_char { #define KVM_REG_PPC_ONLINE (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xbf) #define KVM_REG_PPC_PTCR (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xc0) +/* POWER10 registers. */ +#define KVM_REG_PPC_DAWR1 (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xc1) +#define KVM_REG_PPC_DAWRX1 (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xc2) + /* Transactional Memory checkpointed state: * This is all GPRs, all VSX regs and a subset of SPRs */ diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c index e76bffe348e1..ef2c0f3f5a7b 100644 --- a/arch/powerpc/kernel/asm-offsets.c +++ b/arch/powerpc/kernel/asm-offsets.c @@ -549,6 +549,8 @@ int main(void) OFFSET(VCPU_DABRX, kvm_vcpu, arch.dabrx); OFFSET(VCPU_DAWR0, kvm_vcpu, arch.dawr0); OFFSET(VCPU_DAWRX0, kvm_vcpu, arch.dawrx0); + OFFSET(VCPU_DAWR1, kvm_vcpu, arch.dawr1); + OFFSET(VCPU_DAWRX1, kvm_vcpu, arch.dawrx1); OFFSET(VCPU_CIABR, kvm_vcpu, arch.ciabr); OFFSET(VCPU_HFLAGS, kvm_vcpu, arch.hflags); OFFSET(VCPU_DEC, kvm_vcpu, arch.dec); diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index 28200e4f5d27..24575520b2ea 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -781,6 +781,20 @@ static int kvmppc_h_set_mode(struct kvm_vcpu *vcpu, unsigned long mflags, vcpu->arch.dawr0 = value1; vcpu->arch.dawrx0 = value2; return H_SUCCESS; + case H_SET_MODE_RESOURCE_SET_DAWR1: + if (!kvmppc_power8_compatible(vcpu)) + return H_P2; + if (!ppc_breakpoint_available()) + return H_P2; + if (!cpu_has_feature(CPU_FTR_DAWR1)) + return H_P2; + if (mflags) + return H_UNSUPPORTED_FLAG_START; + if (value2 & DABRX_HYP) + return H_P4; + vcpu->arch.dawr1 = value1; + vcpu->arch.dawrx1 = value2; + return H_SUCCESS; case H_SET_MODE_RESOURCE_ADDR_TRANS_MODE: /* KVM does not support mflags=2 (AIL=2) */ if (mflags != 0 && mflags != 3) @@ -1730,6 +1744,12 @@ static int kvmppc_get_one_reg_hv(struct kvm_vcpu *vcpu, u64 id, case KVM_REG_PPC_DAWRX0: *val = get_reg_val(id, vcpu->arch.dawrx0); break; + case KVM_REG_PPC_DAWR1: +
[PATCH 7/7] powerpc/selftests: Add selftest to test concurrent perf/ptrace events
ptrace and perf watchpoints can't co-exists if their address range overlaps. See commit 29da4f91c0c1 ("powerpc/watchpoint: Don't allow concurrent perf and ptrace events") for more detail. Add selftest for the same. Sample o/p: # ./ptrace-perf-hwbreak test: ptrace-perf-hwbreak tags: git_version:powerpc-5.8-7-118-g937fa174a15d-dirty perf cpu event -> ptrace thread event (Overlapping): Ok perf cpu event -> ptrace thread event (Non-overlapping): Ok perf thread event -> ptrace same thread event (Overlapping): Ok perf thread event -> ptrace same thread event (Non-overlapping): Ok perf thread event -> ptrace other thread event: Ok ptrace thread event -> perf kernel event: Ok ptrace thread event -> perf same thread event (Overlapping): Ok ptrace thread event -> perf same thread event (Non-overlapping): Ok ptrace thread event -> perf other thread event: Ok ptrace thread event -> perf cpu event (Overlapping): Ok ptrace thread event -> perf cpu event (Non-overlapping): Ok ptrace thread event -> perf same thread & cpu event (Overlapping): Ok ptrace thread event -> perf same thread & cpu event (Non-overlapping): Ok ptrace thread event -> perf other thread & cpu event: Ok success: ptrace-perf-hwbreak Signed-off-by: Ravi Bangoria --- .../selftests/powerpc/ptrace/.gitignore | 1 + .../testing/selftests/powerpc/ptrace/Makefile | 2 +- .../powerpc/ptrace/ptrace-perf-hwbreak.c | 659 ++ 3 files changed, 661 insertions(+), 1 deletion(-) create mode 100644 tools/testing/selftests/powerpc/ptrace/ptrace-perf-hwbreak.c diff --git a/tools/testing/selftests/powerpc/ptrace/.gitignore b/tools/testing/selftests/powerpc/ptrace/.gitignore index 0e96150b7c7e..eb75e5360e31 100644 --- a/tools/testing/selftests/powerpc/ptrace/.gitignore +++ b/tools/testing/selftests/powerpc/ptrace/.gitignore @@ -14,3 +14,4 @@ perf-hwbreak core-pkey ptrace-pkey ptrace-syscall +ptrace-perf-hwbreak diff --git a/tools/testing/selftests/powerpc/ptrace/Makefile b/tools/testing/selftests/powerpc/ptrace/Makefile index 8d3f006c98cc..a500639da97a 100644 --- a/tools/testing/selftests/powerpc/ptrace/Makefile +++ b/tools/testing/selftests/powerpc/ptrace/Makefile @@ -2,7 +2,7 @@ TEST_GEN_PROGS := ptrace-gpr ptrace-tm-gpr ptrace-tm-spd-gpr \ ptrace-tar ptrace-tm-tar ptrace-tm-spd-tar ptrace-vsx ptrace-tm-vsx \ ptrace-tm-spd-vsx ptrace-tm-spr ptrace-hwbreak ptrace-pkey core-pkey \ - perf-hwbreak ptrace-syscall + perf-hwbreak ptrace-syscall ptrace-perf-hwbreak top_srcdir = ../../../../.. include ../../lib.mk diff --git a/tools/testing/selftests/powerpc/ptrace/ptrace-perf-hwbreak.c b/tools/testing/selftests/powerpc/ptrace/ptrace-perf-hwbreak.c new file mode 100644 index ..6b8804a4942e --- /dev/null +++ b/tools/testing/selftests/powerpc/ptrace/ptrace-perf-hwbreak.c @@ -0,0 +1,659 @@ +// SPDX-License-Identifier: GPL-2.0+ +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include "ptrace.h" + +char data[16]; + +/* Overlapping address range */ +volatile __u64 *ptrace_data1 = (__u64 *)[0]; +volatile __u64 *perf_data1 = (__u64 *)[4]; + +/* Non-overlapping address range */ +volatile __u64 *ptrace_data2 = (__u64 *)[0]; +volatile __u64 *perf_data2 = (__u64 *)[8]; + +static unsigned long pid_max_addr(void) +{ + FILE *fp; + char *line, *c; + char addr[100]; + size_t len = 0; + + fp = fopen("/proc/kallsyms", "r"); + if (!fp) { + printf("Failed to read /proc/kallsyms. Exiting..\n"); + exit(EXIT_FAILURE); + } + + while (getline(, , fp) != -1) { + if (!strstr(line, "pid_max") || strstr(line, "pid_max_max") || + strstr(line, "pid_max_min")) + continue; + + strncpy(addr, line, len < 100 ? len : 100); + c = strchr(addr, ' '); + *c = '\0'; + return strtoul(addr, , 16); + } + fclose(fp); + printf("Could not find pix_max. Exiting..\n"); + exit(EXIT_FAILURE); + return -1; +} + +static void perf_user_event_attr_set(struct perf_event_attr *attr, __u64 addr, __u64 len) +{ + memset(attr, 0, sizeof(struct perf_event_attr)); + attr->type = PERF_TYPE_BREAKPOINT; + attr->size = sizeof(struct perf_event_attr); + attr->bp_type= HW_BREAKPOINT_R; + attr->bp_addr= addr; + attr->bp_len = len; + attr->exclude_kernel = 1; + attr->exclude_hv = 1; +} + +static void perf_kernel_event_attr_set(struct perf_event_attr *attr) +{ + memset(attr, 0, sizeof(struct perf_event_attr)); + attr->type = PERF_
[PATCH 5/7] powerpc/selftests/perf-hwbreak: Coalesce event creation code
perf-hwbreak selftest opens hw-breakpoint event at multiple places for which it has same code repeated. Coalesce that code into a function. Signed-off-by: Ravi Bangoria --- .../selftests/powerpc/ptrace/perf-hwbreak.c | 78 +-- 1 file changed, 38 insertions(+), 40 deletions(-) diff --git a/tools/testing/selftests/powerpc/ptrace/perf-hwbreak.c b/tools/testing/selftests/powerpc/ptrace/perf-hwbreak.c index c1f324afdbf3..bde475341c8a 100644 --- a/tools/testing/selftests/powerpc/ptrace/perf-hwbreak.c +++ b/tools/testing/selftests/powerpc/ptrace/perf-hwbreak.c @@ -34,28 +34,46 @@ #define DAWR_LENGTH_MAX ((0x3f + 1) * 8) -static inline int sys_perf_event_open(struct perf_event_attr *attr, pid_t pid, - int cpu, int group_fd, - unsigned long flags) +static void perf_event_attr_set(struct perf_event_attr *attr, + __u32 type, __u64 addr, __u64 len, + bool exclude_user) { - attr->size = sizeof(*attr); - return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags); + memset(attr, 0, sizeof(struct perf_event_attr)); + attr->type = PERF_TYPE_BREAKPOINT; + attr->size = sizeof(struct perf_event_attr); + attr->bp_type= type; + attr->bp_addr= addr; + attr->bp_len = len; + attr->exclude_kernel = 1; + attr->exclude_hv = 1; + attr->exclude_guest = 1; + attr->exclude_user = exclude_user; + attr->disabled = 1; } -static inline bool breakpoint_test(int len) +static int +perf_process_event_open_exclude_user(__u32 type, __u64 addr, __u64 len, bool exclude_user) { struct perf_event_attr attr; + + perf_event_attr_set(, type, addr, len, exclude_user); + return syscall(__NR_perf_event_open, , getpid(), -1, -1, 0); +} + +static int perf_process_event_open(__u32 type, __u64 addr, __u64 len) +{ + struct perf_event_attr attr; + + perf_event_attr_set(, type, addr, len, 0); + return syscall(__NR_perf_event_open, , getpid(), -1, -1, 0); +} + +static inline bool breakpoint_test(int len) +{ int fd; - /* setup counters */ - memset(, 0, sizeof(attr)); - attr.disabled = 1; - attr.type = PERF_TYPE_BREAKPOINT; - attr.bp_type = HW_BREAKPOINT_R; /* bp_addr can point anywhere but needs to be aligned */ - attr.bp_addr = (__u64)() & 0xf800; - attr.bp_len = len; - fd = sys_perf_event_open(, 0, -1, -1, 0); + fd = perf_process_event_open(HW_BREAKPOINT_R, (__u64)() & 0xf800, len); if (fd < 0) return false; close(fd); @@ -75,7 +93,6 @@ static inline bool dawr_supported(void) static int runtestsingle(int readwriteflag, int exclude_user, int arraytest) { int i,j; - struct perf_event_attr attr; size_t res; unsigned long long breaks, needed; int readint; @@ -94,19 +111,11 @@ static int runtestsingle(int readwriteflag, int exclude_user, int arraytest) if (arraytest) ptr = [0]; - /* setup counters */ - memset(, 0, sizeof(attr)); - attr.disabled = 1; - attr.type = PERF_TYPE_BREAKPOINT; - attr.bp_type = readwriteflag; - attr.bp_addr = (__u64)ptr; - attr.bp_len = sizeof(int); - if (arraytest) - attr.bp_len = DAWR_LENGTH_MAX; - attr.exclude_user = exclude_user; - break_fd = sys_perf_event_open(, 0, -1, -1, 0); + break_fd = perf_process_event_open_exclude_user(readwriteflag, (__u64)ptr, + arraytest ? DAWR_LENGTH_MAX : sizeof(int), + exclude_user); if (break_fd < 0) { - perror("sys_perf_event_open"); + perror("perf_process_event_open_exclude_user"); exit(1); } @@ -153,7 +162,6 @@ static int runtest_dar_outside(void) void *target; volatile __u16 temp16; volatile __u64 temp64; - struct perf_event_attr attr; int break_fd; unsigned long long breaks; int fail = 0; @@ -165,21 +173,11 @@ static int runtest_dar_outside(void) exit(EXIT_FAILURE); } - /* setup counters */ - memset(, 0, sizeof(attr)); - attr.disabled = 1; - attr.type = PERF_TYPE_BREAKPOINT; - attr.exclude_kernel = 1; - attr.exclude_hv = 1; - attr.exclude_guest = 1; - attr.bp_type = HW_BREAKPOINT_RW; /* watch middle half of target array */ - attr.bp_addr = (__u64)(target + 2); - attr.bp_len = 4; - break_fd = sys_perf_event_open(, 0, -1, -1, 0); + break_fd = perf_process_event_open(HW_BREAKPOINT_RW, (__u64)(target + 2), 4); if (break_fd < 0) {
[PATCH 0/7] powerpc/watchpoint: 2nd DAWR kvm enablement + selftests
Patch #1, #2 and #3 enables p10 2nd DAWR feature for Book3S kvm guest. DAWR is a hypervisor resource and thus H_SET_MODE hcall is used to set/unset it. A new case H_SET_MODE_RESOURCE_SET_DAWR1 is introduced in H_SET_MODE hcall for setting/unsetting 2nd DAWR. Also, new capability KVM_CAP_PPC_DAWR1 has been added to query 2nd DAWR support via kvm ioctl. This feature also needs to be enabled in Qemu to really use it. I'll reply link to qemu patches once I post them in qemu-devel mailing list. Patch #4, #5, #6 and #7 adds selftests to test 2nd DAWR. Dependency: 1: p10 kvm base enablement https://lore.kernel.org/linuxppc-dev/20200602055325.6102-1-alist...@popple.id.au 2: 2nd DAWR powervm/baremetal enablement https://lore.kernel.org/linuxppc-dev/20200723090813.303838-1-ravi.bango...@linux.ibm.com 3: ptrace PPC_DEBUG_FEATURE_DATA_BP_DAWR_ARCH_31 flag https://lore.kernel.org/linuxppc-dev/20200723093330.306341-1-ravi.bango...@linux.ibm.com Patches in this series applies fine on top of powerpc/next (9a77c4a0a125) plus above dependency patches. Ravi Bangoria (7): powerpc/watchpoint/kvm: Rename current DAWR macros and variables powerpc/watchpoint/kvm: Add infrastructure to support 2nd DAWR powerpc/watchpoint/kvm: Introduce new capability for 2nd DAWR powerpc/selftests/ptrace-hwbreak: Add testcases for 2nd DAWR powerpc/selftests/perf-hwbreak: Coalesce event creation code powerpc/selftests/perf-hwbreak: Add testcases for 2nd DAWR powerpc/selftests: Add selftest to test concurrent perf/ptrace events Documentation/virt/kvm/api.rst| 6 +- arch/powerpc/include/asm/hvcall.h | 2 + arch/powerpc/include/asm/kvm_host.h | 6 +- arch/powerpc/include/uapi/asm/kvm.h | 8 +- arch/powerpc/kernel/asm-offsets.c | 6 +- arch/powerpc/kvm/book3s_hv.c | 73 +- arch/powerpc/kvm/book3s_hv_nested.c | 15 +- arch/powerpc/kvm/book3s_hv_rmhandlers.S | 43 +- arch/powerpc/kvm/powerpc.c| 3 + include/uapi/linux/kvm.h | 1 + tools/arch/powerpc/include/uapi/asm/kvm.h | 8 +- .../selftests/powerpc/ptrace/.gitignore | 1 + .../testing/selftests/powerpc/ptrace/Makefile | 2 +- .../selftests/powerpc/ptrace/perf-hwbreak.c | 646 +++-- .../selftests/powerpc/ptrace/ptrace-hwbreak.c | 79 +++ .../powerpc/ptrace/ptrace-perf-hwbreak.c | 659 ++ 16 files changed, 1476 insertions(+), 82 deletions(-) create mode 100644 tools/testing/selftests/powerpc/ptrace/ptrace-perf-hwbreak.c -- 2.26.2
[PATCH 4/7] powerpc/selftests/ptrace-hwbreak: Add testcases for 2nd DAWR
Add selftests to test multiple active DAWRs with ptrace interface. Sample o/p: $ ./ptrace-hwbreak ... PPC_PTRACE_SETHWDEBUG 2, MODE_RANGE, DW ALIGNED, WO, len: 6: Ok PPC_PTRACE_SETHWDEBUG 2, MODE_RANGE, DW UNALIGNED, RO, len: 6: Ok PPC_PTRACE_SETHWDEBUG 2, MODE_RANGE, DAWR Overlap, WO, len: 6: Ok PPC_PTRACE_SETHWDEBUG 2, MODE_RANGE, DAWR Overlap, RO, len: 6: Ok Signed-off-by: Ravi Bangoria --- .../selftests/powerpc/ptrace/ptrace-hwbreak.c | 79 +++ 1 file changed, 79 insertions(+) diff --git a/tools/testing/selftests/powerpc/ptrace/ptrace-hwbreak.c b/tools/testing/selftests/powerpc/ptrace/ptrace-hwbreak.c index fc477dfe86a2..65781f4035c1 100644 --- a/tools/testing/selftests/powerpc/ptrace/ptrace-hwbreak.c +++ b/tools/testing/selftests/powerpc/ptrace/ptrace-hwbreak.c @@ -185,6 +185,18 @@ static void test_workload(void) big_var[rand() % DAWR_MAX_LEN] = 'a'; else cvar = big_var[rand() % DAWR_MAX_LEN]; + + /* PPC_PTRACE_SETHWDEBUG 2, MODE_RANGE, DW ALIGNED, WO test */ + gstruct.a[rand() % A_LEN] = 'a'; + + /* PPC_PTRACE_SETHWDEBUG 2, MODE_RANGE, DW UNALIGNED, RO test */ + cvar = gstruct.b[rand() % B_LEN]; + + /* PPC_PTRACE_SETHWDEBUG 2, MODE_RANGE, DAWR Overlap, WO test */ + gstruct.a[rand() % A_LEN] = 'a'; + + /* PPC_PTRACE_SETHWDEBUG 2, MODE_RANGE, DAWR Overlap, RO test */ + cvar = gstruct.a[rand() % A_LEN]; } static void check_success(pid_t child_pid, const char *name, const char *type, @@ -374,6 +386,69 @@ static void test_sethwdebug_range_aligned(pid_t child_pid) ptrace_delhwdebug(child_pid, wh); } +static void test_multi_sethwdebug_range(pid_t child_pid) +{ + struct ppc_hw_breakpoint info1, info2; + unsigned long wp_addr1, wp_addr2; + char *name1 = "PPC_PTRACE_SETHWDEBUG 2, MODE_RANGE, DW ALIGNED"; + char *name2 = "PPC_PTRACE_SETHWDEBUG 2, MODE_RANGE, DW UNALIGNED"; + int len1, len2; + int wh1, wh2; + + wp_addr1 = (unsigned long) + wp_addr2 = (unsigned long) + len1 = A_LEN; + len2 = B_LEN; + get_ppc_hw_breakpoint(, PPC_BREAKPOINT_TRIGGER_WRITE, wp_addr1, len1); + get_ppc_hw_breakpoint(, PPC_BREAKPOINT_TRIGGER_READ, wp_addr2, len2); + + /* PPC_PTRACE_SETHWDEBUG 2, MODE_RANGE, DW ALIGNED, WO test */ + wh1 = ptrace_sethwdebug(child_pid, ); + + /* PPC_PTRACE_SETHWDEBUG 2, MODE_RANGE, DW UNALIGNED, RO test */ + wh2 = ptrace_sethwdebug(child_pid, ); + + ptrace(PTRACE_CONT, child_pid, NULL, 0); + check_success(child_pid, name1, "WO", wp_addr1, len1); + + ptrace(PTRACE_CONT, child_pid, NULL, 0); + check_success(child_pid, name2, "RO", wp_addr2, len2); + + ptrace_delhwdebug(child_pid, wh1); + ptrace_delhwdebug(child_pid, wh2); +} + +static void test_multi_sethwdebug_range_dawr_overlap(pid_t child_pid) +{ + struct ppc_hw_breakpoint info1, info2; + unsigned long wp_addr1, wp_addr2; + char *name = "PPC_PTRACE_SETHWDEBUG 2, MODE_RANGE, DAWR Overlap"; + int len1, len2; + int wh1, wh2; + + wp_addr1 = (unsigned long) + wp_addr2 = (unsigned long) + len1 = A_LEN; + len2 = A_LEN; + get_ppc_hw_breakpoint(, PPC_BREAKPOINT_TRIGGER_WRITE, wp_addr1, len1); + get_ppc_hw_breakpoint(, PPC_BREAKPOINT_TRIGGER_READ, wp_addr2, len2); + + /* PPC_PTRACE_SETHWDEBUG 2, MODE_RANGE, DAWR Overlap, WO test */ + wh1 = ptrace_sethwdebug(child_pid, ); + + /* PPC_PTRACE_SETHWDEBUG 2, MODE_RANGE, DAWR Overlap, RO test */ + wh2 = ptrace_sethwdebug(child_pid, ); + + ptrace(PTRACE_CONT, child_pid, NULL, 0); + check_success(child_pid, name, "WO", wp_addr1, len1); + + ptrace(PTRACE_CONT, child_pid, NULL, 0); + check_success(child_pid, name, "RO", wp_addr2, len2); + + ptrace_delhwdebug(child_pid, wh1); + ptrace_delhwdebug(child_pid, wh2); +} + static void test_sethwdebug_range_unaligned(pid_t child_pid) { struct ppc_hw_breakpoint info; @@ -460,6 +535,10 @@ run_tests(pid_t child_pid, struct ppc_debug_info *dbginfo, bool dawr) test_sethwdebug_range_unaligned(child_pid); test_sethwdebug_range_unaligned_dar(child_pid); test_sethwdebug_dawr_max_range(child_pid); + if (dbginfo->num_data_bps > 1) { + test_multi_sethwdebug_range(child_pid); + test_multi_sethwdebug_range_dawr_overlap(child_pid); + } } } } -- 2.26.2
[PATCH 6/7] powerpc/selftests/perf-hwbreak: Add testcases for 2nd DAWR
Extend perf-hwbreak.c selftest to test multiple DAWRs. Also add testcase for testing 512 byte boundary removal. Sample o/p: # ./perf-hwbreak ... TESTED: Process specific, Two events, diff addr TESTED: Process specific, Two events, same addr TESTED: Process specific, Two events, diff addr, one is RO, other is WO TESTED: Process specific, Two events, same addr, one is RO, other is WO TESTED: Systemwide, Two events, diff addr TESTED: Systemwide, Two events, same addr TESTED: Systemwide, Two events, diff addr, one is RO, other is WO TESTED: Systemwide, Two events, same addr, one is RO, other is WO TESTED: Process specific, 512 bytes, unaligned success: perf_hwbreak Signed-off-by: Ravi Bangoria --- .../selftests/powerpc/ptrace/perf-hwbreak.c | 568 +- 1 file changed, 567 insertions(+), 1 deletion(-) diff --git a/tools/testing/selftests/powerpc/ptrace/perf-hwbreak.c b/tools/testing/selftests/powerpc/ptrace/perf-hwbreak.c index bde475341c8a..5df08738884d 100644 --- a/tools/testing/selftests/powerpc/ptrace/perf-hwbreak.c +++ b/tools/testing/selftests/powerpc/ptrace/perf-hwbreak.c @@ -21,8 +21,13 @@ #include #include #include +#include #include #include +#include +#include +#include +#include #include #include #include @@ -34,6 +39,12 @@ #define DAWR_LENGTH_MAX ((0x3f + 1) * 8) +int nprocs; + +static volatile int a = 10; +static volatile int b = 10; +static volatile char c[512 + 8] __attribute__((aligned(512))); + static void perf_event_attr_set(struct perf_event_attr *attr, __u32 type, __u64 addr, __u64 len, bool exclude_user) @@ -68,6 +79,76 @@ static int perf_process_event_open(__u32 type, __u64 addr, __u64 len) return syscall(__NR_perf_event_open, , getpid(), -1, -1, 0); } +static int perf_cpu_event_open(long cpu, __u32 type, __u64 addr, __u64 len) +{ + struct perf_event_attr attr; + + perf_event_attr_set(, type, addr, len, 0); + return syscall(__NR_perf_event_open, , -1, cpu, -1, 0); +} + +static void close_fds(int *fd, int n) +{ + int i; + + for (i = 0; i < n; i++) + close(fd[i]); +} + +static unsigned long read_fds(int *fd, int n) +{ + int i; + unsigned long c = 0; + unsigned long count = 0; + size_t res; + + for (i = 0; i < n; i++) { + res = read(fd[i], , sizeof(c)); + assert(res == sizeof(unsigned long long)); + count += c; + } + return count; +} + +static void reset_fds(int *fd, int n) +{ + int i; + + for (i = 0; i < n; i++) + ioctl(fd[i], PERF_EVENT_IOC_RESET); +} + +static void enable_fds(int *fd, int n) +{ + int i; + + for (i = 0; i < n; i++) + ioctl(fd[i], PERF_EVENT_IOC_ENABLE); +} + +static void disable_fds(int *fd, int n) +{ + int i; + + for (i = 0; i < n; i++) + ioctl(fd[i], PERF_EVENT_IOC_DISABLE); +} + +static int perf_systemwide_event_open(int *fd, __u32 type, __u64 addr, __u64 len) +{ + int i = 0; + + /* Assume online processors are 0 to nprocs for simplisity */ + for (i = 0; i < nprocs; i++) { + fd[i] = perf_cpu_event_open(i, type, addr, len); + if (fd[i] < 0) { + close_fds(fd, i); + return fd[i]; + } + } + return 0; +} + static inline bool breakpoint_test(int len) { int fd; @@ -261,11 +342,483 @@ static int runtest_dar_outside(void) return fail; } +static void multi_dawr_workload(void) +{ + a += 10; + b += 10; + c[512 + 1] += 'a'; +} + +static int test_process_multi_diff_addr(void) +{ + unsigned long long breaks1 = 0, breaks2 = 0; + int fd1, fd2; + char *desc = "Process specific, Two events, diff addr"; + size_t res; + + fd1 = perf_process_event_open(HW_BREAKPOINT_RW, (__u64), (__u64)sizeof(a)); + if (fd1 < 0) { + perror("perf_process_event_open"); + exit(EXIT_FAILURE); + } + + fd2 = perf_process_event_open(HW_BREAKPOINT_RW, (__u64), (__u64)sizeof(b)); + if (fd2 < 0) { + close(fd1); + perror("perf_process_event_open"); + exit(EXIT_FAILURE); + } + + ioctl(fd1, PERF_EVENT_IOC_RESET); + ioctl(fd2, PERF_EVENT_IOC_RESET); + ioctl(fd1, PERF_EVENT_IOC_ENABLE); + ioctl(fd2, PERF_EVENT_IOC_ENABLE); + multi_dawr_workload(); + ioctl(fd1, PERF_EVENT_IOC_DISABLE); + ioctl(fd2, PERF_EVENT_IOC_DISABLE); + + res = read(fd1, , sizeof(breaks1)); + assert(res == sizeof(unsigned long long)); + res = read(fd2, , sizeof(breaks2)); + assert(res == sizeof(unsigned long long)); + + close(fd1); + close(fd2); + + if (breaks1 != 2 || breaks2 != 2
[PATCH 3/7] powerpc/watchpoint/kvm: Introduce new capability for 2nd DAWR
Introduce KVM_CAP_PPC_DAWR1 which can be used by Qemu to query whether kvm supports 2nd DAWR or not. Signed-off-by: Ravi Bangoria --- arch/powerpc/kvm/powerpc.c | 3 +++ include/uapi/linux/kvm.h | 1 + 2 files changed, 4 insertions(+) diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c index dd7d141e33e8..f38380fd1fe9 100644 --- a/arch/powerpc/kvm/powerpc.c +++ b/arch/powerpc/kvm/powerpc.c @@ -676,6 +676,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) !kvmppc_hv_ops->enable_svm(NULL); break; #endif + case KVM_CAP_PPC_DAWR1: + r = cpu_has_feature(CPU_FTR_DAWR1); + break; default: r = 0; break; diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index 4fdf30316582..2c3713d6526a 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -1031,6 +1031,7 @@ struct kvm_ppc_resize_hpt { #define KVM_CAP_PPC_SECURE_GUEST 181 #define KVM_CAP_HALT_POLL 182 #define KVM_CAP_ASYNC_PF_INT 183 +#define KVM_CAP_PPC_DAWR1 184 #ifdef KVM_CAP_IRQ_ROUTING -- 2.26.2
[PATCH 1/7] powerpc/watchpoint/kvm: Rename current DAWR macros and variables
Power10 is introducing second DAWR. Use real register names (with suffix 0) from ISA for current macros and variables used by kvm. Signed-off-by: Ravi Bangoria --- Documentation/virt/kvm/api.rst| 4 +-- arch/powerpc/include/asm/kvm_host.h | 4 +-- arch/powerpc/include/uapi/asm/kvm.h | 4 +-- arch/powerpc/kernel/asm-offsets.c | 4 +-- arch/powerpc/kvm/book3s_hv.c | 32 +++ arch/powerpc/kvm/book3s_hv_nested.c | 8 +++--- arch/powerpc/kvm/book3s_hv_rmhandlers.S | 20 +++--- tools/arch/powerpc/include/uapi/asm/kvm.h | 4 +-- 8 files changed, 40 insertions(+), 40 deletions(-) diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst index 426f94582b7a..4dc18fe6a2bf 100644 --- a/Documentation/virt/kvm/api.rst +++ b/Documentation/virt/kvm/api.rst @@ -2219,8 +2219,8 @@ registers, find a list below: PPC KVM_REG_PPC_BESCR 64 PPC KVM_REG_PPC_TAR 64 PPC KVM_REG_PPC_DPDES 64 - PPC KVM_REG_PPC_DAWR64 - PPC KVM_REG_PPC_DAWRX 64 + PPC KVM_REG_PPC_DAWR0 64 + PPC KVM_REG_PPC_DAWRX0 64 PPC KVM_REG_PPC_CIABR 64 PPC KVM_REG_PPC_IC 64 PPC KVM_REG_PPC_VTB 64 diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 7e2d061d0445..9aa3854f0e1e 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -582,8 +582,8 @@ struct kvm_vcpu_arch { u32 ctrl; u32 dabrx; ulong dabr; - ulong dawr; - ulong dawrx; + ulong dawr0; + ulong dawrx0; ulong ciabr; ulong cfar; ulong ppr; diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h index 264e266a85bf..38d61b73f5ed 100644 --- a/arch/powerpc/include/uapi/asm/kvm.h +++ b/arch/powerpc/include/uapi/asm/kvm.h @@ -608,8 +608,8 @@ struct kvm_ppc_cpu_char { #define KVM_REG_PPC_BESCR (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xa7) #define KVM_REG_PPC_TAR(KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xa8) #define KVM_REG_PPC_DPDES (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xa9) -#define KVM_REG_PPC_DAWR (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xaa) -#define KVM_REG_PPC_DAWRX (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xab) +#define KVM_REG_PPC_DAWR0 (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xaa) +#define KVM_REG_PPC_DAWRX0 (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xab) #define KVM_REG_PPC_CIABR (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xac) #define KVM_REG_PPC_IC (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xad) #define KVM_REG_PPC_VTB(KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xae) diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c index 6657dc6b2336..e76bffe348e1 100644 --- a/arch/powerpc/kernel/asm-offsets.c +++ b/arch/powerpc/kernel/asm-offsets.c @@ -547,8 +547,8 @@ int main(void) OFFSET(VCPU_CTRL, kvm_vcpu, arch.ctrl); OFFSET(VCPU_DABR, kvm_vcpu, arch.dabr); OFFSET(VCPU_DABRX, kvm_vcpu, arch.dabrx); - OFFSET(VCPU_DAWR, kvm_vcpu, arch.dawr); - OFFSET(VCPU_DAWRX, kvm_vcpu, arch.dawrx); + OFFSET(VCPU_DAWR0, kvm_vcpu, arch.dawr0); + OFFSET(VCPU_DAWRX0, kvm_vcpu, arch.dawrx0); OFFSET(VCPU_CIABR, kvm_vcpu, arch.ciabr); OFFSET(VCPU_HFLAGS, kvm_vcpu, arch.hflags); OFFSET(VCPU_DEC, kvm_vcpu, arch.dec); diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index 89afcc5f60ca..28200e4f5d27 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -778,8 +778,8 @@ static int kvmppc_h_set_mode(struct kvm_vcpu *vcpu, unsigned long mflags, return H_UNSUPPORTED_FLAG_START; if (value2 & DABRX_HYP) return H_P4; - vcpu->arch.dawr = value1; - vcpu->arch.dawrx = value2; + vcpu->arch.dawr0 = value1; + vcpu->arch.dawrx0 = value2; return H_SUCCESS; case H_SET_MODE_RESOURCE_ADDR_TRANS_MODE: /* KVM does not support mflags=2 (AIL=2) */ @@ -1724,11 +1724,11 @@ static int kvmppc_get_one_reg_hv(struct kvm_vcpu *vcpu, u64 id, case KVM_REG_PPC_VTB: *val = get_reg_val(id, vcpu->arch.vcore->vtb); break; - case KVM_REG_PPC_DAWR: - *val = get_reg_val(id, vcpu->arch.dawr); + case KVM_REG_PPC_DAWR0: + *val = get_reg_val(id, vcpu->arch.dawr0); break; - case KVM_REG_PPC_DAWRX: - *val = get_reg_val(id, vcpu->arch.dawrx); + case KVM_REG_PPC_DAWRX0: + *val = get_reg_val(id, vcpu->arch.dawrx0); break; case KVM_REG_PPC_CIABR: *val = get_reg_val(
[PATCH v2] powerpc/watchpoint/ptrace: Introduce PPC_DEBUG_FEATURE_DATA_BP_DAWR_ARCH_31
PPC_DEBUG_FEATURE_DATA_BP_DAWR_ARCH_31 can be used to determine whether we are running on an ISA 3.1 compliant machine. Which is needed to determine DAR behaviour, 512 byte boundary limit etc. This was requested by Pedro Miraglia Franco de Carvalho for extending watchpoint features in gdb. Note that availability of 2nd DAWR is independent of this flag and should be checked using ppc_debug_info->num_data_bps. Signed-off-by: Ravi Bangoria --- v1->v2: - Mention new flag in Documentaion/ as well. Documentation/powerpc/ptrace.rst | 1 + arch/powerpc/include/uapi/asm/ptrace.h| 1 + arch/powerpc/kernel/ptrace/ptrace-noadv.c | 5 - 3 files changed, 6 insertions(+), 1 deletion(-) diff --git a/Documentation/powerpc/ptrace.rst b/Documentation/powerpc/ptrace.rst index 864d4b61..4d42290248cb 100644 --- a/Documentation/powerpc/ptrace.rst +++ b/Documentation/powerpc/ptrace.rst @@ -46,6 +46,7 @@ features will have bits indicating whether there is support for:: #define PPC_DEBUG_FEATURE_DATA_BP_RANGE 0x4 #define PPC_DEBUG_FEATURE_DATA_BP_MASK 0x8 #define PPC_DEBUG_FEATURE_DATA_BP_DAWR 0x10 + #define PPC_DEBUG_FEATURE_DATA_BP_DAWR_ARCH_31 0x20 2. PTRACE_SETHWDEBUG diff --git a/arch/powerpc/include/uapi/asm/ptrace.h b/arch/powerpc/include/uapi/asm/ptrace.h index f5f1ccc740fc..0a87bcd4300a 100644 --- a/arch/powerpc/include/uapi/asm/ptrace.h +++ b/arch/powerpc/include/uapi/asm/ptrace.h @@ -222,6 +222,7 @@ struct ppc_debug_info { #define PPC_DEBUG_FEATURE_DATA_BP_RANGE0x0004 #define PPC_DEBUG_FEATURE_DATA_BP_MASK 0x0008 #define PPC_DEBUG_FEATURE_DATA_BP_DAWR 0x0010 +#define PPC_DEBUG_FEATURE_DATA_BP_DAWR_ARCH_31 0x0020 #ifndef __ASSEMBLY__ diff --git a/arch/powerpc/kernel/ptrace/ptrace-noadv.c b/arch/powerpc/kernel/ptrace/ptrace-noadv.c index 697c7e4b5877..b2de874d650b 100644 --- a/arch/powerpc/kernel/ptrace/ptrace-noadv.c +++ b/arch/powerpc/kernel/ptrace/ptrace-noadv.c @@ -52,8 +52,11 @@ void ppc_gethwdinfo(struct ppc_debug_info *dbginfo) dbginfo->sizeof_condition = 0; if (IS_ENABLED(CONFIG_HAVE_HW_BREAKPOINT)) { dbginfo->features = PPC_DEBUG_FEATURE_DATA_BP_RANGE; - if (dawr_enabled()) + if (dawr_enabled()) { dbginfo->features |= PPC_DEBUG_FEATURE_DATA_BP_DAWR; + if (cpu_has_feature(CPU_FTR_ARCH_31)) + dbginfo->features |= PPC_DEBUG_FEATURE_DATA_BP_DAWR_ARCH_31; + } } else { dbginfo->features = 0; } -- 2.26.2
[PATCH v5 10/10] powerpc/watchpoint: Remove 512 byte boundary
Power10 has removed 512 bytes boundary from match criteria i.e. the watch range can cross 512 bytes boundary. Note: ISA 3.1 Book III 9.4 match criteria includes 512 byte limit but that is a documentation mistake and hopefully will be fixed in the next version of ISA. Though, ISA 3.1 change log mentions about removal of 512B boundary: Multiple DEAW: Added a second Data Address Watchpoint. [H]DAR is set to the first byte of overlap. 512B boundary is removed. Signed-off-by: Ravi Bangoria --- arch/powerpc/kernel/hw_breakpoint.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/kernel/hw_breakpoint.c b/arch/powerpc/kernel/hw_breakpoint.c index c55e67bab271..1f4a1efa0074 100644 --- a/arch/powerpc/kernel/hw_breakpoint.c +++ b/arch/powerpc/kernel/hw_breakpoint.c @@ -418,8 +418,9 @@ static int hw_breakpoint_validate_len(struct arch_hw_breakpoint *hw) if (dawr_enabled()) { max_len = DAWR_MAX_LEN; - /* DAWR region can't cross 512 bytes boundary */ - if (ALIGN_DOWN(start_addr, SZ_512) != ALIGN_DOWN(end_addr - 1, SZ_512)) + /* DAWR region can't cross 512 bytes boundary on p10 predecessors */ + if (!cpu_has_feature(CPU_FTR_ARCH_31) && + (ALIGN_DOWN(start_addr, SZ_512) != ALIGN_DOWN(end_addr - 1, SZ_512))) return -EINVAL; } else if (IS_ENABLED(CONFIG_PPC_8xx)) { /* 8xx can setup a range without limitation */ -- 2.26.2
[PATCH v5 09/10] powerpc/watchpoint: Return available watchpoints dynamically
So far Book3S Powerpc supported only one watchpoint. Power10 is introducing 2nd DAWR. Enable 2nd DAWR support for Power10. Availability of 2nd DAWR will depend on CPU_FTR_DAWR1. Signed-off-by: Ravi Bangoria --- arch/powerpc/include/asm/cputable.h | 5 +++-- arch/powerpc/include/asm/hw_breakpoint.h | 4 +++- 2 files changed, 6 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/include/asm/cputable.h b/arch/powerpc/include/asm/cputable.h index 5583f2d08df7..fa1232c33ab9 100644 --- a/arch/powerpc/include/asm/cputable.h +++ b/arch/powerpc/include/asm/cputable.h @@ -629,9 +629,10 @@ enum { /* * Maximum number of hw breakpoint supported on powerpc. Number of - * breakpoints supported by actual hw might be less than this. + * breakpoints supported by actual hw might be less than this, which + * is decided at run time in nr_wp_slots(). */ -#define HBP_NUM_MAX1 +#define HBP_NUM_MAX2 #endif /* !__ASSEMBLY__ */ diff --git a/arch/powerpc/include/asm/hw_breakpoint.h b/arch/powerpc/include/asm/hw_breakpoint.h index cb424799da0d..c89250b6ac34 100644 --- a/arch/powerpc/include/asm/hw_breakpoint.h +++ b/arch/powerpc/include/asm/hw_breakpoint.h @@ -9,6 +9,8 @@ #ifndef _PPC_BOOK3S_64_HW_BREAKPOINT_H #define _PPC_BOOK3S_64_HW_BREAKPOINT_H +#include + #ifdef __KERNEL__ struct arch_hw_breakpoint { unsigned long address; @@ -46,7 +48,7 @@ struct arch_hw_breakpoint { static inline int nr_wp_slots(void) { - return HBP_NUM_MAX; + return cpu_has_feature(CPU_FTR_DAWR1) ? 2 : 1; } #ifdef CONFIG_HAVE_HW_BREAKPOINT -- 2.26.2
[PATCH v5 06/10] powerpc/watchpoint: Set CPU_FTR_DAWR1 based on pa-features bit
As per the PAPR, bit 0 of byte 64 in pa-features property indicates availability of 2nd DAWR registers. i.e. If this bit is set, 2nd DAWR is present, otherwise not. Host generally uses "cpu-features", which masks "pa-features". But "cpu-features" are still not used for guests and thus this change is mostly applicable for guests only. Signed-off-by: Ravi Bangoria Tested-by: Jordan Niethe --- arch/powerpc/kernel/prom.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c index 033d43819ed8..01dda206d68e 100644 --- a/arch/powerpc/kernel/prom.c +++ b/arch/powerpc/kernel/prom.c @@ -175,6 +175,8 @@ static struct ibm_pa_feature { */ { .pabyte = 22, .pabit = 0, .cpu_features = CPU_FTR_TM_COMP, .cpu_user_ftrs2 = PPC_FEATURE2_HTM_COMP | PPC_FEATURE2_HTM_NOSC_COMP }, + + { .pabyte = 64, .pabit = 0, .cpu_features = CPU_FTR_DAWR1 }, }; static void __init scan_features(unsigned long node, const unsigned char *ftrs, -- 2.26.2
[PATCH v5 03/10] powerpc/watchpoint: Fix DAWR exception for CACHEOP
'ea' returned by analyse_instr() needs to be aligned down to cache block size for CACHEOP instructions. analyse_instr() does not set size for CACHEOP, thus size also needs to be calculated manually. Fixes: 27985b2a640e ("powerpc/watchpoint: Don't ignore extraneous exceptions blindly") Fixes: 74c6881019b7 ("powerpc/watchpoint: Prepare handler to handle more than one watchpoint") Signed-off-by: Ravi Bangoria --- arch/powerpc/kernel/hw_breakpoint.c | 21 - 1 file changed, 20 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/kernel/hw_breakpoint.c b/arch/powerpc/kernel/hw_breakpoint.c index a971e22aea81..c55e67bab271 100644 --- a/arch/powerpc/kernel/hw_breakpoint.c +++ b/arch/powerpc/kernel/hw_breakpoint.c @@ -538,7 +538,12 @@ static bool check_dawrx_constraints(struct pt_regs *regs, int type, if (OP_IS_LOAD(type) && !(info->type & HW_BRK_TYPE_READ)) return false; - if (OP_IS_STORE(type) && !(info->type & HW_BRK_TYPE_WRITE)) + /* +* The Cache Management instructions other than dcbz never +* cause a match. i.e. if type is CACHEOP, the instruction +* is dcbz, and dcbz is treated as Store. +*/ + if ((OP_IS_STORE(type) || type == CACHEOP) && !(info->type & HW_BRK_TYPE_WRITE)) return false; if (is_kernel_addr(regs->nip) && !(info->type & HW_BRK_TYPE_KERNEL)) @@ -601,6 +606,15 @@ static bool check_constraints(struct pt_regs *regs, struct ppc_inst instr, return false; } +static int cache_op_size(void) +{ +#ifdef __powerpc64__ + return ppc64_caches.l1d.block_size; +#else + return L1_CACHE_BYTES; +#endif +} + static void get_instr_detail(struct pt_regs *regs, struct ppc_inst *instr, int *type, int *size, unsigned long *ea) { @@ -616,7 +630,12 @@ static void get_instr_detail(struct pt_regs *regs, struct ppc_inst *instr, if (!(regs->msr & MSR_64BIT)) *ea &= 0xUL; #endif + *size = GETSIZE(op.type); + if (*type == CACHEOP) { + *size = cache_op_size(); + *ea &= ~(*size - 1); + } } static bool is_larx_stcx_instr(int type) -- 2.26.2
[PATCH v5 08/10] powerpc/watchpoint: Guest support for 2nd DAWR hcall
2nd DAWR can be set/unset using H_SET_MODE hcall with resource value 5. Enable powervm guest support with that. This has no effect on kvm guest because kvm will return error if guest does hcall with resource value 5. Signed-off-by: Ravi Bangoria --- arch/powerpc/include/asm/hvcall.h | 1 + arch/powerpc/include/asm/machdep.h| 2 +- arch/powerpc/include/asm/plpar_wrappers.h | 5 + arch/powerpc/kernel/dawr.c| 2 +- arch/powerpc/platforms/pseries/setup.c| 7 +-- 5 files changed, 13 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h index b785e9f0071c..33793444144c 100644 --- a/arch/powerpc/include/asm/hvcall.h +++ b/arch/powerpc/include/asm/hvcall.h @@ -358,6 +358,7 @@ #define H_SET_MODE_RESOURCE_SET_DAWR0 2 #define H_SET_MODE_RESOURCE_ADDR_TRANS_MODE3 #define H_SET_MODE_RESOURCE_LE 4 +#define H_SET_MODE_RESOURCE_SET_DAWR1 5 /* Values for argument to H_SIGNAL_SYS_RESET */ #define H_SIGNAL_SYS_RESET_ALL -1 diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h index 7bcb6a39..a90b892f0bfe 100644 --- a/arch/powerpc/include/asm/machdep.h +++ b/arch/powerpc/include/asm/machdep.h @@ -131,7 +131,7 @@ struct machdep_calls { unsigned long dabrx); /* Set DAWR for this platform, leave empty for default implementation */ - int (*set_dawr)(unsigned long dawr, + int (*set_dawr)(int nr, unsigned long dawr, unsigned long dawrx); #ifdef CONFIG_PPC32/* XXX for now */ diff --git a/arch/powerpc/include/asm/plpar_wrappers.h b/arch/powerpc/include/asm/plpar_wrappers.h index d12c3680d946..ece84a430701 100644 --- a/arch/powerpc/include/asm/plpar_wrappers.h +++ b/arch/powerpc/include/asm/plpar_wrappers.h @@ -315,6 +315,11 @@ static inline long plpar_set_watchpoint0(unsigned long dawr0, unsigned long dawr return plpar_set_mode(0, H_SET_MODE_RESOURCE_SET_DAWR0, dawr0, dawrx0); } +static inline long plpar_set_watchpoint1(unsigned long dawr1, unsigned long dawrx1) +{ + return plpar_set_mode(0, H_SET_MODE_RESOURCE_SET_DAWR1, dawr1, dawrx1); +} + static inline long plpar_signal_sys_reset(long cpu) { return plpar_hcall_norets(H_SIGNAL_SYS_RESET, cpu); diff --git a/arch/powerpc/kernel/dawr.c b/arch/powerpc/kernel/dawr.c index 500f52fa4711..cdc2dccb987d 100644 --- a/arch/powerpc/kernel/dawr.c +++ b/arch/powerpc/kernel/dawr.c @@ -37,7 +37,7 @@ int set_dawr(int nr, struct arch_hw_breakpoint *brk) dawrx |= (mrd & 0x3f) << (63 - 53); if (ppc_md.set_dawr) - return ppc_md.set_dawr(dawr, dawrx); + return ppc_md.set_dawr(nr, dawr, dawrx); if (nr == 0) { mtspr(SPRN_DAWR0, dawr); diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c index 2db8469e475f..d516ee8eb7fc 100644 --- a/arch/powerpc/platforms/pseries/setup.c +++ b/arch/powerpc/platforms/pseries/setup.c @@ -831,12 +831,15 @@ static int pseries_set_xdabr(unsigned long dabr, unsigned long dabrx) return plpar_hcall_norets(H_SET_XDABR, dabr, dabrx); } -static int pseries_set_dawr(unsigned long dawr, unsigned long dawrx) +static int pseries_set_dawr(int nr, unsigned long dawr, unsigned long dawrx) { /* PAPR says we can't set HYP */ dawrx &= ~DAWRX_HYP; - return plpar_set_watchpoint0(dawr, dawrx); + if (nr == 0) + return plpar_set_watchpoint0(dawr, dawrx); + else + return plpar_set_watchpoint1(dawr, dawrx); } #define CMO_CHARACTERISTICS_TOKEN 44 -- 2.26.2
[PATCH v5 07/10] powerpc/watchpoint: Rename current H_SET_MODE DAWR macro
Current H_SET_MODE hcall macro name for setting/resetting DAWR0 is H_SET_MODE_RESOURCE_SET_DAWR. Add suffix 0 to macro name as well. Signed-off-by: Ravi Bangoria Reviewed-by: Jordan Niethe --- arch/powerpc/include/asm/hvcall.h | 2 +- arch/powerpc/include/asm/plpar_wrappers.h | 2 +- arch/powerpc/kvm/book3s_hv.c | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h index 43486e773bd6..b785e9f0071c 100644 --- a/arch/powerpc/include/asm/hvcall.h +++ b/arch/powerpc/include/asm/hvcall.h @@ -355,7 +355,7 @@ /* Values for 2nd argument to H_SET_MODE */ #define H_SET_MODE_RESOURCE_SET_CIABR 1 -#define H_SET_MODE_RESOURCE_SET_DAWR 2 +#define H_SET_MODE_RESOURCE_SET_DAWR0 2 #define H_SET_MODE_RESOURCE_ADDR_TRANS_MODE3 #define H_SET_MODE_RESOURCE_LE 4 diff --git a/arch/powerpc/include/asm/plpar_wrappers.h b/arch/powerpc/include/asm/plpar_wrappers.h index 4293c5d2ddf4..d12c3680d946 100644 --- a/arch/powerpc/include/asm/plpar_wrappers.h +++ b/arch/powerpc/include/asm/plpar_wrappers.h @@ -312,7 +312,7 @@ static inline long plpar_set_ciabr(unsigned long ciabr) static inline long plpar_set_watchpoint0(unsigned long dawr0, unsigned long dawrx0) { - return plpar_set_mode(0, H_SET_MODE_RESOURCE_SET_DAWR, dawr0, dawrx0); + return plpar_set_mode(0, H_SET_MODE_RESOURCE_SET_DAWR0, dawr0, dawrx0); } static inline long plpar_signal_sys_reset(long cpu) diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index 6bf66649ab92..7ad692c2d7c7 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -764,7 +764,7 @@ static int kvmppc_h_set_mode(struct kvm_vcpu *vcpu, unsigned long mflags, return H_P3; vcpu->arch.ciabr = value1; return H_SUCCESS; - case H_SET_MODE_RESOURCE_SET_DAWR: + case H_SET_MODE_RESOURCE_SET_DAWR0: if (!kvmppc_power8_compatible(vcpu)) return H_P2; if (!ppc_breakpoint_available()) -- 2.26.2
[PATCH v5 05/10] powerpc/dt_cpu_ftrs: Add feature for 2nd DAWR
Add new device-tree feature for 2nd DAWR. If this feature is present, 2nd DAWR is supported, otherwise not. Signed-off-by: Ravi Bangoria --- arch/powerpc/include/asm/cputable.h | 3 ++- arch/powerpc/kernel/dt_cpu_ftrs.c | 1 + 2 files changed, 3 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/include/asm/cputable.h b/arch/powerpc/include/asm/cputable.h index e506d429b1af..5583f2d08df7 100644 --- a/arch/powerpc/include/asm/cputable.h +++ b/arch/powerpc/include/asm/cputable.h @@ -214,6 +214,7 @@ static inline void cpu_feature_keys_init(void) { } #define CPU_FTR_P9_TLBIE_ERAT_BUG LONG_ASM_CONST(0x0001) #define CPU_FTR_P9_RADIX_PREFETCH_BUG LONG_ASM_CONST(0x0002) #define CPU_FTR_ARCH_31 LONG_ASM_CONST(0x0004) +#define CPU_FTR_DAWR1 LONG_ASM_CONST(0x0008) #ifndef __ASSEMBLY__ @@ -478,7 +479,7 @@ static inline void cpu_feature_keys_init(void) { } CPU_FTR_CFAR | CPU_FTR_HVMODE | CPU_FTR_VMX_COPY | \ CPU_FTR_DBELL | CPU_FTR_HAS_PPR | CPU_FTR_ARCH_207S | \ CPU_FTR_TM_COMP | CPU_FTR_ARCH_300 | CPU_FTR_PKEY | \ - CPU_FTR_ARCH_31 | CPU_FTR_DAWR) + CPU_FTR_ARCH_31 | CPU_FTR_DAWR | CPU_FTR_DAWR1) #define CPU_FTRS_CELL (CPU_FTR_LWSYNC | \ CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_CTRL | \ CPU_FTR_ALTIVEC_COMP | CPU_FTR_MMCRA | CPU_FTR_SMT | \ diff --git a/arch/powerpc/kernel/dt_cpu_ftrs.c b/arch/powerpc/kernel/dt_cpu_ftrs.c index ac650c233cd9..675b824038f9 100644 --- a/arch/powerpc/kernel/dt_cpu_ftrs.c +++ b/arch/powerpc/kernel/dt_cpu_ftrs.c @@ -649,6 +649,7 @@ static struct dt_cpu_feature_match __initdata {"wait-v3", feat_enable, 0}, {"prefix-instructions", feat_enable, 0}, {"matrix-multiply-assist", feat_enable_mma, 0}, + {"debug-facilities-v31", feat_enable, CPU_FTR_DAWR1}, }; static bool __initdata using_dt_cpu_ftrs; -- 2.26.2
[PATCH v5 04/10] powerpc/watchpoint: Enable watchpoint functionality on power10 guest
CPU_FTR_DAWR is by default enabled for host via CPU_FTRS_DT_CPU_BASE (controlled by CONFIG_PPC_DT_CPU_FTRS). But cpu-features device-tree node is not PAPR compatible and thus not yet used by kvm or pHyp guests. Enable watchpoint functionality on power10 guest (both kvm and powervm) by adding CPU_FTR_DAWR to CPU_FTRS_POWER10. Note that this change does not enable 2nd DAWR support. Signed-off-by: Ravi Bangoria Tested-by: Jordan Niethe --- arch/powerpc/include/asm/cputable.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/powerpc/include/asm/cputable.h b/arch/powerpc/include/asm/cputable.h index bac2252c839e..e506d429b1af 100644 --- a/arch/powerpc/include/asm/cputable.h +++ b/arch/powerpc/include/asm/cputable.h @@ -478,7 +478,7 @@ static inline void cpu_feature_keys_init(void) { } CPU_FTR_CFAR | CPU_FTR_HVMODE | CPU_FTR_VMX_COPY | \ CPU_FTR_DBELL | CPU_FTR_HAS_PPR | CPU_FTR_ARCH_207S | \ CPU_FTR_TM_COMP | CPU_FTR_ARCH_300 | CPU_FTR_PKEY | \ - CPU_FTR_ARCH_31) + CPU_FTR_ARCH_31 | CPU_FTR_DAWR) #define CPU_FTRS_CELL (CPU_FTR_LWSYNC | \ CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_CTRL | \ CPU_FTR_ALTIVEC_COMP | CPU_FTR_MMCRA | CPU_FTR_SMT | \ -- 2.26.2
[PATCH v5 02/10] powerpc/watchpoint: Fix DAWR exception constraint
Pedro Miraglia Franco de Carvalho noticed that on p8/p9, DAR value is inconsistent with different type of load/store. Like for byte,word etc. load/stores, DAR is set to the address of the first byte of overlap between watch range and real access. But for quadword load/ store it's sometime set to the address of the first byte of real access whereas sometime set to the address of the first byte of overlap. This issue has been fixed in p10. In p10(ISA 3.1), DAR is always set to the address of the first byte of overlap. Commit 27985b2a640e ("powerpc/watchpoint: Don't ignore extraneous exceptions blindly") wrongly assumes that DAR is set to the address of the first byte of overlap for all load/stores on p8/p9 as well. Fix that. With the fix, we now rely on 'ea' provided by analyse_instr(). If analyse_instr() fails, generate event unconditionally on p8/p9, and on p10 generate event only if DAR is within a DAWR range. Note: 8xx is not affected. Fixes: 27985b2a640e ("powerpc/watchpoint: Don't ignore extraneous exceptions blindly") Fixes: 74c6881019b7 ("powerpc/watchpoint: Prepare handler to handle more than one watchpoint") Reported-by: Pedro Miraglia Franco de Carvalho Signed-off-by: Ravi Bangoria --- arch/powerpc/kernel/hw_breakpoint.c | 72 - 1 file changed, 41 insertions(+), 31 deletions(-) diff --git a/arch/powerpc/kernel/hw_breakpoint.c b/arch/powerpc/kernel/hw_breakpoint.c index 031e6defc08e..a971e22aea81 100644 --- a/arch/powerpc/kernel/hw_breakpoint.c +++ b/arch/powerpc/kernel/hw_breakpoint.c @@ -498,11 +498,11 @@ static bool dar_in_user_range(unsigned long dar, struct arch_hw_breakpoint *info return ((info->address <= dar) && (dar - info->address < info->len)); } -static bool dar_user_range_overlaps(unsigned long dar, int size, - struct arch_hw_breakpoint *info) +static bool ea_user_range_overlaps(unsigned long ea, int size, + struct arch_hw_breakpoint *info) { - return ((dar < info->address + info->len) && - (dar + size > info->address)); + return ((ea < info->address + info->len) && + (ea + size > info->address)); } static bool dar_in_hw_range(unsigned long dar, struct arch_hw_breakpoint *info) @@ -515,20 +515,22 @@ static bool dar_in_hw_range(unsigned long dar, struct arch_hw_breakpoint *info) return ((hw_start_addr <= dar) && (hw_end_addr > dar)); } -static bool dar_hw_range_overlaps(unsigned long dar, int size, - struct arch_hw_breakpoint *info) +static bool ea_hw_range_overlaps(unsigned long ea, int size, +struct arch_hw_breakpoint *info) { unsigned long hw_start_addr, hw_end_addr; hw_start_addr = ALIGN_DOWN(info->address, HW_BREAKPOINT_SIZE); hw_end_addr = ALIGN(info->address + info->len, HW_BREAKPOINT_SIZE); - return ((dar < hw_end_addr) && (dar + size > hw_start_addr)); + return ((ea < hw_end_addr) && (ea + size > hw_start_addr)); } /* * If hw has multiple DAWR registers, we also need to check all * dawrx constraint bits to confirm this is _really_ a valid event. + * If type is UNKNOWN, but privilege level matches, consider it as + * a positive match. */ static bool check_dawrx_constraints(struct pt_regs *regs, int type, struct arch_hw_breakpoint *info) @@ -553,7 +555,8 @@ static bool check_dawrx_constraints(struct pt_regs *regs, int type, * including extraneous exception. Otherwise return false. */ static bool check_constraints(struct pt_regs *regs, struct ppc_inst instr, - int type, int size, struct arch_hw_breakpoint *info) + unsigned long ea, int type, int size, + struct arch_hw_breakpoint *info) { bool in_user_range = dar_in_user_range(regs->dar, info); bool dawrx_constraints; @@ -569,22 +572,27 @@ static bool check_constraints(struct pt_regs *regs, struct ppc_inst instr, } if (unlikely(ppc_inst_equal(instr, ppc_inst(0 { - if (in_user_range) - return true; + if (cpu_has_feature(CPU_FTR_ARCH_31) && + !dar_in_hw_range(regs->dar, info)) + return false; - if (dar_in_hw_range(regs->dar, info)) { - info->type |= HW_BRK_TYPE_EXTRANEOUS_IRQ; - return true; - } - return false; + return true; } dawrx_constraints = check_dawrx_constraints(regs, type, info); - if (dar_user_range_overlaps(regs->dar, size, info)) + if (type == UNKNOWN) { + if
[PATCH v5 00/10] powerpc/watchpoint: Enable 2nd DAWR on baremetal and powervm
Last series[1] was to add basic infrastructure support for more than one watchpoint on Book3S powerpc. This series actually enables the 2nd DAWR for baremetal and powervm. Kvm guest is still not supported. v4: https://lore.kernel.org/r/20200717040958.70561-1-ravi.bango...@linux.ibm.com v4->v5: - Using hardcoded values instead of macros HBP_NUM_ONE and HBP_NUM_TWO. Comment above HBP_NUM_MAX changed to explain it's value. - Included CPU_FTR_DAWR1 into CPU_FTRS_POWER10 - Using generic function feat_enable() instead of feat_enable_debug_facilities_v31() to enable CPU_FTR_DAWR1. - ISA still includes 512B boundary in match criteria. But that's a documentation mistake. Mentioned about this in the last patch. - Rebased to powerpc/next - Added Jordan's Reviewed-by/Tested-by tags [1]: https://lore.kernel.org/linuxppc-dev/20200514111741.97993-1-ravi.bango...@linux.ibm.com/ Ravi Bangoria (10): powerpc/watchpoint: Fix 512 byte boundary limit powerpc/watchpoint: Fix DAWR exception constraint powerpc/watchpoint: Fix DAWR exception for CACHEOP powerpc/watchpoint: Enable watchpoint functionality on power10 guest powerpc/dt_cpu_ftrs: Add feature for 2nd DAWR powerpc/watchpoint: Set CPU_FTR_DAWR1 based on pa-features bit powerpc/watchpoint: Rename current H_SET_MODE DAWR macro powerpc/watchpoint: Guest support for 2nd DAWR hcall powerpc/watchpoint: Return available watchpoints dynamically powerpc/watchpoint: Remove 512 byte boundary arch/powerpc/include/asm/cputable.h | 8 +- arch/powerpc/include/asm/hvcall.h | 3 +- arch/powerpc/include/asm/hw_breakpoint.h | 4 +- arch/powerpc/include/asm/machdep.h| 2 +- arch/powerpc/include/asm/plpar_wrappers.h | 7 +- arch/powerpc/kernel/dawr.c| 2 +- arch/powerpc/kernel/dt_cpu_ftrs.c | 1 + arch/powerpc/kernel/hw_breakpoint.c | 98 +++ arch/powerpc/kernel/prom.c| 2 + arch/powerpc/kvm/book3s_hv.c | 2 +- arch/powerpc/platforms/pseries/setup.c| 7 +- 11 files changed, 91 insertions(+), 45 deletions(-) -- 2.26.2
[PATCH v5 01/10] powerpc/watchpoint: Fix 512 byte boundary limit
Milton Miller reported that we are aligning start and end address to wrong size SZ_512M. It should be SZ_512. Fix that. While doing this change I also found a case where ALIGN() comparison fails. Within a given aligned range, ALIGN() of two addresses does not match when start address is pointing to the first byte and end address is pointing to any other byte except the first one. But that's not true for ALIGN_DOWN(). ALIGN_DOWN() of any two addresses within that range will always point to the first byte. So use ALIGN_DOWN() instead of ALIGN(). Fixes: e68ef121c1f4 ("powerpc/watchpoint: Use builtin ALIGN*() macros") Reported-by: Milton Miller Signed-off-by: Ravi Bangoria Tested-by: Jordan Niethe --- arch/powerpc/kernel/hw_breakpoint.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/powerpc/kernel/hw_breakpoint.c b/arch/powerpc/kernel/hw_breakpoint.c index daf0e1da..031e6defc08e 100644 --- a/arch/powerpc/kernel/hw_breakpoint.c +++ b/arch/powerpc/kernel/hw_breakpoint.c @@ -419,7 +419,7 @@ static int hw_breakpoint_validate_len(struct arch_hw_breakpoint *hw) if (dawr_enabled()) { max_len = DAWR_MAX_LEN; /* DAWR region can't cross 512 bytes boundary */ - if (ALIGN(start_addr, SZ_512M) != ALIGN(end_addr - 1, SZ_512M)) + if (ALIGN_DOWN(start_addr, SZ_512) != ALIGN_DOWN(end_addr - 1, SZ_512)) return -EINVAL; } else if (IS_ENABLED(CONFIG_PPC_8xx)) { /* 8xx can setup a range without limitation */ -- 2.26.2
Re: [PATCH v4 05/10] powerpc/dt_cpu_ftrs: Add feature for 2nd DAWR
On 7/21/20 7:37 PM, Michael Ellerman wrote: Ravi Bangoria writes: On 7/21/20 4:59 PM, Michael Ellerman wrote: Ravi Bangoria writes: On 7/17/20 11:14 AM, Jordan Niethe wrote: On Fri, Jul 17, 2020 at 2:10 PM Ravi Bangoria wrote: Add new device-tree feature for 2nd DAWR. If this feature is present, 2nd DAWR is supported, otherwise not. Signed-off-by: Ravi Bangoria --- arch/powerpc/include/asm/cputable.h | 7 +-- arch/powerpc/kernel/dt_cpu_ftrs.c | 7 +++ 2 files changed, 12 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/include/asm/cputable.h b/arch/powerpc/include/asm/cputable.h index e506d429b1af..3445c86e1f6f 100644 --- a/arch/powerpc/include/asm/cputable.h +++ b/arch/powerpc/include/asm/cputable.h @@ -214,6 +214,7 @@ static inline void cpu_feature_keys_init(void) { } #define CPU_FTR_P9_TLBIE_ERAT_BUG LONG_ASM_CONST(0x0001) #define CPU_FTR_P9_RADIX_PREFETCH_BUG LONG_ASM_CONST(0x0002) #define CPU_FTR_ARCH_31 LONG_ASM_CONST(0x0004) +#define CPU_FTR_DAWR1 LONG_ASM_CONST(0x0008) #ifndef __ASSEMBLY__ @@ -497,14 +498,16 @@ static inline void cpu_feature_keys_init(void) { } #define CPU_FTRS_POSSIBLE \ (CPU_FTRS_POWER7 | CPU_FTRS_POWER8E | CPU_FTRS_POWER8 | \ CPU_FTR_ALTIVEC_COMP | CPU_FTR_VSX_COMP | CPU_FTRS_POWER9 | \ -CPU_FTRS_POWER9_DD2_1 | CPU_FTRS_POWER9_DD2_2 | CPU_FTRS_POWER10) +CPU_FTRS_POWER9_DD2_1 | CPU_FTRS_POWER9_DD2_2 | CPU_FTRS_POWER10 | \ +CPU_FTR_DAWR1) #else #define CPU_FTRS_POSSIBLE \ (CPU_FTRS_PPC970 | CPU_FTRS_POWER5 | \ CPU_FTRS_POWER6 | CPU_FTRS_POWER7 | CPU_FTRS_POWER8E | \ CPU_FTRS_POWER8 | CPU_FTRS_CELL | CPU_FTRS_PA6T | \ CPU_FTR_VSX_COMP | CPU_FTR_ALTIVEC_COMP | CPU_FTRS_POWER9 | \ -CPU_FTRS_POWER9_DD2_1 | CPU_FTRS_POWER9_DD2_2 | CPU_FTRS_POWER10) +CPU_FTRS_POWER9_DD2_1 | CPU_FTRS_POWER9_DD2_2 | CPU_FTRS_POWER10 | \ +CPU_FTR_DAWR1) Instead of putting CPU_FTR_DAWR1 into CPU_FTRS_POSSIBLE should it go into CPU_FTRS_POWER10? Then it will be picked up by CPU_FTRS_POSSIBLE. I remember a discussion about this with Mikey and we decided to do it this way. Obviously, the purpose is to make CPU_FTR_DAWR1 independent of CPU_FTRS_POWER10 because DAWR1 is an optional feature in p10. I fear including CPU_FTR_DAWR1 in CPU_FTRS_POWER10 can make it forcefully enabled even when device-tree property is not present or pa-feature bit it not set, because we do: { /* 3.1-compliant processor, i.e. Power10 "architected" mode */ .pvr_mask = 0x, .pvr_value = 0x0f06, .cpu_name = "POWER10 (architected)", .cpu_features = CPU_FTRS_POWER10, The pa-features logic will turn it off if the feature bit is not set. So you should be able to put it in CPU_FTRS_POWER10. See for example CPU_FTR_NOEXECUTE. Ah ok. scan_features() clears the feature if the bit is not set in pa-features. So it should work find for powervm. I'll verify the same thing happens in case of baremetal where we use cpu-features not pa-features. If it works in baremetal as well, will put it in CPU_FTRS_POWER10. When we use DT CPU features we don't use CPU_FTRS_POWER10 at all. We construct a cpu_spec from scratch with just the base set of features: static struct cpu_spec __initdata base_cpu_spec = { .cpu_name = NULL, .cpu_features = CPU_FTRS_DT_CPU_BASE, And then individual features are enabled via the device tree flags. Ah good. I was under a wrong impression that we use cpu_specs[] for all the cases. Thanks mpe for explaining in detail :) Ravi
Re: [PATCH v4 05/10] powerpc/dt_cpu_ftrs: Add feature for 2nd DAWR
On 7/21/20 4:59 PM, Michael Ellerman wrote: Ravi Bangoria writes: On 7/17/20 11:14 AM, Jordan Niethe wrote: On Fri, Jul 17, 2020 at 2:10 PM Ravi Bangoria wrote: Add new device-tree feature for 2nd DAWR. If this feature is present, 2nd DAWR is supported, otherwise not. Signed-off-by: Ravi Bangoria --- arch/powerpc/include/asm/cputable.h | 7 +-- arch/powerpc/kernel/dt_cpu_ftrs.c | 7 +++ 2 files changed, 12 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/include/asm/cputable.h b/arch/powerpc/include/asm/cputable.h index e506d429b1af..3445c86e1f6f 100644 --- a/arch/powerpc/include/asm/cputable.h +++ b/arch/powerpc/include/asm/cputable.h @@ -214,6 +214,7 @@ static inline void cpu_feature_keys_init(void) { } #define CPU_FTR_P9_TLBIE_ERAT_BUG LONG_ASM_CONST(0x0001) #define CPU_FTR_P9_RADIX_PREFETCH_BUG LONG_ASM_CONST(0x0002) #define CPU_FTR_ARCH_31 LONG_ASM_CONST(0x0004) +#define CPU_FTR_DAWR1 LONG_ASM_CONST(0x0008) #ifndef __ASSEMBLY__ @@ -497,14 +498,16 @@ static inline void cpu_feature_keys_init(void) { } #define CPU_FTRS_POSSIBLE \ (CPU_FTRS_POWER7 | CPU_FTRS_POWER8E | CPU_FTRS_POWER8 | \ CPU_FTR_ALTIVEC_COMP | CPU_FTR_VSX_COMP | CPU_FTRS_POWER9 | \ -CPU_FTRS_POWER9_DD2_1 | CPU_FTRS_POWER9_DD2_2 | CPU_FTRS_POWER10) +CPU_FTRS_POWER9_DD2_1 | CPU_FTRS_POWER9_DD2_2 | CPU_FTRS_POWER10 | \ +CPU_FTR_DAWR1) #else #define CPU_FTRS_POSSIBLE \ (CPU_FTRS_PPC970 | CPU_FTRS_POWER5 | \ CPU_FTRS_POWER6 | CPU_FTRS_POWER7 | CPU_FTRS_POWER8E | \ CPU_FTRS_POWER8 | CPU_FTRS_CELL | CPU_FTRS_PA6T | \ CPU_FTR_VSX_COMP | CPU_FTR_ALTIVEC_COMP | CPU_FTRS_POWER9 | \ -CPU_FTRS_POWER9_DD2_1 | CPU_FTRS_POWER9_DD2_2 | CPU_FTRS_POWER10) +CPU_FTRS_POWER9_DD2_1 | CPU_FTRS_POWER9_DD2_2 | CPU_FTRS_POWER10 | \ +CPU_FTR_DAWR1) Instead of putting CPU_FTR_DAWR1 into CPU_FTRS_POSSIBLE should it go into CPU_FTRS_POWER10? Then it will be picked up by CPU_FTRS_POSSIBLE. I remember a discussion about this with Mikey and we decided to do it this way. Obviously, the purpose is to make CPU_FTR_DAWR1 independent of CPU_FTRS_POWER10 because DAWR1 is an optional feature in p10. I fear including CPU_FTR_DAWR1 in CPU_FTRS_POWER10 can make it forcefully enabled even when device-tree property is not present or pa-feature bit it not set, because we do: { /* 3.1-compliant processor, i.e. Power10 "architected" mode */ .pvr_mask = 0x, .pvr_value = 0x0f06, .cpu_name = "POWER10 (architected)", .cpu_features = CPU_FTRS_POWER10, The pa-features logic will turn it off if the feature bit is not set. So you should be able to put it in CPU_FTRS_POWER10. See for example CPU_FTR_NOEXECUTE. Ah ok. scan_features() clears the feature if the bit is not set in pa-features. So it should work find for powervm. I'll verify the same thing happens in case of baremetal where we use cpu-features not pa-features. If it works in baremetal as well, will put it in CPU_FTRS_POWER10. Thanks for the clarification, Ravi
Re: [PATCH v4 09/10] powerpc/watchpoint: Return available watchpoints dynamically
On 7/21/20 5:06 PM, Michael Ellerman wrote: Ravi Bangoria writes: On 7/20/20 9:12 AM, Jordan Niethe wrote: On Fri, Jul 17, 2020 at 2:11 PM Ravi Bangoria wrote: So far Book3S Powerpc supported only one watchpoint. Power10 is introducing 2nd DAWR. Enable 2nd DAWR support for Power10. Availability of 2nd DAWR will depend on CPU_FTR_DAWR1. Signed-off-by: Ravi Bangoria --- arch/powerpc/include/asm/cputable.h | 4 +++- arch/powerpc/include/asm/hw_breakpoint.h | 5 +++-- 2 files changed, 6 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/include/asm/cputable.h b/arch/powerpc/include/asm/cputable.h index 3445c86e1f6f..36a0851a7a9b 100644 --- a/arch/powerpc/include/asm/cputable.h +++ b/arch/powerpc/include/asm/cputable.h @@ -633,7 +633,9 @@ enum { * Maximum number of hw breakpoint supported on powerpc. Number of * breakpoints supported by actual hw might be less than this. */ -#define HBP_NUM_MAX1 +#define HBP_NUM_MAX2 +#define HBP_NUM_ONE1 +#define HBP_NUM_TWO2 I wonder if these defines are necessary - has it any advantage over just using the literal? No, not really. Initially I had something like: #define HBP_NUM_MAX2 #define HBP_NUM_P8_P9 1 #define HBP_NUM_P102 But then I thought it's also not right. So I made it _ONE and _TWO. Now the function that decides nr watchpoints dynamically (nr_wp_slots) is in different file, I thought to keep it like this so it would be easier to figure out why _MAX is 2. I don't think it makes anything clearer. I had to stare at it thinking there was some sort of mapping or indirection going on, before I realised it's just literally the number of breakpoints. So please just do: static inline int nr_wp_slots(void) { return cpu_has_feature(CPU_FTR_DAWR1) ? 2 : 1; } If you think HBP_NUM_MAX needs explanation then do that with a comment, it can refer to nr_wp_slots() if that's helpful. Agreed. By adding a comment, we can remove those macros. Will change it. Thanks, Ravi
Re: [PATCH v4 09/10] powerpc/watchpoint: Return available watchpoints dynamically
@@ -46,7 +47,7 @@ struct arch_hw_breakpoint { static inline int nr_wp_slots(void) { - return HBP_NUM_MAX; + return cpu_has_feature(CPU_FTR_DAWR1) ? HBP_NUM_TWO : HBP_NUM_ONE; So it'd be something like: + return cpu_has_feature(CPU_FTR_DAWR1) ? HBP_NUM_MAX : 1; But thinking that there might be more slots added in the future, it may be better to make the number of slots a variable that is set during the init and then have this function return that. Not sure I follow. What do you mean by setting number of slots a variable that is set during the init? Sorry I was unclear there. I was just looking and saw arm also has a variable number of hw breakpoints. If we did something like how they handle it, it might look something like: static int num_wp_slots __ro_after_init; int nr_wp_slots(void) { return num_wp_slots; } static int __init arch_hw_breakpoint_init(void) { num_wp_slots = work out how many wp_slots } arch_initcall(arch_hw_breakpoint_init); Then we wouldn't have to calculate everytime nr_wp_slots() is called. In the future if more wp's are added nr_wp_slots() will get more complicated. But just an idea, feel free to ignore. Ok I got the idea. But ARM arch_hw_breakpoint_init() is much more complex compared to our nr_wp_slots(). I don't see any benefit by making our code like ARM. Thanks for the idea though :) Ravi
Re: [PATCH v4 05/10] powerpc/dt_cpu_ftrs: Add feature for 2nd DAWR
On 7/17/20 11:14 AM, Jordan Niethe wrote: On Fri, Jul 17, 2020 at 2:10 PM Ravi Bangoria wrote: Add new device-tree feature for 2nd DAWR. If this feature is present, 2nd DAWR is supported, otherwise not. Signed-off-by: Ravi Bangoria --- arch/powerpc/include/asm/cputable.h | 7 +-- arch/powerpc/kernel/dt_cpu_ftrs.c | 7 +++ 2 files changed, 12 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/include/asm/cputable.h b/arch/powerpc/include/asm/cputable.h index e506d429b1af..3445c86e1f6f 100644 --- a/arch/powerpc/include/asm/cputable.h +++ b/arch/powerpc/include/asm/cputable.h @@ -214,6 +214,7 @@ static inline void cpu_feature_keys_init(void) { } #define CPU_FTR_P9_TLBIE_ERAT_BUG LONG_ASM_CONST(0x0001) #define CPU_FTR_P9_RADIX_PREFETCH_BUG LONG_ASM_CONST(0x0002) #define CPU_FTR_ARCH_31 LONG_ASM_CONST(0x0004) +#define CPU_FTR_DAWR1 LONG_ASM_CONST(0x0008) #ifndef __ASSEMBLY__ @@ -497,14 +498,16 @@ static inline void cpu_feature_keys_init(void) { } #define CPU_FTRS_POSSIBLE \ (CPU_FTRS_POWER7 | CPU_FTRS_POWER8E | CPU_FTRS_POWER8 | \ CPU_FTR_ALTIVEC_COMP | CPU_FTR_VSX_COMP | CPU_FTRS_POWER9 | \ -CPU_FTRS_POWER9_DD2_1 | CPU_FTRS_POWER9_DD2_2 | CPU_FTRS_POWER10) +CPU_FTRS_POWER9_DD2_1 | CPU_FTRS_POWER9_DD2_2 | CPU_FTRS_POWER10 | \ +CPU_FTR_DAWR1) #else #define CPU_FTRS_POSSIBLE \ (CPU_FTRS_PPC970 | CPU_FTRS_POWER5 | \ CPU_FTRS_POWER6 | CPU_FTRS_POWER7 | CPU_FTRS_POWER8E | \ CPU_FTRS_POWER8 | CPU_FTRS_CELL | CPU_FTRS_PA6T | \ CPU_FTR_VSX_COMP | CPU_FTR_ALTIVEC_COMP | CPU_FTRS_POWER9 | \ -CPU_FTRS_POWER9_DD2_1 | CPU_FTRS_POWER9_DD2_2 | CPU_FTRS_POWER10) +CPU_FTRS_POWER9_DD2_1 | CPU_FTRS_POWER9_DD2_2 | CPU_FTRS_POWER10 | \ +CPU_FTR_DAWR1) Instead of putting CPU_FTR_DAWR1 into CPU_FTRS_POSSIBLE should it go into CPU_FTRS_POWER10? Then it will be picked up by CPU_FTRS_POSSIBLE. I remember a discussion about this with Mikey and we decided to do it this way. Obviously, the purpose is to make CPU_FTR_DAWR1 independent of CPU_FTRS_POWER10 because DAWR1 is an optional feature in p10. I fear including CPU_FTR_DAWR1 in CPU_FTRS_POWER10 can make it forcefully enabled even when device-tree property is not present or pa-feature bit it not set, because we do: { /* 3.1-compliant processor, i.e. Power10 "architected" mode */ .pvr_mask = 0x, .pvr_value = 0x0f06, .cpu_name = "POWER10 (architected)", .cpu_features = CPU_FTRS_POWER10, #endif /* CONFIG_CPU_LITTLE_ENDIAN */ #endif #else diff --git a/arch/powerpc/kernel/dt_cpu_ftrs.c b/arch/powerpc/kernel/dt_cpu_ftrs.c index ac650c233cd9..c78cd3596ec4 100644 --- a/arch/powerpc/kernel/dt_cpu_ftrs.c +++ b/arch/powerpc/kernel/dt_cpu_ftrs.c @@ -574,6 +574,12 @@ static int __init feat_enable_mma(struct dt_cpu_feature *f) return 1; } +static int __init feat_enable_debug_facilities_v31(struct dt_cpu_feature *f) +{ + cur_cpu_spec->cpu_features |= CPU_FTR_DAWR1; + return 1; +} + struct dt_cpu_feature_match { const char *name; int (*enable)(struct dt_cpu_feature *f); @@ -649,6 +655,7 @@ static struct dt_cpu_feature_match __initdata {"wait-v3", feat_enable, 0}, {"prefix-instructions", feat_enable, 0}, {"matrix-multiply-assist", feat_enable_mma, 0}, + {"debug-facilities-v31", feat_enable_debug_facilities_v31, 0}, Since all feat_enable_debug_facilities_v31() does is set CPU_FTR_DAWR1, if you just have: {"debug-facilities-v31", feat_enable, CPU_FTR_DAWR1}, I think cpufeatures_process_feature() should set it in for you at this point: if (m->enable(f)) { cur_cpu_spec->cpu_features |= m->cpu_ftr_bit_mask; break; } Yes, that seems a better option. Thanks, Ravi
Re: [PATCH v4 09/10] powerpc/watchpoint: Return available watchpoints dynamically
On 7/20/20 9:12 AM, Jordan Niethe wrote: On Fri, Jul 17, 2020 at 2:11 PM Ravi Bangoria wrote: So far Book3S Powerpc supported only one watchpoint. Power10 is introducing 2nd DAWR. Enable 2nd DAWR support for Power10. Availability of 2nd DAWR will depend on CPU_FTR_DAWR1. Signed-off-by: Ravi Bangoria --- arch/powerpc/include/asm/cputable.h | 4 +++- arch/powerpc/include/asm/hw_breakpoint.h | 5 +++-- 2 files changed, 6 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/include/asm/cputable.h b/arch/powerpc/include/asm/cputable.h index 3445c86e1f6f..36a0851a7a9b 100644 --- a/arch/powerpc/include/asm/cputable.h +++ b/arch/powerpc/include/asm/cputable.h @@ -633,7 +633,9 @@ enum { * Maximum number of hw breakpoint supported on powerpc. Number of * breakpoints supported by actual hw might be less than this. */ -#define HBP_NUM_MAX1 +#define HBP_NUM_MAX2 +#define HBP_NUM_ONE1 +#define HBP_NUM_TWO2 I wonder if these defines are necessary - has it any advantage over just using the literal? No, not really. Initially I had something like: #define HBP_NUM_MAX2 #define HBP_NUM_P8_P9 1 #define HBP_NUM_P102 But then I thought it's also not right. So I made it _ONE and _TWO. Now the function that decides nr watchpoints dynamically (nr_wp_slots) is in different file, I thought to keep it like this so it would be easier to figure out why _MAX is 2. #endif /* !__ASSEMBLY__ */ diff --git a/arch/powerpc/include/asm/hw_breakpoint.h b/arch/powerpc/include/asm/hw_breakpoint.h index cb424799da0d..d4eab1694bcd 100644 --- a/arch/powerpc/include/asm/hw_breakpoint.h +++ b/arch/powerpc/include/asm/hw_breakpoint.h @@ -5,10 +5,11 @@ * Copyright 2010, IBM Corporation. * Author: K.Prasad */ - Was removing this line deliberate? Nah. Will remove that hunk. #ifndef _PPC_BOOK3S_64_HW_BREAKPOINT_H #define _PPC_BOOK3S_64_HW_BREAKPOINT_H +#include + #ifdef __KERNEL__ struct arch_hw_breakpoint { unsigned long address; @@ -46,7 +47,7 @@ struct arch_hw_breakpoint { static inline int nr_wp_slots(void) { - return HBP_NUM_MAX; + return cpu_has_feature(CPU_FTR_DAWR1) ? HBP_NUM_TWO : HBP_NUM_ONE; So it'd be something like: + return cpu_has_feature(CPU_FTR_DAWR1) ? HBP_NUM_MAX : 1; But thinking that there might be more slots added in the future, it may be better to make the number of slots a variable that is set during the init and then have this function return that. Not sure I follow. What do you mean by setting number of slots a variable that is set during the init? Thanks, Ravi
Re: [PATCH v4 10/10] powerpc/watchpoint: Remove 512 byte boundary
Hi Jordan, On 7/20/20 12:24 PM, Jordan Niethe wrote: On Fri, Jul 17, 2020 at 2:11 PM Ravi Bangoria wrote: Power10 has removed 512 bytes boundary from match criteria. i.e. The watch range can cross 512 bytes boundary. It looks like this change is not mentioned in ISA v3.1 Book III 9.4 Data Address Watchpoint. It could be useful to mention that in the commit message. Yes, ISA 3.1 Book III 9.4 has a documentation mistake and hopefully it will be fixed in the next version of ISA. Though, this is mentioned in ISA 3.1 change log: Multiple DEAW: Added a second Data Address Watchpoint. [H]DAR is set to the first byte of overlap. 512B boundary is removed. I'll mention this in the commit description. Also I wonder if could add a test for this to the ptrace-hwbreak selftest? Yes, I already have a selftest for this in perf-hwbreak. Will send that soon. Thanks, Ravi
Re: [PATCH v2 2/3] powerpc/powernv/idle: save-restore DAWR0, DAWRX0 for P10
Hi Nick, On 7/13/20 11:22 AM, Nicholas Piggin wrote: Excerpts from Pratik Rajesh Sampat's message of July 10, 2020 3:22 pm: Additional registers DAWR0, DAWRX0 may be lost on Power 10 for stop levels < 4. Therefore save the values of these SPRs before entering a "stop" state and restore their values on wakeup. Hmm, where do you get this from? Documentation I see says DAWR is lost on POWER9 but not P10. Does idle thread even need to save DAWR, or does it get switched when going to a thread that has a watchpoint set? I don't know how idle states works internally but IIUC, we need to save/restore DAWRs. This is needed when user creates per-cpu watchpoint event. Ravi
Re: [PATCH v2 2/3] powerpc/powernv/idle: save-restore DAWR0,DAWRX0 for P10
Hi Pratik, On 7/10/20 10:52 AM, Pratik Rajesh Sampat wrote: Additional registers DAWR0, DAWRX0 may be lost on Power 10 for stop levels < 4. p10 has one more pair DAWR1/DAWRX1. Please include that as well. Ravi
[PATCH v4 08/10] powerpc/watchpoint: Guest support for 2nd DAWR hcall
2nd DAWR can be set/unset using H_SET_MODE hcall with resource value 5. Enable powervm guest support with that. This has no effect on kvm guest because kvm will return error if guest does hcall with resource value 5. Signed-off-by: Ravi Bangoria --- arch/powerpc/include/asm/hvcall.h | 1 + arch/powerpc/include/asm/machdep.h| 2 +- arch/powerpc/include/asm/plpar_wrappers.h | 5 + arch/powerpc/kernel/dawr.c| 2 +- arch/powerpc/platforms/pseries/setup.c| 7 +-- 5 files changed, 13 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h index b785e9f0071c..33793444144c 100644 --- a/arch/powerpc/include/asm/hvcall.h +++ b/arch/powerpc/include/asm/hvcall.h @@ -358,6 +358,7 @@ #define H_SET_MODE_RESOURCE_SET_DAWR0 2 #define H_SET_MODE_RESOURCE_ADDR_TRANS_MODE3 #define H_SET_MODE_RESOURCE_LE 4 +#define H_SET_MODE_RESOURCE_SET_DAWR1 5 /* Values for argument to H_SIGNAL_SYS_RESET */ #define H_SIGNAL_SYS_RESET_ALL -1 diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h index 7bcb6a39..a90b892f0bfe 100644 --- a/arch/powerpc/include/asm/machdep.h +++ b/arch/powerpc/include/asm/machdep.h @@ -131,7 +131,7 @@ struct machdep_calls { unsigned long dabrx); /* Set DAWR for this platform, leave empty for default implementation */ - int (*set_dawr)(unsigned long dawr, + int (*set_dawr)(int nr, unsigned long dawr, unsigned long dawrx); #ifdef CONFIG_PPC32/* XXX for now */ diff --git a/arch/powerpc/include/asm/plpar_wrappers.h b/arch/powerpc/include/asm/plpar_wrappers.h index d12c3680d946..ece84a430701 100644 --- a/arch/powerpc/include/asm/plpar_wrappers.h +++ b/arch/powerpc/include/asm/plpar_wrappers.h @@ -315,6 +315,11 @@ static inline long plpar_set_watchpoint0(unsigned long dawr0, unsigned long dawr return plpar_set_mode(0, H_SET_MODE_RESOURCE_SET_DAWR0, dawr0, dawrx0); } +static inline long plpar_set_watchpoint1(unsigned long dawr1, unsigned long dawrx1) +{ + return plpar_set_mode(0, H_SET_MODE_RESOURCE_SET_DAWR1, dawr1, dawrx1); +} + static inline long plpar_signal_sys_reset(long cpu) { return plpar_hcall_norets(H_SIGNAL_SYS_RESET, cpu); diff --git a/arch/powerpc/kernel/dawr.c b/arch/powerpc/kernel/dawr.c index 500f52fa4711..cdc2dccb987d 100644 --- a/arch/powerpc/kernel/dawr.c +++ b/arch/powerpc/kernel/dawr.c @@ -37,7 +37,7 @@ int set_dawr(int nr, struct arch_hw_breakpoint *brk) dawrx |= (mrd & 0x3f) << (63 - 53); if (ppc_md.set_dawr) - return ppc_md.set_dawr(dawr, dawrx); + return ppc_md.set_dawr(nr, dawr, dawrx); if (nr == 0) { mtspr(SPRN_DAWR0, dawr); diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c index 2db8469e475f..d516ee8eb7fc 100644 --- a/arch/powerpc/platforms/pseries/setup.c +++ b/arch/powerpc/platforms/pseries/setup.c @@ -831,12 +831,15 @@ static int pseries_set_xdabr(unsigned long dabr, unsigned long dabrx) return plpar_hcall_norets(H_SET_XDABR, dabr, dabrx); } -static int pseries_set_dawr(unsigned long dawr, unsigned long dawrx) +static int pseries_set_dawr(int nr, unsigned long dawr, unsigned long dawrx) { /* PAPR says we can't set HYP */ dawrx &= ~DAWRX_HYP; - return plpar_set_watchpoint0(dawr, dawrx); + if (nr == 0) + return plpar_set_watchpoint0(dawr, dawrx); + else + return plpar_set_watchpoint1(dawr, dawrx); } #define CMO_CHARACTERISTICS_TOKEN 44 -- 2.26.2
[PATCH v4 10/10] powerpc/watchpoint: Remove 512 byte boundary
Power10 has removed 512 bytes boundary from match criteria. i.e. The watch range can cross 512 bytes boundary. Signed-off-by: Ravi Bangoria --- arch/powerpc/kernel/hw_breakpoint.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/kernel/hw_breakpoint.c b/arch/powerpc/kernel/hw_breakpoint.c index c55e67bab271..1f4a1efa0074 100644 --- a/arch/powerpc/kernel/hw_breakpoint.c +++ b/arch/powerpc/kernel/hw_breakpoint.c @@ -418,8 +418,9 @@ static int hw_breakpoint_validate_len(struct arch_hw_breakpoint *hw) if (dawr_enabled()) { max_len = DAWR_MAX_LEN; - /* DAWR region can't cross 512 bytes boundary */ - if (ALIGN_DOWN(start_addr, SZ_512) != ALIGN_DOWN(end_addr - 1, SZ_512)) + /* DAWR region can't cross 512 bytes boundary on p10 predecessors */ + if (!cpu_has_feature(CPU_FTR_ARCH_31) && + (ALIGN_DOWN(start_addr, SZ_512) != ALIGN_DOWN(end_addr - 1, SZ_512))) return -EINVAL; } else if (IS_ENABLED(CONFIG_PPC_8xx)) { /* 8xx can setup a range without limitation */ -- 2.26.2
[PATCH v4 09/10] powerpc/watchpoint: Return available watchpoints dynamically
So far Book3S Powerpc supported only one watchpoint. Power10 is introducing 2nd DAWR. Enable 2nd DAWR support for Power10. Availability of 2nd DAWR will depend on CPU_FTR_DAWR1. Signed-off-by: Ravi Bangoria --- arch/powerpc/include/asm/cputable.h | 4 +++- arch/powerpc/include/asm/hw_breakpoint.h | 5 +++-- 2 files changed, 6 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/include/asm/cputable.h b/arch/powerpc/include/asm/cputable.h index 3445c86e1f6f..36a0851a7a9b 100644 --- a/arch/powerpc/include/asm/cputable.h +++ b/arch/powerpc/include/asm/cputable.h @@ -633,7 +633,9 @@ enum { * Maximum number of hw breakpoint supported on powerpc. Number of * breakpoints supported by actual hw might be less than this. */ -#define HBP_NUM_MAX1 +#define HBP_NUM_MAX2 +#define HBP_NUM_ONE1 +#define HBP_NUM_TWO2 #endif /* !__ASSEMBLY__ */ diff --git a/arch/powerpc/include/asm/hw_breakpoint.h b/arch/powerpc/include/asm/hw_breakpoint.h index cb424799da0d..d4eab1694bcd 100644 --- a/arch/powerpc/include/asm/hw_breakpoint.h +++ b/arch/powerpc/include/asm/hw_breakpoint.h @@ -5,10 +5,11 @@ * Copyright 2010, IBM Corporation. * Author: K.Prasad */ - #ifndef _PPC_BOOK3S_64_HW_BREAKPOINT_H #define _PPC_BOOK3S_64_HW_BREAKPOINT_H +#include + #ifdef __KERNEL__ struct arch_hw_breakpoint { unsigned long address; @@ -46,7 +47,7 @@ struct arch_hw_breakpoint { static inline int nr_wp_slots(void) { - return HBP_NUM_MAX; + return cpu_has_feature(CPU_FTR_DAWR1) ? HBP_NUM_TWO : HBP_NUM_ONE; } #ifdef CONFIG_HAVE_HW_BREAKPOINT -- 2.26.2
[PATCH v4 06/10] powerpc/watchpoint: Set CPU_FTR_DAWR1 based on pa-features bit
As per the PAPR, bit 0 of byte 64 in pa-features property indicates availability of 2nd DAWR registers. i.e. If this bit is set, 2nd DAWR is present, otherwise not. Host generally uses "cpu-features", which masks "pa-features". But "cpu-features" are still not used for guests and thus this change is mostly applicable for guests only. Signed-off-by: Ravi Bangoria --- arch/powerpc/kernel/prom.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c index 9cc49f265c86..c76c09b97bc8 100644 --- a/arch/powerpc/kernel/prom.c +++ b/arch/powerpc/kernel/prom.c @@ -175,6 +175,8 @@ static struct ibm_pa_feature { */ { .pabyte = 22, .pabit = 0, .cpu_features = CPU_FTR_TM_COMP, .cpu_user_ftrs2 = PPC_FEATURE2_HTM_COMP | PPC_FEATURE2_HTM_NOSC_COMP }, + + { .pabyte = 64, .pabit = 0, .cpu_features = CPU_FTR_DAWR1 }, }; static void __init scan_features(unsigned long node, const unsigned char *ftrs, -- 2.26.2
[PATCH v4 05/10] powerpc/dt_cpu_ftrs: Add feature for 2nd DAWR
Add new device-tree feature for 2nd DAWR. If this feature is present, 2nd DAWR is supported, otherwise not. Signed-off-by: Ravi Bangoria --- arch/powerpc/include/asm/cputable.h | 7 +-- arch/powerpc/kernel/dt_cpu_ftrs.c | 7 +++ 2 files changed, 12 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/include/asm/cputable.h b/arch/powerpc/include/asm/cputable.h index e506d429b1af..3445c86e1f6f 100644 --- a/arch/powerpc/include/asm/cputable.h +++ b/arch/powerpc/include/asm/cputable.h @@ -214,6 +214,7 @@ static inline void cpu_feature_keys_init(void) { } #define CPU_FTR_P9_TLBIE_ERAT_BUG LONG_ASM_CONST(0x0001) #define CPU_FTR_P9_RADIX_PREFETCH_BUG LONG_ASM_CONST(0x0002) #define CPU_FTR_ARCH_31 LONG_ASM_CONST(0x0004) +#define CPU_FTR_DAWR1 LONG_ASM_CONST(0x0008) #ifndef __ASSEMBLY__ @@ -497,14 +498,16 @@ static inline void cpu_feature_keys_init(void) { } #define CPU_FTRS_POSSIBLE \ (CPU_FTRS_POWER7 | CPU_FTRS_POWER8E | CPU_FTRS_POWER8 | \ CPU_FTR_ALTIVEC_COMP | CPU_FTR_VSX_COMP | CPU_FTRS_POWER9 | \ -CPU_FTRS_POWER9_DD2_1 | CPU_FTRS_POWER9_DD2_2 | CPU_FTRS_POWER10) +CPU_FTRS_POWER9_DD2_1 | CPU_FTRS_POWER9_DD2_2 | CPU_FTRS_POWER10 | \ +CPU_FTR_DAWR1) #else #define CPU_FTRS_POSSIBLE \ (CPU_FTRS_PPC970 | CPU_FTRS_POWER5 | \ CPU_FTRS_POWER6 | CPU_FTRS_POWER7 | CPU_FTRS_POWER8E | \ CPU_FTRS_POWER8 | CPU_FTRS_CELL | CPU_FTRS_PA6T | \ CPU_FTR_VSX_COMP | CPU_FTR_ALTIVEC_COMP | CPU_FTRS_POWER9 | \ -CPU_FTRS_POWER9_DD2_1 | CPU_FTRS_POWER9_DD2_2 | CPU_FTRS_POWER10) +CPU_FTRS_POWER9_DD2_1 | CPU_FTRS_POWER9_DD2_2 | CPU_FTRS_POWER10 | \ +CPU_FTR_DAWR1) #endif /* CONFIG_CPU_LITTLE_ENDIAN */ #endif #else diff --git a/arch/powerpc/kernel/dt_cpu_ftrs.c b/arch/powerpc/kernel/dt_cpu_ftrs.c index ac650c233cd9..c78cd3596ec4 100644 --- a/arch/powerpc/kernel/dt_cpu_ftrs.c +++ b/arch/powerpc/kernel/dt_cpu_ftrs.c @@ -574,6 +574,12 @@ static int __init feat_enable_mma(struct dt_cpu_feature *f) return 1; } +static int __init feat_enable_debug_facilities_v31(struct dt_cpu_feature *f) +{ + cur_cpu_spec->cpu_features |= CPU_FTR_DAWR1; + return 1; +} + struct dt_cpu_feature_match { const char *name; int (*enable)(struct dt_cpu_feature *f); @@ -649,6 +655,7 @@ static struct dt_cpu_feature_match __initdata {"wait-v3", feat_enable, 0}, {"prefix-instructions", feat_enable, 0}, {"matrix-multiply-assist", feat_enable_mma, 0}, + {"debug-facilities-v31", feat_enable_debug_facilities_v31, 0}, }; static bool __initdata using_dt_cpu_ftrs; -- 2.26.2
[PATCH v4 04/10] powerpc/watchpoint: Enable watchpoint functionality on power10 guest
CPU_FTR_DAWR is by default enabled for host via CPU_FTRS_DT_CPU_BASE (controlled by CONFIG_PPC_DT_CPU_FTRS). But cpu-features device-tree node is not PAPR compatible and thus not yet used by kvm or pHyp guests. Enable watchpoint functionality on power10 guest (both kvm and powervm) by adding CPU_FTR_DAWR to CPU_FTRS_POWER10. Note that this change does not enable 2nd DAWR support. Signed-off-by: Ravi Bangoria --- arch/powerpc/include/asm/cputable.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/powerpc/include/asm/cputable.h b/arch/powerpc/include/asm/cputable.h index bac2252c839e..e506d429b1af 100644 --- a/arch/powerpc/include/asm/cputable.h +++ b/arch/powerpc/include/asm/cputable.h @@ -478,7 +478,7 @@ static inline void cpu_feature_keys_init(void) { } CPU_FTR_CFAR | CPU_FTR_HVMODE | CPU_FTR_VMX_COPY | \ CPU_FTR_DBELL | CPU_FTR_HAS_PPR | CPU_FTR_ARCH_207S | \ CPU_FTR_TM_COMP | CPU_FTR_ARCH_300 | CPU_FTR_PKEY | \ - CPU_FTR_ARCH_31) + CPU_FTR_ARCH_31 | CPU_FTR_DAWR) #define CPU_FTRS_CELL (CPU_FTR_LWSYNC | \ CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_CTRL | \ CPU_FTR_ALTIVEC_COMP | CPU_FTR_MMCRA | CPU_FTR_SMT | \ -- 2.26.2