[RFC V2 0/3] mm/hotplug: Pre-validate the address range with platform
This series adds a mechanism allowing platforms to weigh in and prevalidate
an incoming address range before proceeding further with memory hotplug.
This helps prevent potential platform errors for the given address range
down the hotplug call chain, which would inevitably fail the hotplug itself.
This mechanism was suggested by David Hildenbrand during another discussion,
with respect to a memory hotplug fix on the arm64 platform.

https://lore.kernel.org/linux-arm-kernel/1600332402-30123-1-git-send-email-anshuman.khand...@arm.com/

This mechanism focuses on the addressability aspect and not the [sub]section
alignment aspect. Hence check_hotplug_memory_range() and check_pfn_span()
have been left unchanged. Wondering if all these can still be unified in an
expanded memhp_range_allowed() check, which could be called from multiple
memory hot add and remove paths.

This series applies on v5.10-rc6 and has been lightly tested on arm64, but
is looking for some early feedback here.

Changes in RFC V2: Incorporated all review feedback from David.
- Added an additional range check in __segment_load() on s390 which was lost
- Changed is_private init in pagemap_range()
- Moved the framework into mm/memory_hotplug.c
- Made arch_get_addressable_range() a __weak function
- Renamed arch_get_addressable_range() as arch_get_mappable_range()
- Callback arch_get_mappable_range() only handles ranges requiring linear mapping
- Merged multiple memhp_range_allowed() checks in register_memory_resource()
- Replaced WARN() with pr_warn() in memhp_range_allowed()
- Replaced error return code ERANGE with E2BIG

Changes in RFC V1:

https://lore.kernel.org/linux-mm/1606098529-7907-1-git-send-email-anshuman.khand...@arm.com/

Cc: Heiko Carstens
Cc: Vasily Gorbik
Cc: Catalin Marinas
Cc: Will Deacon
Cc: Ard Biesheuvel
Cc: Mark Rutland
Cc: David Hildenbrand
Cc: Andrew Morton
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-s...@vger.kernel.org
Cc: linux...@kvack.org
Cc: linux-kernel@vger.kernel.org

Anshuman Khandual (3):
  mm/hotplug: Prevalidate the address range being added with platform
  arm64/mm: Define arch_get_mappable_range()
  s390/mm: Define arch_get_mappable_range()

 arch/arm64/mm/mmu.c            | 14 +++
 arch/s390/mm/extmem.c          |  5 +++
 arch/s390/mm/vmem.c            | 13 --
 include/linux/memory_hotplug.h |  2 +
 mm/memory_hotplug.c            | 77 +-
 mm/memremap.c                  |  6 ++-
 6 files changed, 84 insertions(+), 33 deletions(-)

-- 
2.20.1
[RFC V2 1/3] mm/hotplug: Prevalidate the address range being added with platform
This introduces memhp_range_allowed() which can be called in various memory
hotplug paths to prevalidate, with the platform, the address range being
added. memhp_range_allowed() then calls memhp_get_pluggable_range(), which
provides the applicable address range depending on whether a linear mapping
is required or not. For ranges that do require a linear mapping, it calls a
new arch callback arch_get_mappable_range() which the platform can override.
The new callback thus gives the platform an opportunity to configure
acceptable memory hotplug address ranges in case there are constraints.
This mechanism will help prevent platform specific errors deep down during
hotplug calls. This also drops the now redundant
check_hotplug_memory_addressable() check in __add_pages().

Cc: David Hildenbrand
Cc: Andrew Morton
Cc: linux...@kvack.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Anshuman Khandual
---
 include/linux/memory_hotplug.h |  2 +
 mm/memory_hotplug.c            | 77 +-
 mm/memremap.c                  |  6 ++-
 3 files changed, 64 insertions(+), 21 deletions(-)

diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 551093b74596..047a711ab76a 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -70,6 +70,8 @@ typedef int __bitwise mhp_t;
  */
 #define MEMHP_MERGE_RESOURCE ((__force mhp_t)BIT(0))
 
+bool memhp_range_allowed(u64 start, u64 size, bool need_mapping);
+
 /*
  * Extended parameters for memory hotplug:
  * altmap: alternative allocator for memmap array (optional)
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 63b2e46b6555..9dd9db01985d 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -107,6 +107,9 @@ static struct resource *register_memory_resource(u64 start, u64 size,
 	if (strcmp(resource_name, "System RAM"))
 		flags |= IORESOURCE_SYSRAM_DRIVER_MANAGED;
 
+	if (!memhp_range_allowed(start, size, 1))
+		return ERR_PTR(-E2BIG);
+
 	/*
 	 * Make sure value parsed from 'mem=' only restricts memory adding
 	 * while booting, so that memory hotplug won't be impacted. Please
@@ -284,22 +287,6 @@ static int check_pfn_span(unsigned long pfn, unsigned long nr_pages,
 	return 0;
 }
 
-static int check_hotplug_memory_addressable(unsigned long pfn,
-					    unsigned long nr_pages)
-{
-	const u64 max_addr = PFN_PHYS(pfn + nr_pages) - 1;
-
-	if (max_addr >> MAX_PHYSMEM_BITS) {
-		const u64 max_allowed = (1ull << (MAX_PHYSMEM_BITS + 1)) - 1;
-		WARN(1,
-		     "Hotplugged memory exceeds maximum addressable address, range=%#llx-%#llx, maximum=%#llx\n",
-		     (u64)PFN_PHYS(pfn), max_addr, max_allowed);
-		return -E2BIG;
-	}
-
-	return 0;
-}
-
 /*
  * Reasonably generic function for adding memory. It is
  * expected that archs that support memory hotplug will
@@ -317,10 +304,6 @@ int __ref __add_pages(int nid, unsigned long pfn, unsigned long nr_pages,
 	if (WARN_ON_ONCE(!params->pgprot.pgprot))
 		return -EINVAL;
 
-	err = check_hotplug_memory_addressable(pfn, nr_pages);
-	if (err)
-		return err;
-
 	if (altmap) {
 		/*
 		 * Validate altmap is within bounds of the total request
@@ -1824,4 +1807,58 @@ int offline_and_remove_memory(int nid, u64 start, u64 size)
 	return rc;
 }
 EXPORT_SYMBOL_GPL(offline_and_remove_memory);
+
+/*
+ * Platforms should define arch_get_mappable_range() that provides
+ * maximum possible addressable physical memory range for which the
+ * linear mapping could be created. The platform returned address
+ * range must adhere to these following semantics.
+ *
+ * - range.start <= range.end
+ * - Range includes both end points [range.start..range.end]
+ *
+ * There is also a fallback definition provided here, allowing the
+ * entire possible physical address range in case any platform does
+ * not define arch_get_mappable_range().
+ */
+struct range __weak arch_get_mappable_range(void)
+{
+	struct range memhp_range = {
+		.start = 0UL,
+		.end = -1ULL,
+	};
+	return memhp_range;
+}
+
+static inline struct range memhp_get_pluggable_range(bool need_mapping)
+{
+	const u64 max_phys = (1ULL << (MAX_PHYSMEM_BITS + 1)) - 1;
+	struct range memhp_range;
+
+	if (need_mapping) {
+		memhp_range = arch_get_mappable_range();
+		if (memhp_range.start > max_phys) {
+			memhp_range.start = 0;
+			memhp_range.end = 0;
+		}
+		memhp_range.end = min_t(u64, memhp_range.end, max_phys);
+	} else {
+		memhp_range.start = 0;
+		memhp_range.end = max_phys;
+	}
+
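The clamping behaviour of memhp_get_pluggable_range() can be sanity-checked outside the kernel. Below is a minimal userspace sketch of the same logic, with min_t() expanded into an explicit clamp; the MAX_PHYSMEM_BITS value of 46 is an assumption (it is configuration dependent), and arch_get_mappable_range() here stands in for the kernel's __weak fallback.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* MAX_PHYSMEM_BITS is configuration dependent; 46 is only an assumption. */
#define MAX_PHYSMEM_BITS 46

struct range { uint64_t start, end; };

/* Stand-in for the __weak kernel fallback: the whole physical space. */
static struct range arch_get_mappable_range(void)
{
	struct range r = { .start = 0ULL, .end = ~0ULL };
	return r;
}

static struct range memhp_get_pluggable_range(bool need_mapping)
{
	const uint64_t max_phys = (1ULL << (MAX_PHYSMEM_BITS + 1)) - 1;
	struct range r;

	if (need_mapping) {
		r = arch_get_mappable_range();
		if (r.start > max_phys) {	/* arch range lies entirely above the limit */
			r.start = 0;
			r.end = 0;
		}
		if (r.end > max_phys)		/* min_t(u64, r.end, max_phys) expanded */
			r.end = max_phys;
	} else {
		r.start = 0;
		r.end = max_phys;
	}
	return r;
}

static bool memhp_range_allowed(uint64_t start, uint64_t size, bool need_mapping)
{
	struct range r = memhp_get_pluggable_range(need_mapping);
	uint64_t end = start + size;

	/* start < end also rejects a zero size and a start + size overflow */
	return (start < end) && (start >= r.start) && ((end - 1) <= r.end);
}
```

Note how the single `start < end` comparison covers both the zero-size and the unsigned-overflow cases, which is why the kernel version needs no separate overflow check.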
[RFC V2 3/3] s390/mm: Define arch_get_mappable_range()
This overrides arch_get_mappable_range() on the s390 platform and drops the
now redundant similar check in vmem_add_mapping(). It compensates by adding
a new check in __segment_load(), to preserve the existing functionality.

Cc: Heiko Carstens
Cc: Vasily Gorbik
Cc: David Hildenbrand
Cc: linux-s...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Anshuman Khandual
---
 arch/s390/mm/extmem.c |  5 +
 arch/s390/mm/vmem.c   | 13 +
 2 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/arch/s390/mm/extmem.c b/arch/s390/mm/extmem.c
index 5060956b8e7d..cc055a78f7b6 100644
--- a/arch/s390/mm/extmem.c
+++ b/arch/s390/mm/extmem.c
@@ -337,6 +337,11 @@ __segment_load (char *name, int do_nonshared, unsigned long *addr, unsigned long
 		goto out_free_resource;
 	}
 
+	if (seg->end + 1 > VMEM_MAX_PHYS || seg->end + 1 < seg->start_addr) {
+		rc = -ERANGE;
+		goto out_resource;
+	}
+
 	rc = vmem_add_mapping(seg->start_addr, seg->end - seg->start_addr + 1);
 	if (rc)
 		goto out_resource;
diff --git a/arch/s390/mm/vmem.c b/arch/s390/mm/vmem.c
index b239f2ba93b0..06dddcc0ce06 100644
--- a/arch/s390/mm/vmem.c
+++ b/arch/s390/mm/vmem.c
@@ -532,14 +532,19 @@ void vmem_remove_mapping(unsigned long start, unsigned long size)
 	mutex_unlock(&vmem_mutex);
}
 
+struct range arch_get_mappable_range(void)
+{
+	struct range memhp_range;
+
+	memhp_range.start = 0;
+	memhp_range.end = VMEM_MAX_PHYS;
+	return memhp_range;
+}
+
 int vmem_add_mapping(unsigned long start, unsigned long size)
 {
 	int ret;
 
-	if (start + size > VMEM_MAX_PHYS ||
-	    start + size < start)
-		return -ERANGE;
-
 	mutex_lock(&vmem_mutex);
 	ret = vmem_add_range(start, size);
 	if (ret)
-- 
2.20.1
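The two-sided comparison added to __segment_load() is worth a closer look: the second clause exists purely to catch unsigned wrap-around when `seg->end` is the very top of the address space. A small userspace sketch (the VMEM_MAX_PHYS value here is invented for illustration):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* VMEM_MAX_PHYS is platform derived on s390; this value is made up. */
#define VMEM_MAX_PHYS (1ULL << 42)

/* Mirrors the check added to __segment_load(): the segment must end at or
 * below VMEM_MAX_PHYS, and "end + 1 < start_addr" catches the unsigned
 * wrap when end == ~0, where end + 1 silently becomes 0.
 */
static bool segment_range_ok(uint64_t start_addr, uint64_t end)
{
	return !(end + 1 > VMEM_MAX_PHYS || end + 1 < start_addr);
}
```

Without the second clause, a segment ending at `~0UL` would compute `end + 1 == 0`, pass the first comparison, and slip through.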
[RFC V2 2/3] arm64/mm: Define arch_get_mappable_range()
This overrides arch_get_mappable_range() on the arm64 platform, which will
be used with the recently added generic framework. It drops
inside_linear_region() and the subsequent check in arch_add_memory(), which
are no longer required.

Cc: Catalin Marinas
Cc: Will Deacon
Cc: Ard Biesheuvel
Cc: Mark Rutland
Cc: David Hildenbrand
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Anshuman Khandual
---
 arch/arm64/mm/mmu.c | 14 ++
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index ca692a815731..49ec8f2838f2 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1444,16 +1444,19 @@ static void __remove_pgd_mapping(pgd_t *pgdir, unsigned long start, u64 size)
 	free_empty_tables(start, end, PAGE_OFFSET, PAGE_END);
 }
 
-static bool inside_linear_region(u64 start, u64 size)
+struct range arch_get_mappable_range(void)
 {
+	struct range memhp_range;
+
 	/*
 	 * Linear mapping region is the range [PAGE_OFFSET..(PAGE_END - 1)]
 	 * accommodating both its ends but excluding PAGE_END. Max physical
 	 * range which can be mapped inside this linear mapping range, must
 	 * also be derived from its end points.
 	 */
-	return start >= __pa(_PAGE_OFFSET(vabits_actual)) &&
-	       (start + size - 1) <= __pa(PAGE_END - 1);
+	memhp_range.start = __pa(_PAGE_OFFSET(vabits_actual));
+	memhp_range.end = __pa(PAGE_END - 1);
+	return memhp_range;
 }
 
 int arch_add_memory(int nid, u64 start, u64 size,
@@ -1461,11 +1464,6 @@ int arch_add_memory(int nid, u64 start, u64 size,
 {
 	int ret, flags = 0;
 
-	if (!inside_linear_region(start, size)) {
-		pr_err("[%llx %llx] is outside linear mapping region\n", start, start + size);
-		return -EINVAL;
-	}
-
 	if (rodata_full || debug_pagealloc_enabled())
 		flags = NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS;
-- 
2.20.1
Re: [PATCH 1/2] mm/debug_vm_pgtable/basic: Add validation for dirtiness after write protect
On 11/27/20 3:14 PM, Catalin Marinas wrote:
> On Fri, Nov 27, 2020 at 09:22:24AM +0100, Christophe Leroy wrote:
>> Le 27/11/2020 à 06:06, Anshuman Khandual a écrit :
>>> This adds validation tests for dirtiness after write protect conversion for
>>> each page table level. This is important for platforms such as arm64 that
>>> remove the hardware dirty bit while making the entry a write protected one.
>>> This also fixes pxx_wrprotect() related typos in the documentation file.
>>
>>> diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
>>> index c05d9dcf7891..a5be11210597 100644
>>> --- a/mm/debug_vm_pgtable.c
>>> +++ b/mm/debug_vm_pgtable.c
>>> @@ -70,6 +70,7 @@ static void __init pte_basic_tests(unsigned long pfn, pgprot_t prot)
>>>  	WARN_ON(pte_young(pte_mkold(pte_mkyoung(pte))));
>>>  	WARN_ON(pte_dirty(pte_mkclean(pte_mkdirty(pte))));
>>>  	WARN_ON(pte_write(pte_wrprotect(pte_mkwrite(pte))));
>>> +	WARN_ON(pte_dirty(pte_wrprotect(pte)));
>>
>> Wondering what you are testing here exactly.
>>
>> Do you expect that if the PTE has the dirty bit, it gets cleared by
>> pte_wrprotect()?
>>
>> Powerpc doesn't do that, it only clears the RW bit but the dirty bit
>> remains if it is set, until you call pte_mkclean() explicitly.
>
> Arm64 has an unusual way of setting a hardware dirty "bit", it actually
> clears the PTE_RDONLY bit. The pte_wrprotect() sets the PTE_RDONLY bit
> back and we can lose the dirty information. Will found this and posted
> patches to fix the arm64 pte_wrprotect() to set a software PTE_DIRTY if
> !PTE_RDONLY (we do this for ptep_set_wrprotect() already). My concern
> was that we may inadvertently make a fresh/clean pte dirty with such a
> change, hence the suggestion for the test.
>
> That said, I think we also need a test in the other direction,
> pte_wrprotect() should preserve any dirty information:
>
> 	WARN_ON(!pte_dirty(pte_wrprotect(pte_mkdirty(pte))));

This seems like a generic enough principle which all platforms should
adhere to.
But the proposed test WARN_ON(pte_dirty(pte_wrprotect(pte))) might fail on
some platforms, if the page table entry came in as a dirty one and
pte_wrprotect() is not expected to alter the dirty state. Instead, should
we just add the following two tests, which would ensure that
pte_wrprotect() never alters the dirty state of a page table entry?

	WARN_ON(!pte_dirty(pte_wrprotect(pte_mkdirty(pte))));
	WARN_ON(pte_dirty(pte_wrprotect(pte_mkclean(pte))));

>
> If pte_mkwrite() makes a pte truly writable and potentially dirty, we
> could also add a test as below. However, I think that's valid for arm64;
> other architectures with a separate hardware dirty bit would fail this:
>
> 	WARN_ON(!pte_dirty(pte_wrprotect(pte_mkwrite(pte))));

Right.
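The semantics under discussion can be modelled in a few lines. Below is a toy userspace model of an arm64-style pte, assuming the fixed pte_wrprotect() behaviour Catalin describes (latching hardware dirtiness into a software bit); the bit positions are invented and are not the real arm64 layout:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model of an arm64-style pte; bit positions are invented. Hardware
 * expresses "writable and dirty" by clearing PTE_RDONLY, while PTE_DIRTY
 * is the software dirty bit.
 */
#define PTE_RDONLY	(1ULL << 0)
#define PTE_DIRTY	(1ULL << 1)
#define PTE_WRITE	(1ULL << 2)

typedef uint64_t pte_t;

static bool pte_dirty(pte_t pte)
{
	/* dirty if either the software bit is set or hardware cleared RDONLY */
	return (pte & PTE_DIRTY) || !(pte & PTE_RDONLY);
}

static pte_t pte_mkdirty(pte_t pte) { return pte & ~PTE_RDONLY; }
static pte_t pte_mkclean(pte_t pte) { return (pte | PTE_RDONLY) & ~PTE_DIRTY; }

/* The fixed behaviour under discussion: latch hardware dirtiness into the
 * software bit before setting PTE_RDONLY, so write protecting an entry
 * never loses dirty information and never invents it for a clean entry.
 */
static pte_t pte_wrprotect(pte_t pte)
{
	if (!(pte & PTE_RDONLY))
		pte |= PTE_DIRTY;
	return (pte | PTE_RDONLY) & ~PTE_WRITE;
}
```

With these definitions, both proposed WARN_ON conditions hold: wrprotecting a dirtied entry keeps it dirty, and wrprotecting a cleaned entry keeps it clean.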
[RFC 0/3] mm/hotplug: Pre-validate the address range with platform
This series adds a mechanism allowing platforms to weigh in and prevalidate
an incoming address range before proceeding further with memory hotplug.
This helps prevent potential platform errors for the given address range
down the hotplug call chain, which would inevitably fail the hotplug itself.
This mechanism was suggested by David Hildenbrand during another discussion,
with respect to a memory hotplug fix on the arm64 platform.

https://lore.kernel.org/linux-arm-kernel/1600332402-30123-1-git-send-email-anshuman.khand...@arm.com/

This mechanism focuses on the addressability aspect and not the [sub]section
alignment aspect. Hence check_hotplug_memory_range() and check_pfn_span()
have been left unchanged. Wondering if all these can still be unified in an
expanded memhp_range_allowed() check, which could be called from multiple
memory hot add and remove paths.

This series applies on v5.10-rc5 and has been lightly tested on arm64, but
is looking for some early feedback here.

Cc: Heiko Carstens
Cc: Vasily Gorbik
Cc: Catalin Marinas
Cc: Will Deacon
Cc: Ard Biesheuvel
Cc: Mark Rutland
Cc: David Hildenbrand
Cc: Andrew Morton
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-s...@vger.kernel.org
Cc: linux...@kvack.org
Cc: linux-kernel@vger.kernel.org

Anshuman Khandual (3):
  mm/hotplug: Pre-validate the address range with platform
  arm64/mm: Define arch_get_addressable_range()
  s390/mm: Define arch_get_addressable_range()

 arch/arm64/include/asm/memory.h |  3 ++
 arch/arm64/mm/mmu.c             | 19 ++--
 arch/s390/include/asm/mmu.h     |  2 ++
 arch/s390/mm/vmem.c             | 16 ---
 include/linux/memory_hotplug.h  | 51 +
 mm/memory_hotplug.c             | 29 ++-
 mm/memremap.c                   |  9 +-
 7 files changed, 96 insertions(+), 33 deletions(-)

-- 
2.20.1
[RFC 3/3] s390/mm: Define arch_get_addressable_range()
This overrides arch_get_addressable_range() on the s390 platform and drops
the now redundant similar check in vmem_add_mapping().

Cc: Heiko Carstens
Cc: Vasily Gorbik
Cc: David Hildenbrand
Cc: linux-s...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Anshuman Khandual
---
 arch/s390/include/asm/mmu.h |  2 ++
 arch/s390/mm/vmem.c         | 16 
 2 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/arch/s390/include/asm/mmu.h b/arch/s390/include/asm/mmu.h
index e12ff0f29d1a..f92d3926b188 100644
--- a/arch/s390/include/asm/mmu.h
+++ b/arch/s390/include/asm/mmu.h
@@ -55,4 +55,6 @@ static inline int tprot(unsigned long addr)
 	return rc;
 }
 
+#define arch_get_addressable_range arch_get_addressable_range
+struct range arch_get_addressable_range(bool need_mapping);
 #endif
diff --git a/arch/s390/mm/vmem.c b/arch/s390/mm/vmem.c
index b239f2ba93b0..e03ad0ed13a7 100644
--- a/arch/s390/mm/vmem.c
+++ b/arch/s390/mm/vmem.c
@@ -532,14 +532,22 @@ void vmem_remove_mapping(unsigned long start, unsigned long size)
 	mutex_unlock(&vmem_mutex);
 }
 
+struct range arch_get_addressable_range(bool need_mapping)
+{
+	struct range memhp_range;
+
+	memhp_range.start = 0;
+	if (need_mapping)
+		memhp_range.end = VMEM_MAX_PHYS;
+	else
+		memhp_range.end = (1ULL << (MAX_PHYSMEM_BITS + 1)) - 1;
+	return memhp_range;
+}
+
 int vmem_add_mapping(unsigned long start, unsigned long size)
 {
 	int ret;
 
-	if (start + size > VMEM_MAX_PHYS ||
-	    start + size < start)
-		return -ERANGE;
-
 	mutex_lock(&vmem_mutex);
 	ret = vmem_add_range(start, size);
 	if (ret)
-- 
2.20.1
[RFC 2/3] arm64/mm: Define arch_get_addressable_range()
This overrides arch_get_addressable_range() on the arm64 platform, which
will be used with the recently added generic framework. It drops
inside_linear_region() and the subsequent check in arch_add_memory(), which
are no longer required.

Cc: Catalin Marinas
Cc: Will Deacon
Cc: Ard Biesheuvel
Cc: Mark Rutland
Cc: David Hildenbrand
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Anshuman Khandual
---
 arch/arm64/include/asm/memory.h |  3 +++
 arch/arm64/mm/mmu.c             | 19 +++
 2 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
index cd61239bae8c..0ef7948eb58c 100644
--- a/arch/arm64/include/asm/memory.h
+++ b/arch/arm64/include/asm/memory.h
@@ -328,6 +328,9 @@ static inline void *phys_to_virt(phys_addr_t x)
 })
 
 void dump_mem_limit(void);
+
+#define arch_get_addressable_range arch_get_addressable_range
+struct range arch_get_addressable_range(bool need_mapping);
 #endif /* !ASSEMBLY */
 
 /*
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index ca692a815731..a6433caf337f 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1444,16 +1444,24 @@ static void __remove_pgd_mapping(pgd_t *pgdir, unsigned long start, u64 size)
 	free_empty_tables(start, end, PAGE_OFFSET, PAGE_END);
 }
 
-static bool inside_linear_region(u64 start, u64 size)
+struct range arch_get_addressable_range(bool need_mapping)
 {
+	struct range memhp_range;
+
 	/*
 	 * Linear mapping region is the range [PAGE_OFFSET..(PAGE_END - 1)]
 	 * accommodating both its ends but excluding PAGE_END. Max physical
 	 * range which can be mapped inside this linear mapping range, must
 	 * also be derived from its end points.
 	 */
-	return start >= __pa(_PAGE_OFFSET(vabits_actual)) &&
-	       (start + size - 1) <= __pa(PAGE_END - 1);
+	if (need_mapping) {
+		memhp_range.start = __pa(_PAGE_OFFSET(vabits_actual));
+		memhp_range.end = __pa(PAGE_END - 1);
+	} else {
+		memhp_range.start = 0;
+		memhp_range.end = (1ULL << (MAX_PHYSMEM_BITS + 1)) - 1;
+	}
+	return memhp_range;
 }
 
 int arch_add_memory(int nid, u64 start, u64 size,
@@ -1461,11 +1469,6 @@ int arch_add_memory(int nid, u64 start, u64 size,
 {
 	int ret, flags = 0;
 
-	if (!inside_linear_region(start, size)) {
-		pr_err("[%llx %llx] is outside linear mapping region\n", start, start + size);
-		return -EINVAL;
-	}
-
 	if (rodata_full || debug_pagealloc_enabled())
 		flags = NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS;
-- 
2.20.1
[RFC 1/3] mm/hotplug: Pre-validate the address range with platform
This introduces memhp_range_allowed() which gets called in various hotplug
paths. memhp_range_allowed() then calls arch_get_addressable_range() via
memhp_get_pluggable_range(). This callback can be defined by the platform
to provide the addressable physical range, depending on whether a kernel
linear mapping is required or not. This mechanism will prevent platform
specific errors deep down during hotplug calls. While here, it also drops
the now redundant check_hotplug_memory_addressable() check in __add_pages().

Cc: David Hildenbrand
Cc: Andrew Morton
Cc: linux...@kvack.org
Cc: linux-kernel@vger.kernel.org
Suggested-by: David Hildenbrand
Signed-off-by: Anshuman Khandual
---
 include/linux/memory_hotplug.h | 51 ++
 mm/memory_hotplug.c            | 29 ++-
 mm/memremap.c                  |  9 +-
 3 files changed, 68 insertions(+), 21 deletions(-)

diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 551093b74596..2018c0201672 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -6,6 +6,7 @@
 #include
 #include
 #include
+#include
 
 struct page;
 struct zone;
@@ -70,6 +71,56 @@ typedef int __bitwise mhp_t;
  */
 #define MEMHP_MERGE_RESOURCE ((__force mhp_t)BIT(0))
 
+/*
+ * Platforms should define arch_get_addressable_range() which provides
+ * addressable physical memory range depending upon whether the linear
+ * mapping is required or not. Returned address range must follow
+ *
+ * 1. range.start <= range.end
+ * 2. Must include both end points i.e range.start and range.end
+ *
+ * Nonetheless there is a fallback definition provided here providing
+ * maximum possible addressable physical range, irrespective of the
+ * linear mapping requirements.
+ */
+#ifndef arch_get_addressable_range
+static inline struct range arch_get_addressable_range(bool need_mapping)
+{
+	struct range memhp_range = {
+		.start = 0UL,
+		.end = -1ULL,
+	};
+	return memhp_range;
+}
+#endif
+
+static inline struct range memhp_get_pluggable_range(bool need_mapping)
+{
+	const u64 max_phys = (1ULL << (MAX_PHYSMEM_BITS + 1)) - 1;
+	struct range memhp_range = arch_get_addressable_range(need_mapping);
+
+	if (memhp_range.start > max_phys) {
+		memhp_range.start = 0;
+		memhp_range.end = 0;
+	}
+	memhp_range.end = min_t(u64, memhp_range.end, max_phys);
+	return memhp_range;
+}
+
+static inline bool memhp_range_allowed(u64 start, u64 size, bool need_mapping)
+{
+	struct range memhp_range = memhp_get_pluggable_range(need_mapping);
+	u64 end = start + size;
+
+	if ((start < end) && (start >= memhp_range.start) &&
+	    ((end - 1) <= memhp_range.end))
+		return true;
+
+	WARN(1, "Hotplug memory [%#llx-%#llx] exceeds maximum addressable range [%#llx-%#llx]\n",
+	     start, end, memhp_range.start, memhp_range.end);
+	return false;
+}
+
 /*
  * Extended parameters for memory hotplug:
  * altmap: alternative allocator for memmap array (optional)
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 63b2e46b6555..9efb6c8558ed 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -284,22 +284,6 @@ static int check_pfn_span(unsigned long pfn, unsigned long nr_pages,
 	return 0;
 }
 
-static int check_hotplug_memory_addressable(unsigned long pfn,
-					    unsigned long nr_pages)
-{
-	const u64 max_addr = PFN_PHYS(pfn + nr_pages) - 1;
-
-	if (max_addr >> MAX_PHYSMEM_BITS) {
-		const u64 max_allowed = (1ull << (MAX_PHYSMEM_BITS + 1)) - 1;
-		WARN(1,
-		     "Hotplugged memory exceeds maximum addressable address, range=%#llx-%#llx, maximum=%#llx\n",
-		     (u64)PFN_PHYS(pfn), max_addr, max_allowed);
-		return -E2BIG;
-	}
-
-	return 0;
-}
-
 /*
  * Reasonably generic function for adding memory. It is
  * expected that archs that support memory hotplug will
@@ -317,10 +301,6 @@ int __ref __add_pages(int nid, unsigned long pfn, unsigned long nr_pages,
 	if (WARN_ON_ONCE(!params->pgprot.pgprot))
 		return -EINVAL;
 
-	err = check_hotplug_memory_addressable(pfn, nr_pages);
-	if (err)
-		return err;
-
 	if (altmap) {
 		/*
 		 * Validate altmap is within bounds of the total request
@@ -1109,6 +1089,9 @@ int __ref __add_memory(int nid, u64 start, u64 size, mhp_t mhp_flags)
 	struct resource *res;
 	int ret;
 
+	if (!memhp_range_allowed(start, size, 1))
+		return -ERANGE;
+
 	res = register_memory_resource(start, size, "System RAM");
 	if (IS_ERR(res))
 		return PTR_ERR(res);
@@ -1123,6 +1106,9 @@ int add_memory(int nid, u64 start,
Re: [RFC 00/11] arm64: coresight: Enable ETE and TRBE
Hello Tingwei,

On 11/14/20 10:47 AM, Tingwei Zhang wrote:
> Hi Anshuman,
>
> On Tue, Nov 10, 2020 at 08:44:58PM +0800, Anshuman Khandual wrote:
>> This series enables future IP trace features Embedded Trace Extension (ETE)
>> and Trace Buffer Extension (TRBE). This series depends on the ETM system
>> register instruction support series [0] and the v8.4 Self hosted tracing
>> support series (Jonathan Zhou) [1]. The tree is available here [2] for
>> quick access.
>>
>> ETE is the PE (CPU) trace unit for CPUs, implementing future architecture
>> extensions. ETE overlaps with the ETMv4 architecture, with additions to
>> support the newer architecture features and some restrictions on the
>> supported features w.r.t ETMv4. The ETE support is added by extending the
>> ETMv4 driver to recognise the ETE and handle the features as exposed by
>> the TRCIDRx registers. ETE only supports system instruction access from
>> the host CPU. The ETE could be integrated with a TRBE (see below), or with
>> the legacy CoreSight trace bus (e.g, ETRs). Thus the ETE follows the same
>> firmware description as the ETMs and requires a node per instance.
>>
>> Trace Buffer Extension (TRBE) implements a per CPU trace buffer, which is
>> accessible via the system registers and can be combined with the ETE to
>> provide a 1x1 configuration of source & sink. TRBE is being represented
>> here as a CoreSight sink. The primary reason is that the ETE source could
>> work with other traditional CoreSight sink devices. As TRBE captures the
>> trace data which is produced by ETE, it cannot work alone.
>>
>> The TRBE representation here has some distinct deviations from a
>> traditional CoreSight sink device. The CoreSight path between ETE and TRBE
>> is not built during boot by looking at respective DT or ACPI entries.
>> Instead, TRBE gets checked on each available CPU and, when found, gets
>> connected with the respective ETE source device on the same CPU, after
>> altering its outward connections. The ETE-TRBE path connection lasts only
>> while the CPU is online. But the ETE-TRBE coupling/decoupling method
>> implemented here is not optimal and would be reworked later on.
>
> Only perf mode is supported for TRBE in the current patches. Will you
> consider supporting sysfs mode as well in following patch sets?

Yes, either in subsequent versions or later on, after first getting the
perf based functionality enabled. Nonetheless, sysfs is also on the todo
list as mentioned in the cover letter.

- Anshuman
Re: [RFC 00/11] arm64: coresight: Enable ETE and TRBE
Hello Mike,

On 11/16/20 8:30 PM, Mike Leach wrote:
> Hi Anshuman,
>
> I've not looked in detail at this set yet, but having skimmed through
> it I do have an initial question about the handling of wrapped data
> buffers.
>
> With the ETR/ETB we found an issue with the way perf concatenated data
> captured from the hardware buffer into a single contiguous data block.
> The issue occurs when a wrapped buffer appears after another buffer in
> the data file. In a typical session perf would stop trace and copy the
> hardware buffer multiple times into the auxtrace buffer.

The hardware buffer and the perf aux trace buffer are the same for TRBE,
and hence there is no actual copy involved. Trace data gets pushed into
the user space via perf_aux_output_end(), either via etm_event_stop() or
via the IRQ handler i.e arm_trbe_irq_handler(). Data transfer to user
space happens via updates to the perf aux buffer indices i.e head, tail,
wakeup. But logically, they will appear as a stream of records to the
user space while parsing the perf.data file.

> e.g.
>
> For ETR/ETB we have a fixed length hardware data buffer - and no way
> of detecting buffer wraps using interrupts as the tracing is in
> progress.

TRBE has an interrupt. Hence there will be an opportunity to insert any
additional packets, if required, to demarcate the pre and post IRQ trace
data streams.

> If the buffer is not full at the point that perf transfers it then the
> data will look like this:-
>
> 1)
>
> easy to decode, we can see the async at the start of the data - which
> would be the async issued at the start of trace.

Just curious, what makes the tracer generate the trace packet? Is there
an explicit instruction, or is that just how the tracer starts when
enabled?

> If the buffer wraps we see this:-
>
> 2)
>
> Again no real issue, the decoder will skip to the async and trace from
> there - we lose the unsynced data.

Could you please elaborate more on the difference between synced and
unsynced trace data?

> Now the problem occurs when multiple transfers of data occur. We can
> see the following appearing as contiguous trace in the auxtrace
> buffer:-
>
> 3) < async>

So there is a wrap around event between the two captures? Are there any
other situations where this might happen?

> Now the decoder cannot spot the point that the synced data from the
> first capture ends, and the unsynced data from the second capture
> begins.

Got it.

> This means it will continue to decode into the unsynced data - which
> will result in incorrect trace / outright errors. To get round this
> for ETR/ETB the driver will insert barrier packets into the datafile
> if a wrap event is detected.

But you mentioned there are no IRQs on ETR/ETB. So how is the wrap event
even detected?

> 4) data>
>
> This has the effect of resetting the decoder into the unsynced state
> so that the invalid trace is not decoded. This is a workaround we have
> to do to handle the limitations of the ETR / ETB trace hardware.

Got it.

> For TRBE we do have interrupts, so it should be possible to prevent
> the buffer wrapping in most cases - but I did see in the code that
> there are handlers for the TRBE buffer wrap management event. Are
> there other factors in play that will prevent data pattern 3) from
> appearing in the auxtrace buffer ?

On TRBE, the buffer wrapping cannot happen without generating an IRQ. I
would assume that ETE will then start again with a data packet first when
the handler returns. Otherwise we might also have to insert a similar
barrier packet for the user space tool to reset. As trace data should not
get lost during a wrap event, ETE should complete the packet after the
handler returns, hence the aux buffer should still have a logically
contiguous stream to decode. I am not sure right now, but will look into
this.

- Anshuman
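The ETR/ETB workaround Mike describes can be sketched abstractly: once a fixed size buffer has wrapped, the driver prepends a barrier marker so the decoder resynchronises instead of decoding stale bytes as a continuation of the previous capture. All names, sizes, and the single-byte "barrier" below are invented for illustration only; real CoreSight barrier packets and drain paths look nothing like this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BUF_SIZE 8
#define BARRIER  0xFF	/* stand-in for a CoreSight barrier packet */

struct sink {
	uint8_t  buf[BUF_SIZE];
	uint64_t written;	/* total bytes ever produced by the source */
};

static void sink_write(struct sink *s, uint8_t byte)
{
	s->buf[s->written % BUF_SIZE] = byte;
	s->written++;
}

/* Drain for perf: copy out the surviving bytes in production order and
 * prepend a barrier if older bytes were overwritten (a wrap happened).
 * Returns the number of bytes placed in out[] (at most BUF_SIZE + 1).
 */
static int sink_drain(const struct sink *s, uint8_t *out)
{
	uint64_t avail = s->written < BUF_SIZE ? s->written : BUF_SIZE;
	uint64_t first = s->written - avail;	/* oldest surviving byte */
	int n = 0;

	if (first > 0)		/* wrap detected: decoder must resynchronise */
		out[n++] = BARRIER;
	for (uint64_t i = first; i < s->written; i++)
		out[n++] = s->buf[i % BUF_SIZE];
	return n;
}
```

The point of the barrier is exactly Mike's pattern 3): without it, the drained bytes of a wrapped capture would be indistinguishable from a continuation of the previous, fully synced capture.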
Re: [RFC 07/11] coresight: sink: Add TRBE driver
On 11/14/20 11:08 AM, Tingwei Zhang wrote: > Hi Anshuman, > > On Tue, Nov 10, 2020 at 08:45:05PM +0800, Anshuman Khandual wrote: >> Trace Buffer Extension (TRBE) implements a trace buffer per CPU which is >> accessible via the system registers. The TRBE supports different addressing >> modes including CPU virtual address and buffer modes including the circular >> buffer mode. The TRBE buffer is addressed by a base pointer (TRBBASER_EL1), >> an write pointer (TRBPTR_EL1) and a limit pointer (TRBLIMITR_EL1). But the >> access to the trace buffer could be prohibited by a higher exception level >> (EL3 or EL2), indicated by TRBIDR_EL1.P. The TRBE can also generate a CPU >> private interrupt (PPI) on address translation errors and when the buffer >> is full. Overall implementation here is inspired from the Arm SPE driver. >> >> Signed-off-by: Anshuman Khandual >> --- >> Documentation/trace/coresight/coresight-trbe.rst | 36 ++ >> arch/arm64/include/asm/sysreg.h | 2 + >> drivers/hwtracing/coresight/Kconfig | 11 + >> drivers/hwtracing/coresight/Makefile | 1 + >> drivers/hwtracing/coresight/coresight-trbe.c | 766 >> +++ >> drivers/hwtracing/coresight/coresight-trbe.h | 525 >> 6 files changed, 1341 insertions(+) >> create mode 100644 Documentation/trace/coresight/coresight-trbe.rst >> create mode 100644 drivers/hwtracing/coresight/coresight-trbe.c >> create mode 100644 drivers/hwtracing/coresight/coresight-trbe.h >> >> diff --git a/Documentation/trace/coresight/coresight-trbe.rst >> b/Documentation/trace/coresight/coresight-trbe.rst >> new file mode 100644 >> index 000..4320a8b >> --- /dev/null >> +++ b/Documentation/trace/coresight/coresight-trbe.rst >> @@ -0,0 +1,36 @@ >> +.. SPDX-License-Identifier: GPL-2.0 >> + >> +== >> +Trace Buffer Extension (TRBE). 
>> +== >> + >> +:Author: Anshuman Khandual >> +:Date: November 2020 >> + >> +Hardware Description >> + >> + >> +Trace Buffer Extension (TRBE) is a percpu hardware which captures in system >> +memory, CPU traces generated from a corresponding percpu tracing unit. This >> +gets plugged in as a coresight sink device because the corresponding trace >> +genarators (ETE), are plugged in as source device. >> + >> +Sysfs files and directories >> +--- >> + >> +The TRBE devices appear on the existing coresight bus alongside the other >> +coresight devices:: >> + >> +>$ ls /sys/bus/coresight/devices >> +trbe0 trbe1 trbe2 trbe3 >> + >> +The ``trbe`` named TRBEs are associated with a CPU.:: >> + >> +>$ ls /sys/bus/coresight/devices/trbe0/ >> +irq align dbm >> + >> +*Key file items are:-* >> + * ``irq``: TRBE maintenance interrupt number >> + * ``align``: TRBE write pointer alignment >> + * ``dbm``: TRBE updates memory with access and dirty flags >> + >> diff --git a/arch/arm64/include/asm/sysreg.h >> b/arch/arm64/include/asm/sysreg.h >> index 14cb156..61136f6 100644 >> --- a/arch/arm64/include/asm/sysreg.h >> +++ b/arch/arm64/include/asm/sysreg.h >> @@ -97,6 +97,7 @@ >> #define SET_PSTATE_UAO(x) __emit_inst(0xd500401f | PSTATE_UAO | >> ((!!x) << >> PSTATE_Imm_shift)) >> #define SET_PSTATE_SSBS(x) __emit_inst(0xd500401f | PSTATE_SSBS | >> ((!!x) >> << PSTATE_Imm_shift)) >> #define SET_PSTATE_TCO(x) __emit_inst(0xd500401f | PSTATE_TCO | >> ((!!x) << >> PSTATE_Imm_shift)) >> +#define TSB_CSYNC __emit_inst(0xd503225f) >> >> #define __SYS_BARRIER_INSN(CRm, op2, Rt) \ >> __emit_inst(0xd500 | sys_insn(0, 3, 3, (CRm), (op2)) | ((Rt) & >> 0x1f)) >> @@ -865,6 +866,7 @@ >> #define ID_AA64MMFR2_CNP_SHIFT 0 >> >> /* id_aa64dfr0 */ >> +#define ID_AA64DFR0_TRBE_SHIFT 44 >> #define ID_AA64DFR0_TRACE_FILT_SHIFT40 >> #define ID_AA64DFR0_DOUBLELOCK_SHIFT36 >> #define ID_AA64DFR0_PMSVER_SHIFT32 >> diff --git a/drivers/hwtracing/coresight/Kconfig >> b/drivers/hwtracing/coresight/Kconfig >> index 
c119824..0f5e101 100644 >> --- a/drivers/hwtracing/coresight/Kconfig >> +++ b/drivers/hwtracing/coresight/Kconfig >> @@ -156,6 +156,17 @@ config CORESIGHT_CTI >>To compile this driver as a module, choose M here: the >>module will be called coresight-cti. >> >> +config CORESIGHT_TRBE >> +bool "Trace Buffer Extension (TRBE) driver" > > Can you consider to support TRBE as loadable module since all coresight > drivers support loadable module now. Reworking the TRBE driver and making it a loadable module is part of the plan. - Anshuman
Re: [RFC 10/11] coresgith: etm-perf: Connect TRBE sink with ETE source
On 11/12/20 3:01 PM, Suzuki K Poulose wrote: > Hi Anshuman, > On 11/10/20 12:45 PM, Anshuman Khandual wrote: >> Unlike traditional sink devices, individual TRBE instances are not detected >> via DT or ACPI nodes. Instead TRBE instances are detected during CPU online >> process. Hence a path connecting ETE and TRBE on a given CPU would not have >> been established until then. This adds two coresight helpers that will help >> modify outward connections from a source device to establish and terminate >> path to a given sink device. But this method might not be optimal and would >> be reworked later. >> >> Signed-off-by: Anshuman Khandual > > Instead of this, could we come up something like a percpu_sink concept ? That > way, the TRBE driver could register the percpu_sink for the corresponding CPU > and we don't have to worry about the order in which the ETE will be probed > on a hotplugged CPU. (i.e, if the TRBE is probed before the ETE, the following > approach would fail to register the sink). Right, it won't work. We already have a per-cpu csdev sink. The current mechanism expects all ETEs to have been established, with the TRBEs just getting plugged in during their init while probing each individual CPU. During CPU hotplug in or out, a TRBE-ETE link gets created or destroyed. But it assumes that an ETE is always present for the TRBE to get plugged into or torn down from. The csdev for the TRBE sink also gets released during the CPU hot remove path. Are you suggesting that there should be a static percpu csdev array defined for all potential TRBEs, so that the ETE-TRBE links can be permanently established, given that the ETEs are permanent and never really go away on a CPU hot remove event (my assumption)? TRBE csdevs would then just get enabled or disabled without really being destroyed during CPU hotplug, so that the corresponding TRBE-ETE connection remains in place. > > And the default sink can be initialized when the ETE instance first starts > looking for it.
IIUC def_sink is the sink which will be selected by default for a source device while creating a path, in case there is no clear preference from the user. In the simple case, ETE's default sink should be fixed (its TRBE), and hence assigning that during the connection expansion procedure does make sense. But it can be more complex: the 'default' sink for an ETE can be scenario specific and may not always be its TRBE. Expanding connections fits a scenario where the ETE is present with all its other traditional sinks, and TRBE is the one which comes in or goes out with the CPU. If the ETE also comes in and goes out with individual CPU hotplug, which is ideally preferred, we would also need to 1. Coordinate with TRBE bring-up and connection creation to avoid races 2. Rediscover the traditional sinks which were attached to the ETE before - go back, rescan the DT/ACPI entries for sinks with which a path can be established, etc. Basically there are three choices here 1. ETE is permanent; the TRBE and the ETE-TRBE path get created or destroyed with hotplug (current proposal) 2. ETE/TRBE/ETE-TRBE path are all permanent; ETE and TRBE get enabled or disabled with hotplug 3. ETE, TRBE and the ETE-TRBE path all get created, enabled and destroyed with hotplug in sync - Anshuman
Re: [RFC 09/11] coresight: etm-perf: Disable the path before capturing the trace data
On 11/12/20 2:57 PM, Suzuki K Poulose wrote: > On 11/10/20 12:45 PM, Anshuman Khandual wrote: >> perf handle structure needs to be shared with the TRBE IRQ handler for >> capturing trace data and restarting the handle. There is a probability >> of an undefined reference based crash when etm event is being stopped >> while a TRBE IRQ also getting processed. This happens due the release >> of perf handle via perf_aux_output_end(). This stops the sinks via the >> link before releasing the handle, which will ensure that a simultaneous >> TRBE IRQ could not happen. >> >> Signed-off-by: Anshuman Khandual >> --- >> This might cause problem with traditional sink devices which can be >> operated in both sysfs and perf mode. This needs to be addressed >> correctly. One option would be to move the update_buffer callback >> into the respective sink devices. e.g, disable(). >> >> drivers/hwtracing/coresight/coresight-etm-perf.c | 2 ++ >> 1 file changed, 2 insertions(+) >> >> diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c >> b/drivers/hwtracing/coresight/coresight-etm-perf.c >> index 534e205..1a37991 100644 >> --- a/drivers/hwtracing/coresight/coresight-etm-perf.c >> +++ b/drivers/hwtracing/coresight/coresight-etm-perf.c >> @@ -429,7 +429,9 @@ static void etm_event_stop(struct perf_event *event, int >> mode) >> size = sink_ops(sink)->update_buffer(sink, handle, >> event_data->snk_config); >> + coresight_disable_path(path); >> perf_aux_output_end(handle, size); >> + return; >> } > > As you mentioned, this is not ideal where another session could be triggered > on > the sink from a different ETM (not for per-CPU sink) in a different mode > before > you collect the buffer. I believe the best option is to leave the > update_buffer() to disable_hw. This would need to pass on the "handle" to the > disable_path. Passing 'handle' into coresight_ops_sink->disable() would enable pushing updated trace data into perf aux buffer. 
But do you propose to drop the update_buffer() callback completely, or just move it into the disable() callback (along with the PERF_EF_UPDATE mode check) for all individual sinks for now? Maybe it can be dropped completely later. > > That way the races can be handled inside the sinks. Also, this aligns the > perf mode of the sinks with that of the sysfs mode. I did not get that; could you please elaborate?
Re: [PATCH RFC/RFT v3 6/9] powerpc: move cacheinfo sysfs to generic cacheinfo infrastructure
On 03/10/2014 04:42 PM, Sudeep Holla wrote: > Hi Anshuman, > > On 07/03/14 06:14, Anshuman Khandual wrote: >> On 03/07/2014 09:36 AM, Anshuman Khandual wrote: >>> On 02/19/2014 09:36 PM, Sudeep Holla wrote: >>>> From: Sudeep Holla >>>> >>>> This patch removes the redundant sysfs cacheinfo code by making use of >>>> the newly introduced generic cacheinfo infrastructure. >>>> >>>> Signed-off-by: Sudeep Holla >>>> Cc: Benjamin Herrenschmidt >>>> Cc: Paul Mackerras >>>> Cc: linuxppc-...@lists.ozlabs.org >>>> --- >>>> arch/powerpc/kernel/cacheinfo.c | 831 >>>> ++-- >>>> arch/powerpc/kernel/cacheinfo.h | 8 - >>>> arch/powerpc/kernel/sysfs.c | 4 - >>>> 3 files changed, 109 insertions(+), 734 deletions(-) >>>> delete mode 100644 arch/powerpc/kernel/cacheinfo.h >>>> >>>> diff --git a/arch/powerpc/kernel/cacheinfo.c >>>> b/arch/powerpc/kernel/cacheinfo.c >>>> index 2912b87..05b7580 100644 >>>> --- a/arch/powerpc/kernel/cacheinfo.c >>>> +++ b/arch/powerpc/kernel/cacheinfo.c >>>> @@ -10,38 +10,10 @@ >>>>* 2 as published by the Free Software Foundation. >>>>*/ >>>> >>>> +#include >>>> #include >>>> -#include >>>> #include >>>> -#include >>>> -#include >>>> -#include >>>> #include >>>> -#include >>>> -#include >>>> -#include >>>> - >>>> -#include "cacheinfo.h" >>>> - >>>> -/* per-cpu object for tracking: >>>> - * - a "cache" kobject for the top-level directory >>>> - * - a list of "index" objects representing the cpu's local cache >>>> hierarchy >>>> - */ >>>> -struct cache_dir { >>>> -struct kobject *kobj; /* bare (not embedded) kobject for cache >>>> - * directory */ >>>> -struct cache_index_dir *index; /* list of index objects */ >>>> -}; >>>> - >>>> -/* "index" object: each cpu's cache directory has an index >>>> - * subdirectory corresponding to a cache object associated with the >>>> - * cpu. This object's lifetime is managed via the embedded kobject. 
>>>> - */ >>>> -struct cache_index_dir { >>>> -struct kobject kobj; >>>> -struct cache_index_dir *next; /* next index in parent directory */ >>>> -struct cache *cache; >>>> -}; >>>> >>>> /* Template for determining which OF properties to query for a given >>>>* cache type */ >>>> @@ -60,11 +32,6 @@ struct cache_type_info { >>>> const char *nr_sets_prop; >>>> }; >>>> >>>> -/* These are used to index the cache_type_info array. */ >>>> -#define CACHE_TYPE_UNIFIED 0 >>>> -#define CACHE_TYPE_INSTRUCTION 1 >>>> -#define CACHE_TYPE_DATA2 >>>> - >>>> static const struct cache_type_info cache_type_info[] = { >>>> { >>>> /* PowerPC Processor binding says the [di]-cache-* >>>> @@ -77,246 +44,115 @@ static const struct cache_type_info >>>> cache_type_info[] = { >>>> .nr_sets_prop= "d-cache-sets", >>>> }, >>>> { >>>> -.name= "Instruction", >>>> -.size_prop = "i-cache-size", >>>> -.line_size_props = { "i-cache-line-size", >>>> - "i-cache-block-size", }, >>>> -.nr_sets_prop= "i-cache-sets", >>>> -}, >>>> -{ >>>> .name= "Data", >>>> .size_prop = "d-cache-size", >>>> .line_size_props = { "d-cache-line-size", >>>>"d-cache-block-size", }, >>>> .nr_sets_prop= "d-cache-sets", >>>> }, >>>> +{ >>>> +.name= "Instruction", >>>> +.size_prop = "i-cache-size", >>>> +.line_size_props = { "i-cache-line-size", >>>> +
Re: [PATCH 0/8] Add support for PowerPC Hypervisor supplied performance counters
On 01/22/2014 07:02 AM, Michael Ellerman wrote: > On Thu, 2014-01-16 at 15:53 -0800, Cody P Schafer wrote: >> These patches add basic pmus for 2 powerpc hypervisor interfaces to obtain >> performance counters: gpci ("get performance counter info") and 24x7. >> >> The counters supplied by these interfaces are continually counting and never >> need to be (and cannot be) disabled or enabled. They additionally do not >> generate any interrupts. This makes them in some regards similar to software >> counters, and as a result their implimentation shares some common code (which >> an initial patch exposes) with the sw counters. > > Hi Cody, > > Can you please add some more explanation of this series. > > In particular why do we need two new PMUs, and how do they relate to each > other? > > And can you add an example of how I'd actually use them using perf. > Yeah, agreed. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] powerpc, ptrace: Add few more ptrace request macros
On 04/02/2014 06:13 AM, Michael Neuling wrote: > Anshuman Khandual wrote: >> > This patch adds few more ptrace request macros expanding >> > the existing capability. These ptrace requests macros can >> > be classified into two categories. > Why is this only an RFC? I am looking for comments, suggestions and concerns from people. But it looks like the patch is a bit too big to review at once. > Also, please share the test case that you wrote for this. I will split the patch into multiple components, add the test case and send it out again.
[PATCH 3/3] powerpc, ptrace: Add new ptrace request macro for miscellaneous registers
This patch adds following new set of ptrace request macros for miscellaneous registers expanding the existing ptrace ABI on PowerPC. /* Miscellaneous registers */ PTRACE_GETMSCREGS PTRACE_SETMSCREGS Signed-off-by: Anshuman Khandual --- arch/powerpc/include/uapi/asm/ptrace.h | 10 arch/powerpc/kernel/ptrace.c | 91 +- 2 files changed, 100 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/include/uapi/asm/ptrace.h b/arch/powerpc/include/uapi/asm/ptrace.h index 1a12c36..bce1055 100644 --- a/arch/powerpc/include/uapi/asm/ptrace.h +++ b/arch/powerpc/include/uapi/asm/ptrace.h @@ -241,6 +241,16 @@ struct pt_regs { #define PTRACE_GETTM_CVMXREGS 0x76 #define PTRACE_SETTM_CVMXREGS 0x77 +/* Miscellaneous registers */ +#define PTRACE_GETMSCREGS 0x78 +#define PTRACE_SETMSCREGS 0x79 + +/* + * XXX: A note to application developers. The existing data layout + * of the above four ptrace requests can change when new registers + * are available for each category in forthcoming processors. + */ + #ifndef __ASSEMBLY__ struct ppc_debug_info { diff --git a/arch/powerpc/kernel/ptrace.c b/arch/powerpc/kernel/ptrace.c index 9fbcb6a..2893958 100644 --- a/arch/powerpc/kernel/ptrace.c +++ b/arch/powerpc/kernel/ptrace.c @@ -1054,6 +1054,76 @@ static int tm_cvmx_set(struct task_struct *target, const struct user_regset *reg #endif /* CONFIG_PPC_TRANSACTIONAL_MEM */ /* + * Miscellaneous Registers + * + * struct { + * unsigned long dscr; + * unsigned long ppr; + * unsigned long tar; + * }; + */ +static int misc_get(struct task_struct *target, const struct user_regset *regset, + unsigned int pos, unsigned int count, + void *kbuf, void __user *ubuf) +{ + int ret; + + /* DSCR register */ + ret = user_regset_copyout(&pos, &count, &kbuf, &ubuf, + &target->thread.dscr, 0, + sizeof(unsigned long)); + + BUILD_BUG_ON(offsetof(struct thread_struct, dscr) + sizeof(unsigned long) + + sizeof(unsigned long) != offsetof(struct thread_struct, ppr)); + + /* PPR register */ + if (!ret) + ret = 
user_regset_copyout(&pos, &count, &kbuf, &ubuf, + &target->thread.ppr, sizeof(unsigned long), + 2 * sizeof(unsigned long)); + + BUILD_BUG_ON(offsetof(struct thread_struct, ppr) + sizeof(unsigned long) + != offsetof(struct thread_struct, tar)); + /* TAR register */ + if (!ret) + ret = user_regset_copyout(&pos, &count, &kbuf, &ubuf, + &target->thread.tar, 2 * sizeof(unsigned long), + 3 * sizeof(unsigned long)); + return ret; +} + +static int misc_set(struct task_struct *target, const struct user_regset *regset, + unsigned int pos, unsigned int count, + const void *kbuf, const void __user *ubuf) +{ + int ret; + + /* DSCR register */ + ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf, + &target->thread.dscr, 0, + sizeof(unsigned long)); + + BUILD_BUG_ON(offsetof(struct thread_struct, dscr) + sizeof(unsigned long) + + sizeof(unsigned long) != offsetof(struct thread_struct, ppr)); + + /* PPR register */ + if (!ret) + ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf, + &target->thread.ppr, sizeof(unsigned long), + 2 * sizeof(unsigned long)); + + BUILD_BUG_ON(offsetof(struct thread_struct, ppr) + sizeof(unsigned long) + != offsetof(struct thread_struct, tar)); + + /* TAR register */ + if (!ret) + ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf, + &target->thread.tar, 2 * sizeof(unsigned long), + 3 * sizeof(unsigned long)); + return ret; +} + +/* * These are our native regset flavors. */ enum powerpc_regset { @@ -1072,8 +1142,9 @@ enum powerpc_regset { REGSET_TM_SPR, /* TM specific SPR */ REGSET_TM_CGPR, /* TM checkpointed GPR */ REGSET_TM_CFPR, /* TM checkpointed FPR */ - REGSET_TM_CVMX /* TM checkpointed VMX */ + REGSET_TM_CVMX, /* TM checkpointed VMX */ #endif + REGSET_MISC /* Miscellaneous */ }; static const struct user_regset
[PATCH 1/3] elf: Add some new PowerPC specific note sections
This patch adds four new note sections for transactional memory and one note section for some miscellaneous registers. This addition of new ELF note sections extends the existing ELF ABI without affecting it in any manner. Signed-off-by: Anshuman Khandual --- include/uapi/linux/elf.h | 5 + 1 file changed, 5 insertions(+) diff --git a/include/uapi/linux/elf.h b/include/uapi/linux/elf.h index ef6103b..bd59452 100644 --- a/include/uapi/linux/elf.h +++ b/include/uapi/linux/elf.h @@ -379,6 +379,11 @@ typedef struct elf64_shdr { #define NT_PPC_VMX 0x100 /* PowerPC Altivec/VMX registers */ #define NT_PPC_SPE 0x101 /* PowerPC SPE/EVR registers */ #define NT_PPC_VSX 0x102 /* PowerPC VSX registers */ +#define NT_PPC_TM_SPR 0x103 /* PowerPC transactional memory special registers */ +#define NT_PPC_TM_CGPR 0x104 /* PowerPC transactional memory checkpointed GPR */ +#define NT_PPC_TM_CFPR 0x105 /* PowerPC transactional memory checkpointed FPR */ +#define NT_PPC_TM_CVMX 0x106 /* PowerPC transactional memory checkpointed VMX */ +#define NT_PPC_MISC0x107 /* PowerPC miscellaneous registers */ #define NT_386_TLS 0x200 /* i386 TLS slots (struct user_desc) */ #define NT_386_IOPERM 0x201 /* x86 io permission bitmap (1=deny) */ #define NT_X86_XSTATE 0x202 /* x86 extended state using xsave */ -- 1.7.11.7
[PATCH 2/3] powerpc, ptrace: Add new ptrace request macros for transactional memory
This patch adds following new sets of ptrace request macros for transactional memory expanding the existing ptrace ABI on PowerPC. /* TM special purpose registers */ PTRACE_GETTM_SPRREGS PTRACE_SETTM_SPRREGS /* TM checkpointed GPR registers */ PTRACE_GETTM_CGPRREGS PTRACE_SETTM_CGPRREGS /* TM checkpointed FPR registers */ PTRACE_GETTM_CFPRREGS PTRACE_SETTM_CFPRREGS /* TM checkpointed VMX registers */ PTRACE_GETTM_CVMXREGS PTRACE_SETTM_CVMXREGS Signed-off-by: Anshuman Khandual --- arch/powerpc/include/asm/switch_to.h | 8 + arch/powerpc/include/uapi/asm/ptrace.h | 51 +++ arch/powerpc/kernel/process.c | 24 ++ arch/powerpc/kernel/ptrace.c | 570 +++-- 4 files changed, 625 insertions(+), 28 deletions(-) diff --git a/arch/powerpc/include/asm/switch_to.h b/arch/powerpc/include/asm/switch_to.h index 0e83e7d..22095e2 100644 --- a/arch/powerpc/include/asm/switch_to.h +++ b/arch/powerpc/include/asm/switch_to.h @@ -80,6 +80,14 @@ static inline void flush_spe_to_thread(struct task_struct *t) } #endif +#ifdef CONFIG_PPC_TRANSACTIONAL_MEM +extern void flush_tmreg_to_thread(struct task_struct *); +#else +static inline void flush_tmreg_to_thread(struct task_struct *t) +{ +} +#endif /* CONFIG_PPC_TRANSACTIONAL_MEM */ + static inline void clear_task_ebb(struct task_struct *t) { #ifdef CONFIG_PPC_BOOK3S_64 diff --git a/arch/powerpc/include/uapi/asm/ptrace.h b/arch/powerpc/include/uapi/asm/ptrace.h index 77d2ed3..1a12c36 100644 --- a/arch/powerpc/include/uapi/asm/ptrace.h +++ b/arch/powerpc/include/uapi/asm/ptrace.h @@ -190,6 +190,57 @@ struct pt_regs { #define PPC_PTRACE_SETHWDEBUG 0x88 #define PPC_PTRACE_DELHWDEBUG 0x87 +/* Transactional memory */ + +/* + * TM specific SPR + * + * struct data { + * u64 tm_tfhar; + * u64 tm_texasr; + * u64 tm_tfiar; + * unsigned long tm_orig_msr; + * u64 tm_tar; + * u64 tm_ppr; + * u64 tm_dscr; + * }; + */ +#define PTRACE_GETTM_SPRREGS 0x70 +#define PTRACE_SETTM_SPRREGS 0x71 + +/* + * TM Checkpointed GPR + * + * struct data { + * struct pt_regs 
ckpt_regs; + * }; + */ +#define PTRACE_GETTM_CGPRREGS 0x72 +#define PTRACE_SETTM_CGPRREGS 0x73 + +/* + * TM Checkpointed FPR + * + * struct data { + * u64 fpr[32]; + * u64 fpscr; + * }; + */ +#define PTRACE_GETTM_CFPRREGS 0x74 +#define PTRACE_SETTM_CFPRREGS 0x75 + +/* + * TM Checkpointed VMX + * + * struct data { + * vector128 vr[32]; + * vector128 vscr; + * unsigned long vrsave; + *}; + */ +#define PTRACE_GETTM_CVMXREGS 0x76 +#define PTRACE_SETTM_CVMXREGS 0x77 + #ifndef __ASSEMBLY__ struct ppc_debug_info { diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c index af064d2..230a0ee 100644 --- a/arch/powerpc/kernel/process.c +++ b/arch/powerpc/kernel/process.c @@ -673,6 +673,30 @@ static inline void __switch_to_tm(struct task_struct *prev) } } +void flush_tmreg_to_thread(struct task_struct *tsk) +{ + /* +* If task is not current, it should have been flushed +* already to it's thread_struct during __switch_to(). +*/ + if (tsk != current) + return; + + preempt_disable(); + if (tsk->thread.regs) { + /* +* If we are still current, the TM state need to +* be flushed to thread_struct as it will be still +* present in the current cpu +*/ + if (MSR_TM_ACTIVE(tsk->thread.regs->msr)) { + __switch_to_tm(tsk); + tm_recheckpoint_new_task(tsk); + } + } + preempt_enable(); +} + /* * This is called if we are on the way out to userspace and the * TIF_RESTORE_TM flag is set. It checks if we need to reload diff --git a/arch/powerpc/kernel/ptrace.c b/arch/powerpc/kernel/ptrace.c index 2e3d2bf..9fbcb6a 100644 --- a/arch/powerpc/kernel/ptrace.c +++ b/arch/powerpc/kernel/ptrace.c @@ -357,6 +357,17 @@ static int gpr_set(struct task_struct *target, const struct user_regset *regset, return ret; } +/* + * When any transaction is active, "thread_struct->transact_fp" holds + * the current running value of all FPR registers and "thread_struct-> + * fp_state" holds the last checkpointed FPR registers state for the + * current transaction. 
+ * + * struct data { + * u64 fpr[32]; + * u64 fpscr; + * }; + */ static int fpr_get(struct task_struct *target, const struct user_regset *regset, unsigned int pos, unsigned int count, void *kbuf, void __user *ubuf) @@ -365,21 +376,41 @@ static int fpr_get(struct task_struct *target, const struct user_regset *regset, u64 buf[33]; i
[PATCH 0/3] Add new ptrace request macros on PowerPC
FPR[0]: %lx\n", fpr->fpr[0]); printf("TM RN FPR[1]: %lx\n", fpr->fpr[1]); printf("TM RN FPR[2]: %lx\n", fpr->fpr[2]); printf("TM RN FPSCR: %lx\n", fpr->fpscr); /* TM checkpointed FPR */ ret = ptrace(PTRACE_GETTM_CFPRREGS, child, NULL, fpr1); if (ret == -1) { printf("PTRACE_GETTM_CFPRREGS failed: %s\n", strerror(errno)); exit(-1); } printf("---TM checkpointed FPR-\n"); printf("TM CH FPR[0]: %lx\n", fpr1->fpr[0]); printf("TM CH FPR[1]: %lx\n", fpr1->fpr[1]); printf("TM CH FPR[2]: %lx\n", fpr1->fpr[2]); printf("TM CH FPSCR: %lx\n", fpr1->fpscr); /* Misc debug */ ret = ptrace(PTRACE_GETMSCREGS, child, NULL, dbr1); if (ret == -1) { printf("PTRACE_GETMSCREGS failed: %s\n", strerror(errno)); exit(-1); } printf("---Running miscellaneous registers---\n"); printf("TM RN DSCR: %lx\n", dbr1->dscr); printf("TM RN PPR: %lx\n", dbr1->ppr); printf("TM RN TAR: %lx\n", dbr1->tar); ret = ptrace(PTRACE_DETACH, child, NULL, NULL); if (ret == -1) { printf("PTRACE_DETACH failed: %s\n", strerror(errno)); exit(-1); } } while (0); } return 0; } = Anshuman Khandual (3): elf: Add some new PowerPC specific note sections powerpc, ptrace: Add new ptrace request macros for transactional memory powerpc, ptrace: Add new ptrace request macro for miscellaneous registers arch/powerpc/include/asm/switch_to.h | 8 + arch/powerpc/include/uapi/asm/ptrace.h | 61 +++ arch/powerpc/kernel/process.c | 24 ++ arch/powerpc/kernel/ptrace.c | 659 +++-- include/uapi/linux/elf.h | 5 + 5 files changed, 729 insertions(+), 28 deletions(-) -- 1.7.11.7
Re: [PATCH 0/3] Add new ptrace request macros on PowerPC
On 04/02/2014 12:32 PM, Anshuman Khandual wrote: > This patch series adds new ELF note sections which are used to > create new ptrace request macros for various transactional memory and > miscellaneous registers on PowerPC. Please find the test case exploiting > the new ptrace request macros and it's results on a POWER8 system. > > RFC: https://lkml.org/lkml/2014/4/1/292 > > == Results == > ---TM specific SPR-- > TM TFHAR: 19dc > TM TEXASR: de01ac01 > TM TFIAR: c003f386 > TM CH ORIG_MSR: 9005f032 > TM CH TAR: 6 > TM CH PPR: c > TM CH DSCR: 1 > ---TM checkpointed GPR- > TM CH GPR[0]: 197c > TM CH GPR[1]: 5 > TM CH GPR[2]: 6 > TM CH GPR[7]: 1 > TM CH NIP: 19dc > TM CH LINK: 197c > TM CH CCR: 22000422 > ---TM running GPR- > TM RN GPR[0]: 197c > TM RN GPR[1]: 7 > TM RN GPR[2]: 8 > TM RN GPR[7]: 5 > TM RN NIP: 19fc > TM RN LINK: 197c > TM RN CCR: 2000422 > ---TM running FPR- > TM RN FPR[0]: 1002d3a3780 > TM RN FPR[1]: 7 > TM RN FPR[2]: 8 > TM RN FPSCR: 0 > ---TM checkpointed FPR- > TM CH FPR[0]: 1002d3a3780 > TM CH FPR[1]: 5 > TM CH FPR[2]: 6 > TM CH FPSCR: 0 > ---Running miscellaneous registers--- > TM RN DSCR: 0 There is a problem here which I forgot to mention. The running DSCR value comes from the thread->dscr component of the target process. While we are inside the transaction (which is the case here, as we are stuck at the "b ." instruction and have not reached TEND), thread->dscr should hold the running value of the DSCR register at that point in time. So we expect the DSCR value to be 5 instead of the 0 shown in the output above. During the tests, when I moved the "b ." after TEND, thread->dscr got the value of 5 while all checkpointed register values were thrown away. I believe there is some problem in the way the thread->dscr context is saved away inside the TM section. I will look into this further and keep you informed.
Re: [V5 0/4] perf: New conditional branch filter
On 03/25/2014 09:46 PM, Andi Kleen wrote: >> Hey Arnaldo, >> >> Do you have any comments or suggestions on this ? Have not received any >> response on these proposed patch series yet. Thank you. > > I read it earlier and it looks good to me. Hey Andi, Can I add your "Reviewed-by" or "Acked-by" to all these four patches?
Re: [V3 01/10] perf: New conditional branch filter criteria in branch stack sampling
On 11/26/2013 11:36 AM, m...@ellerman.id.au wrote: > Ideally your commit subject would contain a verb, preferably in the present > tense. > > I think simply "perf: Add PERF_SAMPLE_BRANCH_COND" would be clearer. Sure, will change it. > > On Wed, 2013-16-10 at 06:56:48 UTC, Anshuman Khandual wrote: >> POWER8 PMU based BHRB supports filtering for conditional branches. >> This patch introduces new branch filter PERF_SAMPLE_BRANCH_COND which >> will extend the existing perf ABI. Other architectures can provide >> this functionality with either HW filtering support (if present) or >> with SW filtering of instructions. >> >> Signed-off-by: Anshuman Khandual >> Reviewed-by: Stephane Eranian >> --- >> include/uapi/linux/perf_event.h | 3 ++- >> 1 file changed, 2 insertions(+), 1 deletion(-) >> >> diff --git a/include/uapi/linux/perf_event.h >> b/include/uapi/linux/perf_event.h >> index 0b1df41..5da52b6 100644 >> --- a/include/uapi/linux/perf_event.h >> +++ b/include/uapi/linux/perf_event.h >> @@ -160,8 +160,9 @@ enum perf_branch_sample_type { >> PERF_SAMPLE_BRANCH_ABORT_TX = 1U << 7, /* transaction aborts */ >> PERF_SAMPLE_BRANCH_IN_TX= 1U << 8, /* in transaction */ >> PERF_SAMPLE_BRANCH_NO_TX= 1U << 9, /* not in transaction */ >> +PERF_SAMPLE_BRANCH_COND = 1U << 10, /* conditional branches */ >> >> -PERF_SAMPLE_BRANCH_MAX = 1U << 10, /* non-ABI */ >> +PERF_SAMPLE_BRANCH_MAX = 1U << 11, /* non-ABI */ >> }; > > This no longer applies against Linus' tree, you'll need to rebase it. Okay
Re: [V3 02/10] powerpc, perf: Enable conditional branch filter for POWER8
On 11/26/2013 11:36 AM, m...@ellerman.id.au wrote: > On Wed, 2013-16-10 at 06:56:49 UTC, Anshuman Khandual wrote: >> Enables conditional branch filter support for POWER8 >> utilizing MMCRA register based filter and also invalidates >> a BHRB branch filter combination involving conditional >> branches. >> >> Signed-off-by: Anshuman Khandual >> --- >> arch/powerpc/perf/power8-pmu.c | 10 ++ >> 1 file changed, 10 insertions(+) >> >> diff --git a/arch/powerpc/perf/power8-pmu.c b/arch/powerpc/perf/power8-pmu.c >> index 2ee4a70..6e28587 100644 >> --- a/arch/powerpc/perf/power8-pmu.c >> +++ b/arch/powerpc/perf/power8-pmu.c >> @@ -580,11 +580,21 @@ static u64 power8_bhrb_filter_map(u64 >> branch_sample_type) >> if (branch_sample_type & PERF_SAMPLE_BRANCH_IND_CALL) >> return -1; >> >> +/* Invalid branch filter combination - HW does not support */ >> +if ((branch_sample_type & PERF_SAMPLE_BRANCH_ANY_CALL) && >> +(branch_sample_type & PERF_SAMPLE_BRANCH_COND)) >> +return -1; > > What this doesn't make obvious is that the hardware doesn't support any > combinations. It just happens that these are the only two possibilities we > allow, and so this is the only combination we have to disallow. 
> >> if (branch_sample_type & PERF_SAMPLE_BRANCH_ANY_CALL) { >> pmu_bhrb_filter |= POWER8_MMCRA_IFM1; >> return pmu_bhrb_filter; >> } >> >> +if (branch_sample_type & PERF_SAMPLE_BRANCH_COND) { >> +pmu_bhrb_filter |= POWER8_MMCRA_IFM3; >> +return pmu_bhrb_filter; >> +} >> + >> /* Every thing else is unsupported */ >> return -1; >> } > > I think it would be clearer if we actually checked for the possibilities we > allow and let everything else fall through, eg: > > /* Ignore user/kernel/hv bits */ > branch_sample_type &= ~PERF_SAMPLE_BRANCH_PLM_ALL; > > if (branch_sample_type == PERF_SAMPLE_BRANCH_ANY) > return 0; > > if (branch_sample_type == PERF_SAMPLE_BRANCH_ANY_CALL) > return POWER8_MMCRA_IFM1; > > if (branch_sample_type == PERF_SAMPLE_BRANCH_COND) > return POWER8_MMCRA_IFM3; > > return -1; > Please look at the 9th patch ("power8, perf: Change BHRB branch filter configuration"). All these issues are taken care of in that patch. It makes clear that no combination of HW BHRB filters is supported in the PMU, and hence it zeroes out the HW filter component and processes all of those filters in SW.
[V3 04/10] x86, perf: Add conditional branch filtering support
This patch adds conditional branch filtering support, enabling it for PERF_SAMPLE_BRANCH_COND in the perf branch stack sampling framework by utilizing the available software filter X86_BR_JCC. Signed-off-by: Anshuman Khandual Reviewed-by: Stephane Eranian --- arch/x86/kernel/cpu/perf_event_intel_lbr.c | 5 + 1 file changed, 5 insertions(+) diff --git a/arch/x86/kernel/cpu/perf_event_intel_lbr.c b/arch/x86/kernel/cpu/perf_event_intel_lbr.c index d5be06a..9723773 100644 --- a/arch/x86/kernel/cpu/perf_event_intel_lbr.c +++ b/arch/x86/kernel/cpu/perf_event_intel_lbr.c @@ -371,6 +371,9 @@ static void intel_pmu_setup_sw_lbr_filter(struct perf_event *event) if (br_type & PERF_SAMPLE_BRANCH_NO_TX) mask |= X86_BR_NO_TX; + if (br_type & PERF_SAMPLE_BRANCH_COND) + mask |= X86_BR_JCC; + /* * stash actual user request into reg, it may * be used by fixup code for some CPU @@ -665,6 +668,7 @@ static const int nhm_lbr_sel_map[PERF_SAMPLE_BRANCH_MAX] = { * NHM/WSM erratum: must include IND_JMP to capture IND_CALL */ [PERF_SAMPLE_BRANCH_IND_CALL] = LBR_IND_CALL | LBR_IND_JMP, + [PERF_SAMPLE_BRANCH_COND] = LBR_JCC, }; static const int snb_lbr_sel_map[PERF_SAMPLE_BRANCH_MAX] = { @@ -676,6 +680,7 @@ static const int snb_lbr_sel_map[PERF_SAMPLE_BRANCH_MAX] = { [PERF_SAMPLE_BRANCH_ANY_CALL] = LBR_REL_CALL | LBR_IND_CALL | LBR_FAR, [PERF_SAMPLE_BRANCH_IND_CALL] = LBR_IND_CALL, + [PERF_SAMPLE_BRANCH_COND] = LBR_JCC, }; /* core */ -- 1.7.11.7
[V3 07/10] powerpc, lib: Add new branch instruction analysis support functions
Generic powerpc branch instruction analysis support is added to the code patching library, which will help the subsequent patch on SW based filtering of branch records in perf. This patch also converts and exports some of the existing local static functions through the header file to be used elsewhere. Signed-off-by: Anshuman Khandual --- arch/powerpc/include/asm/code-patching.h | 30 ++ arch/powerpc/lib/code-patching.c | 54 ++-- 2 files changed, 82 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/include/asm/code-patching.h b/arch/powerpc/include/asm/code-patching.h index a6f8c7a..8bab417 100644 --- a/arch/powerpc/include/asm/code-patching.h +++ b/arch/powerpc/include/asm/code-patching.h @@ -22,6 +22,36 @@ #define BRANCH_SET_LINK 0x1 #define BRANCH_ABSOLUTE 0x2 +#define XL_FORM_LR 0x4C000020 +#define XL_FORM_CTR 0x4C000420 +#define XL_FORM_TAR 0x4C000460 + +#define BO_ALWAYS 0x0280 +#define BO_CTR 0x0200 +#define BO_CRBI_OFF 0x0080 +#define BO_CRBI_ON 0x0180 +#define BO_CRBI_HINT 0x0040 + /* Forms of branch instruction */ +int instr_is_branch_iform(unsigned int instr); +int instr_is_branch_bform(unsigned int instr); +int instr_is_branch_xlform(unsigned int instr); + +/* Classification of XL-form instruction */ +int is_xlform_lr(unsigned int instr); +int is_xlform_ctr(unsigned int instr); +int is_xlform_tar(unsigned int instr); + +/* Branch instruction is a call */ +int is_branch_link_set(unsigned int instr); + +/* BO field analysis (B-form or XL-form) */ +int is_bo_always(unsigned int instr); +int is_bo_ctr(unsigned int instr); +int is_bo_crbi_off(unsigned int instr); +int is_bo_crbi_on(unsigned int instr); +int is_bo_crbi_hint(unsigned int instr); + unsigned int create_branch(const unsigned int *addr, unsigned long target, int flags); unsigned int create_cond_branch(const unsigned int *addr, diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c index 17e5b23..cb62bd8 100644 ---
b/arch/powerpc/lib/code-patching.c @@ -77,16 +77,66 @@ static unsigned int branch_opcode(unsigned int instr) return (instr >> 26) & 0x3F; } -static int instr_is_branch_iform(unsigned int instr) +int instr_is_branch_iform(unsigned int instr) { return branch_opcode(instr) == 18; } -static int instr_is_branch_bform(unsigned int instr) +int instr_is_branch_bform(unsigned int instr) { return branch_opcode(instr) == 16; } +int instr_is_branch_xlform(unsigned int instr) +{ + return branch_opcode(instr) == 19; +} + +int is_xlform_lr(unsigned int instr) +{ + return (instr & XL_FORM_LR) == XL_FORM_LR; +} + +int is_xlform_ctr(unsigned int instr) +{ + return (instr & XL_FORM_CTR) == XL_FORM_CTR; +} + +int is_xlform_tar(unsigned int instr) +{ + return (instr & XL_FORM_TAR) == XL_FORM_TAR; +} + +int is_branch_link_set(unsigned int instr) +{ + return (instr & BRANCH_SET_LINK) == BRANCH_SET_LINK; +} + +int is_bo_always(unsigned int instr) +{ + return (instr & BO_ALWAYS) == BO_ALWAYS; +} + +int is_bo_ctr(unsigned int instr) +{ + return (instr & BO_CTR) == BO_CTR; +} + +int is_bo_crbi_off(unsigned int instr) +{ + return (instr & BO_CRBI_OFF) == BO_CRBI_OFF; +} + +int is_bo_crbi_on(unsigned int instr) +{ + return (instr & BO_CRBI_ON) == BO_CRBI_ON; +} + +int is_bo_crbi_hint(unsigned int instr) +{ + return (instr & BO_CRBI_HINT) == BO_CRBI_HINT; +} + int instr_is_relative_branch(unsigned int instr) { if (instr & BRANCH_ABSOLUTE) -- 1.7.11.7
[V3 02/10] powerpc, perf: Enable conditional branch filter for POWER8
Enables conditional branch filter support for POWER8 utilizing MMCRA register based filter and also invalidates a BHRB branch filter combination involving conditional branches. Signed-off-by: Anshuman Khandual --- arch/powerpc/perf/power8-pmu.c | 10 ++ 1 file changed, 10 insertions(+) diff --git a/arch/powerpc/perf/power8-pmu.c b/arch/powerpc/perf/power8-pmu.c index 2ee4a70..6e28587 100644 --- a/arch/powerpc/perf/power8-pmu.c +++ b/arch/powerpc/perf/power8-pmu.c @@ -580,11 +580,21 @@ static u64 power8_bhrb_filter_map(u64 branch_sample_type) if (branch_sample_type & PERF_SAMPLE_BRANCH_IND_CALL) return -1; + /* Invalid branch filter combination - HW does not support */ + if ((branch_sample_type & PERF_SAMPLE_BRANCH_ANY_CALL) && + (branch_sample_type & PERF_SAMPLE_BRANCH_COND)) + return -1; + if (branch_sample_type & PERF_SAMPLE_BRANCH_ANY_CALL) { pmu_bhrb_filter |= POWER8_MMCRA_IFM1; return pmu_bhrb_filter; } + if (branch_sample_type & PERF_SAMPLE_BRANCH_COND) { + pmu_bhrb_filter |= POWER8_MMCRA_IFM3; + return pmu_bhrb_filter; + } + /* Every thing else is unsupported */ return -1; } -- 1.7.11.7 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[V3 01/10] perf: New conditional branch filter criteria in branch stack sampling
POWER8 PMU based BHRB supports filtering for conditional branches. This patch introduces new branch filter PERF_SAMPLE_BRANCH_COND which will extend the existing perf ABI. Other architectures can provide this functionality with either HW filtering support (if present) or with SW filtering of instructions. Signed-off-by: Anshuman Khandual Reviewed-by: Stephane Eranian --- include/uapi/linux/perf_event.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h index 0b1df41..5da52b6 100644 --- a/include/uapi/linux/perf_event.h +++ b/include/uapi/linux/perf_event.h @@ -160,8 +160,9 @@ enum perf_branch_sample_type { PERF_SAMPLE_BRANCH_ABORT_TX = 1U << 7, /* transaction aborts */ PERF_SAMPLE_BRANCH_IN_TX= 1U << 8, /* in transaction */ PERF_SAMPLE_BRANCH_NO_TX= 1U << 9, /* not in transaction */ + PERF_SAMPLE_BRANCH_COND = 1U << 10, /* conditional branches */ - PERF_SAMPLE_BRANCH_MAX = 1U << 10, /* non-ABI */ + PERF_SAMPLE_BRANCH_MAX = 1U << 11, /* non-ABI */ }; #define PERF_SAMPLE_BRANCH_PLM_ALL \ -- 1.7.11.7
[V3 10/10] powerpc, perf: Cleanup SW branch filter list look up
This patch adds enumeration for all available SW branch filters in powerpc book3s code and also streamlines the look-up of the SW branch filter entries when figuring out which of the requested branch filters can be supported in SW. Signed-off-by: Anshuman Khandual --- arch/powerpc/perf/core-book3s.c | 38 +- 1 file changed, 13 insertions(+), 25 deletions(-) diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c index f983334..ec2dd61 100644 --- a/arch/powerpc/perf/core-book3s.c +++ b/arch/powerpc/perf/core-book3s.c @@ -566,6 +566,12 @@ static int match_filters(u64 branch_sample_type, u64 filter_mask) return true; } +/* SW implemented branch filters */ +static unsigned int power_sw_filter[] = { PERF_SAMPLE_BRANCH_ANY_CALL, + PERF_SAMPLE_BRANCH_COND, + PERF_SAMPLE_BRANCH_ANY_RETURN, + PERF_SAMPLE_BRANCH_IND_CALL }; + /* * Required SW based branch filters * @@ -578,6 +584,7 @@ static u64 branch_filter_map(u64 branch_sample_type, u64 pmu_bhrb_filter, u64 *filter_mask) { u64 branch_sw_filter = 0; + unsigned int i; /* No branch filter requested */ if (branch_sample_type & PERF_SAMPLE_BRANCH_ANY) { @@ -593,34 +600,15 @@ static u64 branch_filter_map(u64 branch_sample_type, u64 pmu_bhrb_filter, * SW implemented filters. But right now, there is now way to * initimate the user about this decision. 
*/ - if (branch_sample_type & PERF_SAMPLE_BRANCH_ANY_CALL) { - if (!(pmu_bhrb_filter & PERF_SAMPLE_BRANCH_ANY_CALL)) { - branch_sw_filter |= PERF_SAMPLE_BRANCH_ANY_CALL; - *filter_mask |= PERF_SAMPLE_BRANCH_ANY_CALL; - } - } - - if (branch_sample_type & PERF_SAMPLE_BRANCH_COND) { - if (!(pmu_bhrb_filter & PERF_SAMPLE_BRANCH_COND)) { - branch_sw_filter |= PERF_SAMPLE_BRANCH_COND; - *filter_mask |= PERF_SAMPLE_BRANCH_COND; - } - } - if (branch_sample_type & PERF_SAMPLE_BRANCH_ANY_RETURN) { - if (!(pmu_bhrb_filter & PERF_SAMPLE_BRANCH_ANY_RETURN)) { - branch_sw_filter |= PERF_SAMPLE_BRANCH_ANY_RETURN; - *filter_mask |= PERF_SAMPLE_BRANCH_ANY_RETURN; - } - } - - if (branch_sample_type & PERF_SAMPLE_BRANCH_IND_CALL) { - if (!(pmu_bhrb_filter & PERF_SAMPLE_BRANCH_IND_CALL)) { - branch_sw_filter |= PERF_SAMPLE_BRANCH_IND_CALL; - *filter_mask |= PERF_SAMPLE_BRANCH_IND_CALL; + for (i = 0; i < ARRAY_SIZE(power_sw_filter); i++) { + if (branch_sample_type & power_sw_filter[i]) { + if (!(pmu_bhrb_filter & power_sw_filter[i])) { + branch_sw_filter |= power_sw_filter[i]; + *filter_mask |= power_sw_filter[i]; + } } } - return branch_sw_filter; } -- 1.7.11.7
[V3 05/10] perf, documentation: Description for conditional branch filter
Adding documentation support for conditional branch filter. Signed-off-by: Anshuman Khandual Reviewed-by: Stephane Eranian --- tools/perf/Documentation/perf-record.txt | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt index e297b74..59ca8d0 100644 --- a/tools/perf/Documentation/perf-record.txt +++ b/tools/perf/Documentation/perf-record.txt @@ -163,12 +163,13 @@ following filters are defined: - any_call: any function call or system call - any_ret: any function return or system call return - ind_call: any indirect branch +- cond: conditional branches - u: only when the branch target is at the user level - k: only when the branch target is in the kernel - hv: only when the target is at the hypervisor level + -The option requires at least one branch type among any, any_call, any_ret, ind_call. +The option requires at least one branch type among any, any_call, any_ret, ind_call, cond. The privilege levels may be omitted, in which case, the privilege levels of the associated event are applied to the branch filter. Both kernel (k) and hypervisor (hv) privilege levels are subject to permissions. When sampling on multiple events, branch stack sampling -- 1.7.11.7
[V3 00/10] perf: New conditional branch filter
This patchset is the re-spin of the original branch stack sampling patchset which introduced the new PERF_SAMPLE_BRANCH_COND branch filter. This patchset also enables SW based branch filtering support for book3s powerpc platforms which have PMU HW backed branch stack sampling support.

Summary of code changes in this patchset:

(1) Introduces a new PERF_SAMPLE_BRANCH_COND branch filter
(2) Add the "cond" branch filter option in the "perf record" tool
(3) Enable PERF_SAMPLE_BRANCH_COND on X86 platforms
(4) Enable PERF_SAMPLE_BRANCH_COND on the POWER8 platform
(5) Update the documentation regarding the "perf record" tool
(6) Add some new powerpc instruction analysis functions in the code-patching library
(7) Enable SW based branch filter support for powerpc book3s
(8) Changed BHRB configuration in POWER8 to accommodate SW branch filters

With this new SW enablement, the branch filter support for book3s platforms has been extended to include all the combinations discussed below, with a sample test application program (included here).

Changes in V2:
(1) Enabled PPC64 SW branch filtering support
(2) Incorporated changes required for all previous comments

Changes in V3:
(1) Split the SW branch filter enablement into multiple patches
(2) Added PMU neutral SW branch filtering code, PMU specific HW branch filtering code
(3) Added new instruction analysis functionality into powerpc code-patching library
(4) Changed name for some of the functions
(5) Fixed couple of spelling mistakes
(6) Changed code documentation in multiple places

PMU HW branch filters:

(1) perf record -j any_call -e branch-misses:u ./cprog

# Overhead  Command  Source Shared Object   Source Symbol          Target Shared Object   Target Symbol
# ........  .......  .....................  .....................  .....................  ........................
     7.00%  cprog    cprog                  [.] sw_3_1             cprog                  [.] sw_3_1_2
     6.99%  cprog    cprog                  [.] hw_1_1             cprog                  [.] symbol1
     6.52%  cprog    cprog                  [.] sw_3_1             cprog                  [.] success_3_1_2
     5.41%  cprog    cprog                  [.] sw_3_1             cprog                  [.] sw_3_1_3
     5.40%  cprog    cprog                  [.] hw_1_2             cprog                  [.] symbol2
     5.40%  cprog    cprog                  [.] callme             cprog                  [.] hw_1_2
     5.40%  cprog    cprog                  [.] sw_3_1             cprog                  [.] success_3_1_1
     5.40%  cprog    cprog                  [.] callme             cprog                  [.] hw_1_1
     5.39%  cprog    cprog                  [.] sw_3_1             cprog                  [.] sw_3_1_1
     5.39%  cprog    cprog                  [.] sw_4_2             cprog                  [.] lr_addr
     5.39%  cprog    cprog                  [.] callme             cprog                  [.] sw_4_2
     5.37%  cprog    [unknown]              [.]                    cprog                  [.] ctr_addr
     4.30%  cprog    cprog                  [.] callme             cprog                  [.] hw_2_1
     4.28%  cprog    cprog                  [.] callme             cprog                  [.] sw_3_1
     3.82%  cprog    cprog                  [.] sw_3_1             cprog                  [.] success_3_1_3
     3.81%  cprog    cprog                  [.] callme             cprog                  [.] hw_2_2
     3.81%  cprog    cprog                  [.] callme             cprog                  [.] sw_3_2
     2.71%  cprog    [unknown]              [.]                    cprog                  [.] lr_addr
     2.70%  cprog    cprog                  [.] main               cprog                  [.] callme
     2.70%  cprog    cprog                  [.] sw_4_1             cprog                  [.] ctr_addr
     2.70%  cprog    cprog                  [.] callme             cprog                  [.] sw_4_1
     0.08%  cprog    [unknown]              [.] 0xf78676c4         [unknown]              [.] 0xf78522c0
     0.02%  cprog    [unknown]              [k]                    cprog                  [k] ctr_addr
     0.01%  cprog    [kernel.kallsyms]      [.] .power_pmu_enable  [kernel.kallsyms]      [.] .power8_compute_mmcr
     0.00%  cprog    ld-2.11.2.so           [.] malloc             [unknown]              [.] 0xf786b380
     0.00%  cprog    ld-2.11.2.so           [.] calloc             [unknown]              [.] 0xf786b390
     0.00%  cprog    cprog                  [.] main               [unknown]              [.] 0x1950
[V3 09/10] power8, perf: Change BHRB branch filter configuration
The powerpc kernel now supports SW based branch filters for book3s systems, with some specific requirements on how HW supported branch filters are handled in order to achieve the overall OR semantics prevailing in the perf branch stack sampling framework. This patch adapts the BHRB branch filter configuration to meet those protocols. The POWER8 PMU supports 3 branch filters (of which two are used in perf branch stack) which are mutually exclusive and cannot be ORed with each other. This implies that the PMU can only handle one HW based branch filter request at any point of time; for all other combinations the PMU will pass it on to the SW. Also, the combination of PERF_SAMPLE_BRANCH_ANY_CALL and PERF_SAMPLE_BRANCH_COND can now be handled in SW, hence we don't error it out anymore. Signed-off-by: Anshuman Khandual --- arch/powerpc/perf/power8-pmu.c | 73 +++--- 1 file changed, 54 insertions(+), 19 deletions(-) diff --git a/arch/powerpc/perf/power8-pmu.c b/arch/powerpc/perf/power8-pmu.c index 94460bc..7b82725 100644 --- a/arch/powerpc/perf/power8-pmu.c +++ b/arch/powerpc/perf/power8-pmu.c @@ -560,7 +560,56 @@ static int power8_generic_events[] = { static u64 power8_bhrb_filter_map(u64 branch_sample_type, u64 *filter_mask) { - u64 pmu_bhrb_filter = 0; + u64 x, tmp, pmu_bhrb_filter = 0; + *filter_mask = 0; + + /* No branch filter requested */ + if (branch_sample_type & PERF_SAMPLE_BRANCH_ANY) { + *filter_mask = PERF_SAMPLE_BRANCH_ANY; + return pmu_bhrb_filter; + } + + /* +* P8 does not support oring of PMU HW branch filters. Hence +* if multiple branch filters are requested which includes filters +* supported in PMU, still go ahead and clear the PMU based HW branch +* filter component as in this case all the filters will be processed +* in SW. 
+*/ + tmp = branch_sample_type; + + /* Remove privilege filters before comparison */ + tmp &= ~PERF_SAMPLE_BRANCH_USER; + tmp &= ~PERF_SAMPLE_BRANCH_KERNEL; + tmp &= ~PERF_SAMPLE_BRANCH_HV; + + for_each_branch_sample_type(x) { + /* Ignore privilege requests */ + if ((x == PERF_SAMPLE_BRANCH_USER) || (x == PERF_SAMPLE_BRANCH_KERNEL) || (x == PERF_SAMPLE_BRANCH_HV)) + continue; + + if (!(tmp & x)) + continue; + + /* Supported HW PMU filters */ + if (tmp & PERF_SAMPLE_BRANCH_ANY_CALL) { + tmp &= ~PERF_SAMPLE_BRANCH_ANY_CALL; + if (tmp) { + pmu_bhrb_filter = 0; + *filter_mask = 0; + return pmu_bhrb_filter; + } + } + + if (tmp & PERF_SAMPLE_BRANCH_COND) { + tmp &= ~PERF_SAMPLE_BRANCH_COND; + if (tmp) { + pmu_bhrb_filter = 0; + *filter_mask = 0; + return pmu_bhrb_filter; + } + } + } /* BHRB and regular PMU events share the same privilege state * filter configuration. BHRB is always recorded along with a @@ -569,34 +618,20 @@ static u64 power8_bhrb_filter_map(u64 branch_sample_type, u64 *filter_mask) * PMU event, we ignore any separate BHRB specific request. 
*/ - /* No branch filter requested */ - if (branch_sample_type & PERF_SAMPLE_BRANCH_ANY) - return pmu_bhrb_filter; - - /* Invalid branch filter options - HW does not support */ - if (branch_sample_type & PERF_SAMPLE_BRANCH_ANY_RETURN) - return -1; - - if (branch_sample_type & PERF_SAMPLE_BRANCH_IND_CALL) - return -1; - - /* Invalid branch filter combination - HW does not support */ - if ((branch_sample_type & PERF_SAMPLE_BRANCH_ANY_CALL) && - (branch_sample_type & PERF_SAMPLE_BRANCH_COND)) - return -1; - + /* Supported individual branch filters */ if (branch_sample_type & PERF_SAMPLE_BRANCH_ANY_CALL) { pmu_bhrb_filter |= POWER8_MMCRA_IFM1; + *filter_mask|= PERF_SAMPLE_BRANCH_ANY_CALL; return pmu_bhrb_filter; } if (branch_sample_type & PERF_SAMPLE_BRANCH_COND) { pmu_bhrb_filter |= POWER8_MMCRA_IFM3; + *filter_mask|= PERF_SAMPLE_BRANCH_COND; return pmu_bhrb_filter; } - /* Every thing else is unsupported */ - return -1; + return pmu_bhrb_filter; } static void power8_config_bhrb(u64 pmu_bhrb_filter) -- 1.7.11.7
[V3 06/10] powerpc, perf: Change the name of HW PMU branch filter tracking variable
This patch simply changes the name of the variable from "bhrb_filter" to "bhrb_hw_filter" in order to add one more variable which will track SW filters in generic powerpc book3s code which will be implemented in the subsequent patch. Signed-off-by: Anshuman Khandual --- arch/powerpc/perf/core-book3s.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c index eeae308..bc4dac7 100644 --- a/arch/powerpc/perf/core-book3s.c +++ b/arch/powerpc/perf/core-book3s.c @@ -47,7 +47,7 @@ struct cpu_hw_events { int n_txn_start; /* BHRB bits */ - u64 bhrb_filter;/* BHRB HW branch filter */ + u64 bhrb_hw_filter; /* BHRB HW branch filter */ int bhrb_users; void*bhrb_context; struct perf_branch_stack bhrb_stack; @@ -1159,7 +1159,7 @@ static void power_pmu_enable(struct pmu *pmu) out: if (cpuhw->bhrb_users) - ppmu->config_bhrb(cpuhw->bhrb_filter); + ppmu->config_bhrb(cpuhw->bhrb_hw_filter); local_irq_restore(flags); } @@ -1254,7 +1254,7 @@ nocheck: out: if (has_branch_stack(event)) { power_pmu_bhrb_enable(event); - cpuhw->bhrb_filter = ppmu->bhrb_filter_map( + cpuhw->bhrb_hw_filter = ppmu->bhrb_filter_map( event->attr.branch_sample_type); } @@ -1637,10 +1637,10 @@ static int power_pmu_event_init(struct perf_event *event) err = power_check_constraints(cpuhw, events, cflags, n + 1); if (has_branch_stack(event)) { - cpuhw->bhrb_filter = ppmu->bhrb_filter_map( + cpuhw->bhrb_hw_filter = ppmu->bhrb_filter_map( event->attr.branch_sample_type); - if(cpuhw->bhrb_filter == -1) + if(cpuhw->bhrb_hw_filter == -1) return -EOPNOTSUPP; } -- 1.7.11.7
[V3 03/10] perf, tool: Conditional branch filter 'cond' added to perf record
Adding perf record support for new branch stack filter criteria PERF_SAMPLE_BRANCH_COND. Signed-off-by: Anshuman Khandual Reviewed-by: Stephane Eranian --- tools/perf/builtin-record.c | 1 + 1 file changed, 1 insertion(+) diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c index ecca62e..802d11d 100644 --- a/tools/perf/builtin-record.c +++ b/tools/perf/builtin-record.c @@ -625,6 +625,7 @@ static const struct branch_mode branch_modes[] = { BRANCH_OPT("any_call", PERF_SAMPLE_BRANCH_ANY_CALL), BRANCH_OPT("any_ret", PERF_SAMPLE_BRANCH_ANY_RETURN), BRANCH_OPT("ind_call", PERF_SAMPLE_BRANCH_IND_CALL), + BRANCH_OPT("cond", PERF_SAMPLE_BRANCH_COND), BRANCH_END }; -- 1.7.11.7
[V3 08/10] powerpc, perf: Enable SW filtering in branch stack sampling framework
This patch enables SW based post processing of BHRB captured branches to meet more user defined branch filtration criteria in the perf branch stack sampling framework. These changes increase the number of branch filters and their valid combinations on any powerpc64 server platform with BHRB support. A summary of the code changes:

(1) struct cpu_hw_events

Introduced two new variables to track various filter values and masks:
(a) bhrb_sw_filter - tracks SW implemented branch filter flags
(b) filter_mask - tracks both (SW and HW) branch filter flags

(2) Event creation

The kernel will figure out supported BHRB branch filters through a PMU callback 'bhrb_filter_map'. This function will find out how many of the requested branch filters can be supported in the PMU HW. It will not try to invalidate any branch filter combinations, and event creation will not error out because of lack of HW based branch filters. Meanwhile it will track the overall supported branch filters in the "filter_mask" variable. Once the PMU callback returns, the kernel will process the user branch filter request against the available SW filters while looking at "filter_mask". During this phase, all the branch filters still pending from the user requested list will have to be supported in SW, failing which the event creation will error out.

(3) SW branch filter

During the BHRB data capture inside the PMU interrupt context, each captured 'perf_branch_entry.from' will be checked for compliance with the applicable SW branch filters. If the entry does not conform to the filter requirements, it will be discarded from the final perf branch stack buffer.

(4) Supported SW based branch filters
(a) PERF_SAMPLE_BRANCH_ANY_RETURN
(b) PERF_SAMPLE_BRANCH_IND_CALL
(c) PERF_SAMPLE_BRANCH_ANY_CALL
(d) PERF_SAMPLE_BRANCH_COND

Please refer to the patch to understand the classification of instructions into these branch filter categories. 
(5) Multiple branch filter semantics

The Book3S server implementation follows the same OR semantics (as implemented on x86) while dealing with multiple branch filters at any point of time. SW branch filter analysis is carried out on the data set captured by the PMU HW, so the resulting set of data (after applying the SW filters) will inherently be an AND with the HW captured set. Hence any combination of HW and SW branch filters would be invalid. HW based branch filters are more efficient and faster compared to SW implemented branch filters. So at first the PMU should decide whether it can support all the requested branch filters itself or not. In case it can support all the branch filters in an OR manner, we don't apply any SW branch filter on top of the HW captured set (which is the final set). This preserves the OR semantics of multiple branch filters as required. But in the case where the PMU cannot support all the requested branch filters in an OR manner, it should not apply any of its filters and should leave it up to the SW to handle them all. It's the PMU code's responsibility to uphold this protocol in order to conform to the overall OR semantics of the perf branch stack sampling framework. Signed-off-by: Anshuman Khandual --- arch/powerpc/include/asm/perf_event_server.h | 6 +- arch/powerpc/perf/core-book3s.c | 266 ++- arch/powerpc/perf/power8-pmu.c | 2 +- 3 files changed, 262 insertions(+), 12 deletions(-) diff --git a/arch/powerpc/include/asm/perf_event_server.h b/arch/powerpc/include/asm/perf_event_server.h index 8b24926..7314085 100644 --- a/arch/powerpc/include/asm/perf_event_server.h +++ b/arch/powerpc/include/asm/perf_event_server.h @@ -18,6 +18,10 @@ #define MAX_EVENT_ALTERNATIVES 8 #define MAX_LIMITED_HWCOUNTERS 2 +#define for_each_branch_sample_type(x) \ +for ((x) = PERF_SAMPLE_BRANCH_USER; \ + (x) < PERF_SAMPLE_BRANCH_MAX; (x) <<= 1) + /* * This struct provides the constants and functions needed to * describe the PMU on a particular POWER-family CPU. 
@@ -34,7 +38,7 @@ struct power_pmu { unsigned long *valp); int (*get_alternatives)(u64 event_id, unsigned int flags, u64 alt[]); - u64 (*bhrb_filter_map)(u64 branch_sample_type); + u64 (*bhrb_filter_map)(u64 branch_sample_type, u64 *filter_mask); void(*config_bhrb)(u64 pmu_bhrb_filter); void(*disable_pmc)(unsigned int pmc, unsigned long mmcr[]); int (*limited_pmc_event)(u64 event_id); diff --git a/arch/powerpc
Re: [PATCH 02/10][v6] powerpc/Power7: detect load/store instructions
On 10/16/2013 01:55 PM, David Laight wrote: >> Implement instr_is_load_store_2_06() to detect whether a given instruction >> is one of the fixed-point or floating-point load/store instructions in the >> POWER Instruction Set Architecture v2.06. > ... The op code encoding is dependent on the ISA version ? Do the basic load and store instructions change with newer ISA versions? BTW, we have got a newer version of the ISA, "PowerISA_V2.07_PUBLIC.pdf", here at power.org: https://www.power.org/documentation/power-isa-version-2-07/ It does not sound like a good idea to analyse the instructions with function names which specify the ISA version number. Besides, this function does not belong to a specific processor or platform. It has to be a bit generic. >> +int instr_is_load_store_2_06(const unsigned int *instr) >> +{ >> +unsigned int op, upper, lower; >> + >> +op = instr_opcode(*instr); >> + >> +if ((op >= 32 && op <= 58) || (op == 61 || op == 62)) >> +return true; >> + >> +if (op != 31) >> +return false; >> + >> +upper = op >> 5; >> +lower = op & 0x1f; >> + >> +/* Short circuit as many misses as we can */ >> +if (lower < 3 || lower > 23) >> +return false; >> + >> +if (lower == 3) { >> +if (upper >= 16) >> +return true; >> + >> +return false; >> +} >> + >> +if (lower == 7 || lower == 12) >> +return true; >> + >> +if (lower >= 20) /* && lower <= 23 (implicit) */ >> +return true; >> + >> +return false; >> +} > > I can't help feeling the code could do with some comments about > which actual instructions are selected where. Yeah, I agree. At least note which category of load/store instructions is getting selected in each case.
Re: [V3 01/10] perf: New conditional branch filter criteria in branch stack sampling
On 11/26/2013 03:45 PM, Anshuman Khandual wrote: > On 11/26/2013 11:36 AM, m...@ellerman.id.au wrote: >> Ideally your commit subject would contain a verb, preferably in the present >> tense. >> >> I think simply "perf: Add PERF_SAMPLE_BRANCH_COND" would be clearer. > > > Sure, will change it. > >> >> On Wed, 2013-16-10 at 06:56:48 UTC, Anshuman Khandual wrote: >>> POWER8 PMU based BHRB supports filtering for conditional branches. >>> This patch introduces new branch filter PERF_SAMPLE_BRANCH_COND which >>> will extend the existing perf ABI. Other architectures can provide >>> this functionality with either HW filtering support (if present) or >>> with SW filtering of instructions. >>> >>> Signed-off-by: Anshuman Khandual >>> Reviewed-by: Stephane Eranian >>> --- >>> include/uapi/linux/perf_event.h | 3 ++- >>> 1 file changed, 2 insertions(+), 1 deletion(-) >>> >>> diff --git a/include/uapi/linux/perf_event.h >>> b/include/uapi/linux/perf_event.h >>> index 0b1df41..5da52b6 100644 >>> --- a/include/uapi/linux/perf_event.h >>> +++ b/include/uapi/linux/perf_event.h >>> @@ -160,8 +160,9 @@ enum perf_branch_sample_type { >>> PERF_SAMPLE_BRANCH_ABORT_TX = 1U << 7, /* transaction aborts */ >>> PERF_SAMPLE_BRANCH_IN_TX= 1U << 8, /* in transaction */ >>> PERF_SAMPLE_BRANCH_NO_TX= 1U << 9, /* not in transaction */ >>> + PERF_SAMPLE_BRANCH_COND = 1U << 10, /* conditional branches */ >>> >>> - PERF_SAMPLE_BRANCH_MAX = 1U << 10, /* non-ABI */ >>> + PERF_SAMPLE_BRANCH_MAX = 1U << 11, /* non-ABI */ >>> }; >> >> This no longer applies against Linus' tree, you'll need to rebase it. > > Okay Hey Michael, Looks like the patch still applies on top of Linus's tree. The modified patch with a new commit subject line can be found here. -- >From d368096fc51a8da65f2d80ed5090d43cbc269f62 Mon Sep 17 00:00:00 2001 From: Anshuman Khandual Date: Mon, 22 Jul 2013 12:22:27 +0530 Subject: [PATCH] perf: Add PERF_SAMPLE_BRANCH_COND POWER8 PMU based BHRB supports filtering for conditional branches. 
This patch introduces new branch filter PERF_SAMPLE_BRANCH_COND which will extend the existing perf ABI. Other architectures can provide this functionality with either HW filtering support (if present) or with SW filtering of instructions. Signed-off-by: Anshuman Khandual Reviewed-by: Stephane Eranian --- include/uapi/linux/perf_event.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h index e1802d6..e2d8b8b 100644 --- a/include/uapi/linux/perf_event.h +++ b/include/uapi/linux/perf_event.h @@ -163,8 +163,9 @@ enum perf_branch_sample_type { PERF_SAMPLE_BRANCH_ABORT_TX = 1U << 7, /* transaction aborts */ PERF_SAMPLE_BRANCH_IN_TX= 1U << 8, /* in transaction */ PERF_SAMPLE_BRANCH_NO_TX= 1U << 9, /* not in transaction */ + PERF_SAMPLE_BRANCH_COND = 1U << 10, /* conditional branches */ - PERF_SAMPLE_BRANCH_MAX = 1U << 10, /* non-ABI */ + PERF_SAMPLE_BRANCH_MAX = 1U << 11, /* non-ABI */ }; #define PERF_SAMPLE_BRANCH_PLM_ALL \ -- 1.7.11.7
[PATCH V4 05/10] perf, documentation: Description for conditional branch filter
Adding documentation support for conditional branch filter. Signed-off-by: Anshuman Khandual Reviewed-by: Stephane Eranian --- tools/perf/Documentation/perf-record.txt | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt index 43b42c4..5ecc405 100644 --- a/tools/perf/Documentation/perf-record.txt +++ b/tools/perf/Documentation/perf-record.txt @@ -183,9 +183,10 @@ following filters are defined: - in_tx: only when the target is in a hardware transaction - no_tx: only when the target is not in a hardware transaction - abort_tx: only when the target is a hardware transaction abort + - cond: conditional branches + -The option requires at least one branch type among any, any_call, any_ret, ind_call. +The option requires at least one branch type among any, any_call, any_ret, ind_call, cond. The privilege levels may be omitted, in which case, the privilege levels of the associated event are applied to the branch filter. Both kernel (k) and hypervisor (hv) privilege levels are subject to permissions. When sampling on multiple events, branch stack sampling -- 1.7.11.7
[PATCH V4 01/10] perf: Add PERF_SAMPLE_BRANCH_COND
POWER8 PMU based BHRB supports filtering for conditional branches. This patch introduces a new branch filter PERF_SAMPLE_BRANCH_COND which will extend the existing perf ABI. Other architectures can provide this functionality with either HW filtering support (if present) or with SW filtering of instructions. Signed-off-by: Anshuman Khandual Reviewed-by: Stephane Eranian --- include/uapi/linux/perf_event.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h index e1802d6..e2d8b8b 100644 --- a/include/uapi/linux/perf_event.h +++ b/include/uapi/linux/perf_event.h @@ -163,8 +163,9 @@ enum perf_branch_sample_type { PERF_SAMPLE_BRANCH_ABORT_TX = 1U << 7, /* transaction aborts */ PERF_SAMPLE_BRANCH_IN_TX= 1U << 8, /* in transaction */ PERF_SAMPLE_BRANCH_NO_TX= 1U << 9, /* not in transaction */ + PERF_SAMPLE_BRANCH_COND = 1U << 10, /* conditional branches */ - PERF_SAMPLE_BRANCH_MAX = 1U << 10, /* non-ABI */ + PERF_SAMPLE_BRANCH_MAX = 1U << 11, /* non-ABI */ }; #define PERF_SAMPLE_BRANCH_PLM_ALL \ -- 1.7.11.7
[PATCH V4 10/10] powerpc, perf: Cleanup SW branch filter list look up
This patch adds an enumeration of all available SW branch filters in powerpc book3s code and streamlines the lookup of SW branch filter entries when working out which branch filters can be supported in SW. Signed-off-by: Anshuman Khandual --- arch/powerpc/perf/core-book3s.c | 38 +- 1 file changed, 13 insertions(+), 25 deletions(-) diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c index 54d39a5..42c6428 100644 --- a/arch/powerpc/perf/core-book3s.c +++ b/arch/powerpc/perf/core-book3s.c @@ -566,6 +566,12 @@ static int match_filters(u64 branch_sample_type, u64 filter_mask) return true; } +/* SW implemented branch filters */ +static unsigned int power_sw_filter[] = { PERF_SAMPLE_BRANCH_ANY_CALL, + PERF_SAMPLE_BRANCH_COND, + PERF_SAMPLE_BRANCH_ANY_RETURN, + PERF_SAMPLE_BRANCH_IND_CALL }; + /* * Required SW based branch filters * @@ -578,6 +584,7 @@ static u64 branch_filter_map(u64 branch_sample_type, u64 pmu_bhrb_filter, u64 *filter_mask) { u64 branch_sw_filter = 0; + unsigned int i; /* No branch filter requested */ if (branch_sample_type & PERF_SAMPLE_BRANCH_ANY) { @@ -593,34 +600,15 @@ static u64 branch_filter_map(u64 branch_sample_type, u64 pmu_bhrb_filter, * SW implemented filters. But right now, there is no way to * inform the user about this decision.
*/ - if (branch_sample_type & PERF_SAMPLE_BRANCH_ANY_CALL) { - if (!(pmu_bhrb_filter & PERF_SAMPLE_BRANCH_ANY_CALL)) { - branch_sw_filter |= PERF_SAMPLE_BRANCH_ANY_CALL; - *filter_mask |= PERF_SAMPLE_BRANCH_ANY_CALL; - } - } - - if (branch_sample_type & PERF_SAMPLE_BRANCH_COND) { - if (!(pmu_bhrb_filter & PERF_SAMPLE_BRANCH_COND)) { - branch_sw_filter |= PERF_SAMPLE_BRANCH_COND; - *filter_mask |= PERF_SAMPLE_BRANCH_COND; - } - } - if (branch_sample_type & PERF_SAMPLE_BRANCH_ANY_RETURN) { - if (!(pmu_bhrb_filter & PERF_SAMPLE_BRANCH_ANY_RETURN)) { - branch_sw_filter |= PERF_SAMPLE_BRANCH_ANY_RETURN; - *filter_mask |= PERF_SAMPLE_BRANCH_ANY_RETURN; - } - } - - if (branch_sample_type & PERF_SAMPLE_BRANCH_IND_CALL) { - if (!(pmu_bhrb_filter & PERF_SAMPLE_BRANCH_IND_CALL)) { - branch_sw_filter |= PERF_SAMPLE_BRANCH_IND_CALL; - *filter_mask |= PERF_SAMPLE_BRANCH_IND_CALL; + for (i = 0; i < ARRAY_SIZE(power_sw_filter); i++) { + if (branch_sample_type & power_sw_filter[i]) { + if (!(pmu_bhrb_filter & power_sw_filter[i])) { + branch_sw_filter |= power_sw_filter[i]; + *filter_mask |= power_sw_filter[i]; + } } } - return branch_sw_filter; }
[PATCH V4 00/10] perf: New conditional branch filter
This patchset is the re-spin of the original branch stack sampling patchset which introduced the new PERF_SAMPLE_BRANCH_COND branch filter. This patchset also enables SW based branch filtering support for book3s powerpc platforms which have PMU HW backed branch stack sampling support. Summary of code changes in this patchset: (1) Introduces a new PERF_SAMPLE_BRANCH_COND branch filter (2) Add the "cond" branch filter option in the "perf record" tool (3) Enable PERF_SAMPLE_BRANCH_COND in X86 platforms (4) Enable PERF_SAMPLE_BRANCH_COND in POWER8 platform (5) Update the documentation regarding "perf record" tool (6) Add some new powerpc instruction analysis functions in code-patching library (7) Enable SW based branch filter support for powerpc book3s (8) Changed BHRB configuration in POWER8 to accommodate SW branch filters With this new SW enablement, the branch filter support for book3s platforms has been extended to include all these combinations discussed below with a sample test application program (included here). Changes in V2 = (1) Enabled PPC64 SW branch filtering support (2) Incorporated changes required for all previous comments Changes in V3 = (1) Split the SW branch filter enablement into multiple patches (2) Added PMU neutral SW branch filtering code, PMU specific HW branch filtering code (3) Added new instruction analysis functionality into powerpc code-patching library (4) Changed name for some of the functions (5) Fixed a couple of spelling mistakes (6) Changed code documentation in multiple places Changes in V4 = (1) Changed the commit message for patch (01/10) (2) Changed the patch (02/10) to accommodate review comments from Michael Ellerman (3) Rebased the patchset against latest Linus's tree PMU HW branch filters = (1) perf record -j any_call -e branch-misses:u ./cprog # Overhead Command Source Shared Object Source Symbol Target Shared Object Target Symbol # ... . # 7.00%cprog cprog [.] sw_3_1 cprog [.] sw_3_1_2 6.99%cprog cprog [.] hw_1_1 cprog [.]
symbol1 6.52%cprog cprog [.] sw_3_1 cprog [.] success_3_1_2 5.41%cprog cprog [.] sw_3_1 cprog [.] sw_3_1_3 5.40%cprog cprog [.] hw_1_2 cprog [.] symbol2 5.40%cprog cprog [.] callme cprog [.] hw_1_2 5.40%cprog cprog [.] sw_3_1 cprog [.] success_3_1_1 5.40%cprog cprog [.] callme cprog [.] hw_1_1 5.39%cprog cprog [.] sw_3_1 cprog [.] sw_3_1_1 5.39%cprog cprog [.] sw_4_2 cprog [.] lr_addr 5.39%cprog cprog [.] callme cprog [.] sw_4_2 5.37%cprog [unknown] [.] cprog [.] ctr_addr 4.30%cprog cprog [.] callme cprog [.] hw_2_1 4.28%cprog cprog [.] callme cprog [.] sw_3_1 3.82%cprog cprog [.] sw_3_1 cprog [.] success_3_1_3 3.81%cprog cprog [.] callme cprog [.] hw_2_2 3.81%cprog cprog [.] callme cprog [.] sw_3_2 2.71%cprog [unknown] [.] cprog [.] lr_addr 2.70%cprog cprog [.] main cprog [.] callme 2.70%cprog cprog [.] sw_4_1 cprog [.] ctr_addr 2.70%cprog cprog [.] callme cprog [.] sw_4_1 0.08%cprog [unknown] [.] 0xf78676c4 [unknown] [.] 0xf78522c0 0.02%cprog [unknown] [k] cprog [k] ctr_addr 0.01%cprog [kernel.kallsyms] [.] .power_pmu_enable [kernel.kallsyms] [.] .power8_compute_mmcr 0.00%cprog ld-2.11.2.so [.] malloc [unknown] [.] 0xf786b380 0.00%cp
[PATCH V4 08/10] powerpc, perf: Enable SW filtering in branch stack sampling framework
This patch enables SW based post processing of BHRB captured branches in order to meet more user defined branch filtration criteria in the perf branch stack sampling framework. These changes increase the number of branch filters and their valid combinations on any powerpc64 server platform with BHRB support. A summary of the code changes follows. (1) struct cpu_hw_events Introduced two new variables to track the various filter values and masks (a) bhrb_sw_filter Tracks SW implemented branch filter flags (b) filter_mask Tracks both (SW and HW) branch filter flags (2) Event creation The kernel will figure out supported BHRB branch filters through a PMU call back 'bhrb_filter_map'. This function will find out how many of the requested branch filters can be supported in the PMU HW. It will not try to invalidate any branch filter combinations. Event creation will not error out because of lack of HW based branch filters. Meanwhile it will track the overall supported branch filters in the "filter_mask" variable. Once the PMU call back returns, the kernel will process the user branch filter request against available SW filters while looking at the "filter_mask". During this phase all the branch filters which are still pending from the user requested list will have to be supported in SW, failing which the event creation will error out. (3) SW branch filter During the BHRB data capture inside the PMU interrupt context, each of the captured 'perf_branch_entry.from' will be checked for compliance with applicable SW branch filters. If the entry does not conform to the filter requirements, it will be discarded from the final perf branch stack buffer. (4) Supported SW based branch filters (a) PERF_SAMPLE_BRANCH_ANY_RETURN (b) PERF_SAMPLE_BRANCH_IND_CALL (c) PERF_SAMPLE_BRANCH_ANY_CALL (d) PERF_SAMPLE_BRANCH_COND Please refer to the patch to understand the classification of instructions into these branch filter categories.
(5) Multiple branch filter semantics The Book3s server implementation follows the same OR semantics (as implemented in x86) while dealing with multiple branch filters at any point of time. SW branch filter analysis is carried out on the data set captured in the PMU HW, so the resulting set of data (after applying the SW filters) is inherently an AND with the HW captured set. Hence any mix of HW and SW branch filters would break the OR semantics and is invalid. HW based branch filters are more efficient and faster compared to SW implemented branch filters. So at first the PMU should decide whether it can support all the requested branch filters itself or not. In case it can support all the branch filters in an OR manner, we don't apply any SW branch filter on top of the HW captured set (which is the final set). This preserves the OR semantics of multiple branch filters as required. But in cases where the PMU cannot support all the requested branch filters in an OR manner, it should not apply any of its filters and should leave them all to the SW. It is the PMU code's responsibility to uphold this protocol in order to conform to the overall OR semantics of the perf branch stack sampling framework. Signed-off-by: Anshuman Khandual --- arch/powerpc/include/asm/perf_event_server.h | 6 +- arch/powerpc/perf/core-book3s.c | 266 ++- arch/powerpc/perf/power8-pmu.c | 2 +- 3 files changed, 262 insertions(+), 12 deletions(-) diff --git a/arch/powerpc/include/asm/perf_event_server.h b/arch/powerpc/include/asm/perf_event_server.h index 3fd2f1b..846d710 100644 --- a/arch/powerpc/include/asm/perf_event_server.h +++ b/arch/powerpc/include/asm/perf_event_server.h @@ -18,6 +18,10 @@ #define MAX_EVENT_ALTERNATIVES 8 #define MAX_LIMITED_HWCOUNTERS 2 +#define for_each_branch_sample_type(x) \ +for ((x) = PERF_SAMPLE_BRANCH_USER; \ + (x) < PERF_SAMPLE_BRANCH_MAX; (x) <<= 1) + /* * This struct provides the constants and functions needed to * describe the PMU on a particular POWER-family CPU.
@@ -34,7 +38,7 @@ struct power_pmu { unsigned long *valp); int (*get_alternatives)(u64 event_id, unsigned int flags, u64 alt[]); - u64 (*bhrb_filter_map)(u64 branch_sample_type); + u64 (*bhrb_filter_map)(u64 branch_sample_type, u64 *filter_mask); void(*config_bhrb)(u64 pmu_bhrb_filter); void(*disable_pmc)(unsigned int pmc, unsigned long mmcr[]); int (*limited_pmc_event)(u64 event_id); diff --git a/arch/powerpc
[PATCH V4 09/10] power8, perf: Change BHRB branch filter configuration
The powerpc kernel now supports SW based branch filters for book3s systems, with some specific requirements while dealing with HW supported branch filters in order to achieve the overall OR semantics of the perf branch stack sampling framework. This patch adapts the BHRB branch filter configuration to meet those protocols. The POWER8 PMU supports 3 branch filters (two of which are used in perf branch stack sampling) which are mutually exclusive and cannot be ORed with each other. This implies that the PMU can only handle one HW based branch filter request at any point of time. For all other combinations the PMU will pass them on to the SW. Also the combination of PERF_SAMPLE_BRANCH_ANY_CALL and PERF_SAMPLE_BRANCH_COND can now be handled in SW, hence we don't error them out anymore. Signed-off-by: Anshuman Khandual --- arch/powerpc/perf/power8-pmu.c | 73 +++--- 1 file changed, 54 insertions(+), 19 deletions(-) diff --git a/arch/powerpc/perf/power8-pmu.c b/arch/powerpc/perf/power8-pmu.c index 03c5b8d..6021349 100644 --- a/arch/powerpc/perf/power8-pmu.c +++ b/arch/powerpc/perf/power8-pmu.c @@ -561,7 +561,56 @@ static int power8_generic_events[] = { static u64 power8_bhrb_filter_map(u64 branch_sample_type, u64 *filter_mask) { - u64 pmu_bhrb_filter = 0; + u64 x, tmp, pmu_bhrb_filter = 0; + *filter_mask = 0; + + /* No branch filter requested */ + if (branch_sample_type & PERF_SAMPLE_BRANCH_ANY) { + *filter_mask = PERF_SAMPLE_BRANCH_ANY; + return pmu_bhrb_filter; + } + + /* +* P8 does not support oring of PMU HW branch filters. Hence +* if multiple branch filters are requested which includes filters +* supported in PMU, still go ahead and clear the PMU based HW branch +* filter component as in this case all the filters will be processed +* in SW.
+*/ + tmp = branch_sample_type; + + /* Remove privilege filters before comparison */ + tmp &= ~PERF_SAMPLE_BRANCH_USER; + tmp &= ~PERF_SAMPLE_BRANCH_KERNEL; + tmp &= ~PERF_SAMPLE_BRANCH_HV; + + for_each_branch_sample_type(x) { + /* Ignore privilege requests */ + if ((x == PERF_SAMPLE_BRANCH_USER) || (x == PERF_SAMPLE_BRANCH_KERNEL) || (x == PERF_SAMPLE_BRANCH_HV)) + continue; + + if (!(tmp & x)) + continue; + + /* Supported HW PMU filters */ + if (tmp & PERF_SAMPLE_BRANCH_ANY_CALL) { + tmp &= ~PERF_SAMPLE_BRANCH_ANY_CALL; + if (tmp) { + pmu_bhrb_filter = 0; + *filter_mask = 0; + return pmu_bhrb_filter; + } + } + + if (tmp & PERF_SAMPLE_BRANCH_COND) { + tmp &= ~PERF_SAMPLE_BRANCH_COND; + if (tmp) { + pmu_bhrb_filter = 0; + *filter_mask = 0; + return pmu_bhrb_filter; + } + } + } /* BHRB and regular PMU events share the same privilege state * filter configuration. BHRB is always recorded along with a @@ -570,34 +619,20 @@ static u64 power8_bhrb_filter_map(u64 branch_sample_type, u64 *filter_mask) * PMU event, we ignore any separate BHRB specific request. 
*/ - /* No branch filter requested */ - if (branch_sample_type & PERF_SAMPLE_BRANCH_ANY) - return pmu_bhrb_filter; - - /* Invalid branch filter options - HW does not support */ - if (branch_sample_type & PERF_SAMPLE_BRANCH_ANY_RETURN) - return -1; - - if (branch_sample_type & PERF_SAMPLE_BRANCH_IND_CALL) - return -1; - + /* Supported individual branch filters */ if (branch_sample_type & PERF_SAMPLE_BRANCH_ANY_CALL) { pmu_bhrb_filter |= POWER8_MMCRA_IFM1; + *filter_mask|= PERF_SAMPLE_BRANCH_ANY_CALL; return pmu_bhrb_filter; } if (branch_sample_type & PERF_SAMPLE_BRANCH_COND) { pmu_bhrb_filter |= POWER8_MMCRA_IFM3; + *filter_mask|= PERF_SAMPLE_BRANCH_COND; return pmu_bhrb_filter; } - /* PMU does not support ANY combination of HW BHRB filters */ - if ((branch_sample_type & PERF_SAMPLE_BRANCH_ANY_CALL) && - (branch_sample_type & PERF_SAMPLE_BRANCH_COND)) - return -1; - - /* Every thing else is unsupported */ - return -1; + return pmu_bhrb_filter; } static void power8_config_bhrb(u64 pmu_bhrb_filter) -- 1.7.11.7
[PATCH V4 06/10] powerpc, perf: Change the name of HW PMU branch filter tracking variable
This patch simply renames the variable "bhrb_filter" to "bhrb_hw_filter", making room for one more variable that will track SW filters in generic powerpc book3s code, to be implemented in a subsequent patch. Signed-off-by: Anshuman Khandual --- arch/powerpc/perf/core-book3s.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c index 29b89e8..2de7d48 100644 --- a/arch/powerpc/perf/core-book3s.c +++ b/arch/powerpc/perf/core-book3s.c @@ -47,7 +47,7 @@ struct cpu_hw_events { int n_txn_start; /* BHRB bits */ - u64 bhrb_filter;/* BHRB HW branch filter */ + u64 bhrb_hw_filter; /* BHRB HW branch filter */ int bhrb_users; void*bhrb_context; struct perf_branch_stack bhrb_stack; @@ -1159,7 +1159,7 @@ static void power_pmu_enable(struct pmu *pmu) out: if (cpuhw->bhrb_users) - ppmu->config_bhrb(cpuhw->bhrb_filter); + ppmu->config_bhrb(cpuhw->bhrb_hw_filter); local_irq_restore(flags); } @@ -1254,7 +1254,7 @@ nocheck: out: if (has_branch_stack(event)) { power_pmu_bhrb_enable(event); - cpuhw->bhrb_filter = ppmu->bhrb_filter_map( + cpuhw->bhrb_hw_filter = ppmu->bhrb_filter_map( event->attr.branch_sample_type); } @@ -1637,10 +1637,10 @@ static int power_pmu_event_init(struct perf_event *event) err = power_check_constraints(cpuhw, events, cflags, n + 1); if (has_branch_stack(event)) { - cpuhw->bhrb_filter = ppmu->bhrb_filter_map( + cpuhw->bhrb_hw_filter = ppmu->bhrb_filter_map( event->attr.branch_sample_type); - if(cpuhw->bhrb_filter == -1) + if(cpuhw->bhrb_hw_filter == -1) return -EOPNOTSUPP; } -- 1.7.11.7
[PATCH V4 07/10] powerpc, lib: Add new branch instruction analysis support functions
Generic powerpc branch instruction analysis support is added to the code patching library, which will help the subsequent patch on SW based filtering of branch records in perf. This patch also converts and exports some of the existing local static functions through the header file to be used elsewhere. Signed-off-by: Anshuman Khandual --- arch/powerpc/include/asm/code-patching.h | 30 ++ arch/powerpc/lib/code-patching.c | 54 ++-- 2 files changed, 82 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/include/asm/code-patching.h b/arch/powerpc/include/asm/code-patching.h index a6f8c7a..8bab417 100644 --- a/arch/powerpc/include/asm/code-patching.h +++ b/arch/powerpc/include/asm/code-patching.h @@ -22,6 +22,36 @@ #define BRANCH_SET_LINK 0x1 #define BRANCH_ABSOLUTE 0x2 +#define XL_FORM_LR 0x4C000020 +#define XL_FORM_CTR 0x4C000420 +#define XL_FORM_TAR 0x4C000460 + +#define BO_ALWAYS 0x02800000 +#define BO_CTR 0x02000000 +#define BO_CRBI_OFF 0x00800000 +#define BO_CRBI_ON 0x01800000 +#define BO_CRBI_HINT 0x00400000 + +/* Forms of branch instruction */ +int instr_is_branch_iform(unsigned int instr); +int instr_is_branch_bform(unsigned int instr); +int instr_is_branch_xlform(unsigned int instr); + +/* Classification of XL-form instruction */ +int is_xlform_lr(unsigned int instr); +int is_xlform_ctr(unsigned int instr); +int is_xlform_tar(unsigned int instr); + +/* Branch instruction is a call */ +int is_branch_link_set(unsigned int instr); + +/* BO field analysis (B-form or XL-form) */ +int is_bo_always(unsigned int instr); +int is_bo_ctr(unsigned int instr); +int is_bo_crbi_off(unsigned int instr); +int is_bo_crbi_on(unsigned int instr); +int is_bo_crbi_hint(unsigned int instr); + unsigned int create_branch(const unsigned int *addr, unsigned long target, int flags); unsigned int create_cond_branch(const unsigned int *addr, diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c index 17e5b23..cb62bd8 100644 --- a/arch/powerpc/lib/code-patching.c +++
b/arch/powerpc/lib/code-patching.c @@ -77,16 +77,66 @@ static unsigned int branch_opcode(unsigned int instr) return (instr >> 26) & 0x3F; } -static int instr_is_branch_iform(unsigned int instr) +int instr_is_branch_iform(unsigned int instr) { return branch_opcode(instr) == 18; } -static int instr_is_branch_bform(unsigned int instr) +int instr_is_branch_bform(unsigned int instr) { return branch_opcode(instr) == 16; } +int instr_is_branch_xlform(unsigned int instr) +{ + return branch_opcode(instr) == 19; +} + +int is_xlform_lr(unsigned int instr) +{ + return (instr & XL_FORM_LR) == XL_FORM_LR; +} + +int is_xlform_ctr(unsigned int instr) +{ + return (instr & XL_FORM_CTR) == XL_FORM_CTR; +} + +int is_xlform_tar(unsigned int instr) +{ + return (instr & XL_FORM_TAR) == XL_FORM_TAR; +} + +int is_branch_link_set(unsigned int instr) +{ + return (instr & BRANCH_SET_LINK) == BRANCH_SET_LINK; +} + +int is_bo_always(unsigned int instr) +{ + return (instr & BO_ALWAYS) == BO_ALWAYS; +} + +int is_bo_ctr(unsigned int instr) +{ + return (instr & BO_CTR) == BO_CTR; +} + +int is_bo_crbi_off(unsigned int instr) +{ + return (instr & BO_CRBI_OFF) == BO_CRBI_OFF; +} + +int is_bo_crbi_on(unsigned int instr) +{ + return (instr & BO_CRBI_ON) == BO_CRBI_ON; +} + +int is_bo_crbi_hint(unsigned int instr) +{ + return (instr & BO_CRBI_HINT) == BO_CRBI_HINT; +} + int instr_is_relative_branch(unsigned int instr) { if (instr & BRANCH_ABSOLUTE) -- 1.7.11.7
[PATCH V4 02/10] powerpc, perf: Enable conditional branch filter for POWER8
Enables conditional branch filter support for POWER8 utilizing the MMCRA register based filter, and also invalidates any combination of BHRB branch filters. Signed-off-by: Anshuman Khandual --- arch/powerpc/perf/power8-pmu.c | 10 ++ 1 file changed, 10 insertions(+) diff --git a/arch/powerpc/perf/power8-pmu.c b/arch/powerpc/perf/power8-pmu.c index a3f7abd..e88b9cb 100644 --- a/arch/powerpc/perf/power8-pmu.c +++ b/arch/powerpc/perf/power8-pmu.c @@ -586,6 +586,16 @@ static u64 power8_bhrb_filter_map(u64 branch_sample_type) return pmu_bhrb_filter; } + if (branch_sample_type & PERF_SAMPLE_BRANCH_COND) { + pmu_bhrb_filter |= POWER8_MMCRA_IFM3; + return pmu_bhrb_filter; + } + + /* PMU does not support ANY combination of HW BHRB filters */ + if ((branch_sample_type & PERF_SAMPLE_BRANCH_ANY_CALL) && + (branch_sample_type & PERF_SAMPLE_BRANCH_COND)) + return -1; + /* Every thing else is unsupported */ return -1; } -- 1.7.11.7
[PATCH V4 03/10] perf, tool: Conditional branch filter 'cond' added to perf record
Adding perf record support for new branch stack filter criteria PERF_SAMPLE_BRANCH_COND. Signed-off-by: Anshuman Khandual Reviewed-by: Stephane Eranian --- tools/perf/builtin-record.c | 1 + 1 file changed, 1 insertion(+) diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c index 7c8020a..34040f7 100644 --- a/tools/perf/builtin-record.c +++ b/tools/perf/builtin-record.c @@ -583,6 +583,7 @@ static const struct branch_mode branch_modes[] = { BRANCH_OPT("abort_tx", PERF_SAMPLE_BRANCH_ABORT_TX), BRANCH_OPT("in_tx", PERF_SAMPLE_BRANCH_IN_TX), BRANCH_OPT("no_tx", PERF_SAMPLE_BRANCH_NO_TX), + BRANCH_OPT("cond", PERF_SAMPLE_BRANCH_COND), BRANCH_END }; -- 1.7.11.7
[PATCH V4 04/10] x86, perf: Add conditional branch filtering support
This patch adds conditional branch filtering support, enabling it for PERF_SAMPLE_BRANCH_COND in perf branch stack sampling framework by utilizing an available software filter X86_BR_JCC. Signed-off-by: Anshuman Khandual Reviewed-by: Stephane Eranian --- arch/x86/kernel/cpu/perf_event_intel_lbr.c | 5 + 1 file changed, 5 insertions(+) diff --git a/arch/x86/kernel/cpu/perf_event_intel_lbr.c b/arch/x86/kernel/cpu/perf_event_intel_lbr.c index d82d155..9dd2459 100644 --- a/arch/x86/kernel/cpu/perf_event_intel_lbr.c +++ b/arch/x86/kernel/cpu/perf_event_intel_lbr.c @@ -384,6 +384,9 @@ static void intel_pmu_setup_sw_lbr_filter(struct perf_event *event) if (br_type & PERF_SAMPLE_BRANCH_NO_TX) mask |= X86_BR_NO_TX; + if (br_type & PERF_SAMPLE_BRANCH_COND) + mask |= X86_BR_JCC; + /* * stash actual user request into reg, it may * be used by fixup code for some CPU @@ -678,6 +681,7 @@ static const int nhm_lbr_sel_map[PERF_SAMPLE_BRANCH_MAX] = { * NHM/WSM erratum: must include IND_JMP to capture IND_CALL */ [PERF_SAMPLE_BRANCH_IND_CALL] = LBR_IND_CALL | LBR_IND_JMP, + [PERF_SAMPLE_BRANCH_COND] = LBR_JCC, }; static const int snb_lbr_sel_map[PERF_SAMPLE_BRANCH_MAX] = { @@ -689,6 +693,7 @@ static const int snb_lbr_sel_map[PERF_SAMPLE_BRANCH_MAX] = { [PERF_SAMPLE_BRANCH_ANY_CALL] = LBR_REL_CALL | LBR_IND_CALL | LBR_FAR, [PERF_SAMPLE_BRANCH_IND_CALL] = LBR_IND_CALL, + [PERF_SAMPLE_BRANCH_COND] = LBR_JCC, }; /* core */ -- 1.7.11.7
Re: [PATCH V4 08/10] powerpc, perf: Enable SW filtering in branch stack sampling framework
On 12/10/2013 11:27 AM, Anshuman Khandual wrote: > On 12/09/2013 11:51 AM, Michael Ellerman wrote: >> This code was already in need of some unindentation, and now it's just >> ridiculous. >> >> To start with at the beginning of this routine we have: >> >> while (..) { >> if (!val) >> break; >> else { >> // Bulk of the logic >> ... >> } >> } >> >> That should almost always become: >> >> while (..) { >> if (!val) >> break; >> >> // Bulk of the logic >> ... >> } >> >> >> But in this case that's not enough. Please send a precursor patch which moves >> this logic out into a helper function. > > Hey Michael, > > I believe this patch should be able to take care of this. > > commit d66d729715cabe0cfd8e34861a6afa8ad639ddf3 > Author: Anshuman Khandual > Date: Tue Dec 10 11:10:06 2013 +0530 > > power, perf: Clean up BHRB processing > > This patch cleans up some indentation problem and re-organizes the > BHRB processing code with an additional helper function. > > Signed-off-by: Anshuman Khandual > > diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c > index 29b89e8..9ae96c5 100644 > --- a/arch/powerpc/perf/core-book3s.c > +++ b/arch/powerpc/perf/core-book3s.c > @@ -400,11 +400,20 @@ static __u64 power_pmu_bhrb_to(u64 addr) > return target - (unsigned long)&instr + addr; > } > > +void update_branch_entry(struct cpu_hw_events *cpuhw, int u_index, u64 from, > u64 to, int pred) > +{ > + cpuhw->bhrb_entries[u_index].from = from; > + cpuhw->bhrb_entries[u_index].to = to; > + cpuhw->bhrb_entries[u_index].mispred = pred; > + cpuhw->bhrb_entries[u_index].predicted = ~pred; > + return; > +} > + > /* Processing BHRB entries */ > void power_pmu_bhrb_read(struct cpu_hw_events *cpuhw) > { > u64 val; > - u64 addr; > + u64 addr, tmp; > int r_index, u_index, pred; > > r_index = 0; > @@ -415,62 +424,54 @@ void power_pmu_bhrb_read(struct cpu_hw_events *cpuhw) > if (!val) > /* Terminal marker: End of valid BHRB entries */ > break; > - else { > - addr = val & BHRB_EA; 
> - pred = val & BHRB_PREDICTION; > > - if (!addr) > - /* invalid entry */ > - continue; > + addr = val & BHRB_EA; > + pred = val & BHRB_PREDICTION; > > - /* Branches are read most recent first (ie. mfbhrb 0 is > - * the most recent branch). > - * There are two types of valid entries: > - * 1) a target entry which is the to address of a > - *computed goto like a blr,bctr,btar. The next > - *entry read from the bhrb will be branch > - *corresponding to this target (ie. the actual > - *blr/bctr/btar instruction). > - * 2) a from address which is an actual branch. If a > - *target entry proceeds this, then this is the > - *matching branch for that target. If this is not > - *following a target entry, then this is a branch > - *where the target is given as an immediate field > - *in the instruction (ie. an i or b form branch). > - *In this case we need to read the instruction from > - *memory to determine the target/to address. > + if (!addr) > + /* invalid entry */ > + continue; > + > + /* Branches are read most recent first (ie. mfbhrb 0 is > + * the most recent branch). > + * There are two types of valid entries: > + * 1) a target entry which is the to address of a > + *computed goto like a blr,bctr,btar. The next > + *entry read from the bhrb will be branch > + *corresponding to this target (ie. the actual > + *blr/bctr/btar instruction). > + * 2) a from address which is an actual branch. If a > + *target entry proceeds this, then this is the > + *matching branch for that target. If this is
Re: [PATCH V4 09/10] power8, perf: Change BHRB branch filter configuration
On 12/09/2013 11:51 AM, Michael Ellerman wrote: > > As I said in my comments on version 3 which you ignored: > > I think it would be clearer if we actually checked for the possibilities > we > allow and let everything else fall through, eg: > > /* Ignore user/kernel/hv bits */ > branch_sample_type &= ~PERF_SAMPLE_BRANCH_PLM_ALL; > > if (branch_sample_type == PERF_SAMPLE_BRANCH_ANY) > return 0; > > if (branch_sample_type == PERF_SAMPLE_BRANCH_ANY_CALL) > return POWER8_MMCRA_IFM1; > > if (branch_sample_type == PERF_SAMPLE_BRANCH_COND) > return POWER8_MMCRA_IFM3; > > return -1; > Hey Michael, This patch only adds support for the PERF_SAMPLE_BRANCH_COND filter. If the overall code flow does not clearly suggest that all combinations of these HW filters are invalid, then we can go with one more patch to clean that up before or after this patch, but not here in this patch. Finally the code section here will look something like this. Does it sound good? static u64 power8_bhrb_filter_map(u64 branch_sample_type) { u64 pmu_bhrb_filter = 0; /* BHRB and regular PMU events share the same privilege state * filter configuration. BHRB is always recorded along with a * regular PMU event. As the privilege state filter is handled * in the basic PMC configuration of the accompanying regular * PMU event, we ignore any separate BHRB specific request.
*/ /* Ignore user, kernel, hv bits */ branch_sample_type &= ~PERF_SAMPLE_BRANCH_PLM_ALL; if (branch_sample_type == PERF_SAMPLE_BRANCH_ANY) return pmu_bhrb_filter; if (branch_sample_type == PERF_SAMPLE_BRANCH_ANY_CALL) { pmu_bhrb_filter |= POWER8_MMCRA_IFM1; return pmu_bhrb_filter; } if (branch_sample_type == PERF_SAMPLE_BRANCH_COND) { pmu_bhrb_filter |= POWER8_MMCRA_IFM3; return pmu_bhrb_filter; } /* Every thing else is unsupported */ return -1; }
Re: [PATCH V3 0/3] Add new PowerPC specific ELF core notes
On 07/18/2014 04:53 AM, Sam Bobroff wrote: > On 17/07/14 21:14, Michael Neuling wrote: >> >> On Jul 17, 2014 9:11 PM, "Benjamin Herrenschmidt" >> <b...@kernel.crashing.org> wrote: >>> > >> Outstanding Issues >> == >> (1) Running DSCR register value inside a transaction does not >> seem to be saved >> at thread.dscr when the process stops for ptrace examination. > > Hey Ben, > > Any updates on this patch series ? Ben, Any updates on this patch series ? >>> >>> I haven't had a chance to review yet, I was hoping somebody else would.. >>> >>> Have you made any progress vs. the DSCR outstanding issue mentioned >>> above ? >> >> The DSCR issue should be resolved with Sam Bobroff's recent DSCR >> fixes. I've not tested them though. >> >> Actually... Sam did you review this series? >> >> Mikey >> > > I did, and applying "powerpc: Correct DSCR during TM context switch" > corrected the DSCR value in the test program (the one in the patch notes > for this series). > > (In fact, IIRC, the reason for my patch set was the bug exposed by this > one ;-) Yeah the test program worked correctly with the fix from Sam. The first patch is a generic code change which Pedro had reviewed before. The second and third patches are powerpc specific.
Re: [PATCH V3 0/3] Add new PowerPC specific ELF core notes
On 05/23/2014 08:45 PM, Anshuman Khandual wrote: > This patch series adds five new ELF core note sections which can be > used with existing ptrace request PTRACE_GETREGSET/SETREGSET for accessing > various transactional memory and miscellaneous register sets on PowerPC > platform. Please find a test program exploiting these new ELF core note > types on a POWER8 system. > > RFC: https://lkml.org/lkml/2014/4/1/292 > V1: https://lkml.org/lkml/2014/4/2/43 > V2: https://lkml.org/lkml/2014/5/5/88 > > Changes in V3 > = > (1) Added two new error paths in every TM related get/set functions when > regset > support is not present on the system (ENODEV) or when the process does not > have any transaction active (ENODATA) in the context > > (2) Installed the active hooks for all the newly added regset core note types > > Changes in V2 > = > (1) Removed all the power specific ptrace requests corresponding to new > NT_PPC_* > elf core note types. Now all the register sets can be accessed from ptrace > through PTRACE_GETREGSET/PTRACE_SETREGSET using the individual NT_PPC* > core > note type instead > (2) Fixed couple of attribute values for REGSET_TM_CGPR register set > (3) Renamed flush_tmreg_to_thread as flush_tmregs_to_thread > (4) Fixed 32 bit checkpointed GPR support > (5) Changed commit messages accordingly > > Outstanding Issues > == > (1) Running DSCR register value inside a transaction does not seem to be saved > at thread.dscr when the process stops for ptrace examination. Hey Sam and Suka, Thanks for reviewing this patch series. I was busy with some other work for last couple of months. Went through your comments, will get back to this patch series in some time and work on the comments. Thanks again. Regards Anshuman
Re: [PATCH RFC/RFT v3 6/9] powerpc: move cacheinfo sysfs to generic cacheinfo infrastructure
On 02/19/2014 09:36 PM, Sudeep Holla wrote: > From: Sudeep Holla > > This patch removes the redundant sysfs cacheinfo code by making use of > the newly introduced generic cacheinfo infrastructure. > > Signed-off-by: Sudeep Holla > Cc: Benjamin Herrenschmidt > Cc: Paul Mackerras > Cc: linuxppc-...@lists.ozlabs.org > --- > arch/powerpc/kernel/cacheinfo.c | 831 > ++-- > arch/powerpc/kernel/cacheinfo.h | 8 - > arch/powerpc/kernel/sysfs.c | 4 - > 3 files changed, 109 insertions(+), 734 deletions(-) > delete mode 100644 arch/powerpc/kernel/cacheinfo.h > > diff --git a/arch/powerpc/kernel/cacheinfo.c b/arch/powerpc/kernel/cacheinfo.c > index 2912b87..05b7580 100644 > --- a/arch/powerpc/kernel/cacheinfo.c > +++ b/arch/powerpc/kernel/cacheinfo.c > @@ -10,38 +10,10 @@ > * 2 as published by the Free Software Foundation. > */ > > +#include > #include > -#include > #include > -#include > -#include > -#include > #include > -#include > -#include > -#include > - > -#include "cacheinfo.h" > - > -/* per-cpu object for tracking: > - * - a "cache" kobject for the top-level directory > - * - a list of "index" objects representing the cpu's local cache hierarchy > - */ > -struct cache_dir { > - struct kobject *kobj; /* bare (not embedded) kobject for cache > -* directory */ > - struct cache_index_dir *index; /* list of index objects */ > -}; > - > -/* "index" object: each cpu's cache directory has an index > - * subdirectory corresponding to a cache object associated with the > - * cpu. This object's lifetime is managed via the embedded kobject. > - */ > -struct cache_index_dir { > - struct kobject kobj; > - struct cache_index_dir *next; /* next index in parent directory */ > - struct cache *cache; > -}; > > /* Template for determining which OF properties to query for a given > * cache type */ > @@ -60,11 +32,6 @@ struct cache_type_info { > const char *nr_sets_prop; > }; > > -/* These are used to index the cache_type_info array. 
*/ > -#define CACHE_TYPE_UNIFIED 0 > -#define CACHE_TYPE_INSTRUCTION 1 > -#define CACHE_TYPE_DATA2 > - > static const struct cache_type_info cache_type_info[] = { > { > /* PowerPC Processor binding says the [di]-cache-* > @@ -77,246 +44,115 @@ static const struct cache_type_info cache_type_info[] = > { > .nr_sets_prop= "d-cache-sets", > }, > { > - .name= "Instruction", > - .size_prop = "i-cache-size", > - .line_size_props = { "i-cache-line-size", > - "i-cache-block-size", }, > - .nr_sets_prop= "i-cache-sets", > - }, > - { > .name= "Data", > .size_prop = "d-cache-size", > .line_size_props = { "d-cache-line-size", >"d-cache-block-size", }, > .nr_sets_prop= "d-cache-sets", > }, > + { > + .name= "Instruction", > + .size_prop = "i-cache-size", > + .line_size_props = { "i-cache-line-size", > + "i-cache-block-size", }, > + .nr_sets_prop= "i-cache-sets", > + }, > };

Hey Sudeep,

After applying this patch, the cache_type_info array looks like this.

static const struct cache_type_info cache_type_info[] = {
	{
		/*
		 * PowerPC Processor binding says the [di]-cache-*
		 * must be equal on unified caches, so just use
		 * d-cache properties.
		 */
		.name= "Unified",
		.size_prop = "d-cache-size",
		.line_size_props = { "d-cache-line-size",
				     "d-cache-block-size", },
		.nr_sets_prop= "d-cache-sets",
	},
	{
		.name= "Data",
		.size_prop = "d-cache-size",
		.line_size_props = { "d-cache-line-size",
				     "d-cache-block-size", },
		.nr_sets_prop= "d-cache-sets",
	},
	{
		.name= "Instruction",
		.size_prop = "i-cache-size",
		.line_size_props = { "i-cache-line-size",
				     "i-cache-block-size", },
		.nr_sets_prop= "i-cache-sets",
	},
};

and this function computes the array index for any given cache type defined for PowerPC.

static inline int get_cacheinfo_idx(enum cache_type type)
{
	if (type == CACHE_TYPE_UNIFIED)
		return 0;
	else
		return type;
}

These types are defined in include/linux/cacheinfo.h as

enum cache_type {
	CACHE_TYPE_NOCACHE = 0,
	CACHE_TYPE_INST = BIT(0),
Re: [PATCH RFC/RFT v3 6/9] powerpc: move cacheinfo sysfs to generic cacheinfo infrastructure
On 03/07/2014 09:36 AM, Anshuman Khandual wrote: > On 02/19/2014 09:36 PM, Sudeep Holla wrote: >> From: Sudeep Holla >> >> This patch removes the redundant sysfs cacheinfo code by making use of >> the newly introduced generic cacheinfo infrastructure. >> >> Signed-off-by: Sudeep Holla >> Cc: Benjamin Herrenschmidt >> Cc: Paul Mackerras >> Cc: linuxppc-...@lists.ozlabs.org >> --- >> arch/powerpc/kernel/cacheinfo.c | 831 >> ++-- >> arch/powerpc/kernel/cacheinfo.h | 8 - >> arch/powerpc/kernel/sysfs.c | 4 - >> 3 files changed, 109 insertions(+), 734 deletions(-) >> delete mode 100644 arch/powerpc/kernel/cacheinfo.h >> >> diff --git a/arch/powerpc/kernel/cacheinfo.c >> b/arch/powerpc/kernel/cacheinfo.c >> index 2912b87..05b7580 100644 >> --- a/arch/powerpc/kernel/cacheinfo.c >> +++ b/arch/powerpc/kernel/cacheinfo.c >> @@ -10,38 +10,10 @@ >> * 2 as published by the Free Software Foundation. >> */ >> >> +#include >> #include >> -#include >> #include >> -#include >> -#include >> -#include >> #include >> -#include >> -#include >> -#include >> - >> -#include "cacheinfo.h" >> - >> -/* per-cpu object for tracking: >> - * - a "cache" kobject for the top-level directory >> - * - a list of "index" objects representing the cpu's local cache hierarchy >> - */ >> -struct cache_dir { >> -struct kobject *kobj; /* bare (not embedded) kobject for cache >> - * directory */ >> -struct cache_index_dir *index; /* list of index objects */ >> -}; >> - >> -/* "index" object: each cpu's cache directory has an index >> - * subdirectory corresponding to a cache object associated with the >> - * cpu. This object's lifetime is managed via the embedded kobject. 
>> - */ >> -struct cache_index_dir { >> -struct kobject kobj; >> -struct cache_index_dir *next; /* next index in parent directory */ >> -struct cache *cache; >> -}; >> >> /* Template for determining which OF properties to query for a given >> * cache type */ >> @@ -60,11 +32,6 @@ struct cache_type_info { >> const char *nr_sets_prop; >> }; >> >> -/* These are used to index the cache_type_info array. */ >> -#define CACHE_TYPE_UNIFIED 0 >> -#define CACHE_TYPE_INSTRUCTION 1 >> -#define CACHE_TYPE_DATA2 >> - >> static const struct cache_type_info cache_type_info[] = { >> { >> /* PowerPC Processor binding says the [di]-cache-* >> @@ -77,246 +44,115 @@ static const struct cache_type_info cache_type_info[] >> = { >> .nr_sets_prop= "d-cache-sets", >> }, >> { >> -.name= "Instruction", >> -.size_prop = "i-cache-size", >> -.line_size_props = { "i-cache-line-size", >> - "i-cache-block-size", }, >> -.nr_sets_prop= "i-cache-sets", >> -}, >> -{ >> .name= "Data", >> .size_prop = "d-cache-size", >> .line_size_props = { "d-cache-line-size", >> "d-cache-block-size", }, >> .nr_sets_prop= "d-cache-sets", >> }, >> +{ >> +.name= "Instruction", >> +.size_prop = "i-cache-size", >> +.line_size_props = { "i-cache-line-size", >> + "i-cache-block-size", }, >> +.nr_sets_prop= "i-cache-sets", >> +}, >> }; > > > Hey Sudeep, > > After applying this patch, the cache_type_info array looks like this. > > static const struct cache_type_info cache_type_info[] = { > { > /* > * PowerPC Processor binding says the [di]-cache-* > * must be equal on unified caches, so just use > * d-cache properties. > */ > .name= "Unified", > .size_prop = "d-cache-size", > .line_size_props = { "d-cache-line-size", > "d-cache-block-size", }, &
[V5 2/4] perf, tool: Conditional branch filter 'cond' added to perf record
Adding perf record support for new branch stack filter criteria PERF_SAMPLE_BRANCH_COND.

Signed-off-by: Anshuman Khandual
Reviewed-by: Stephane Eranian
---
 tools/perf/builtin-record.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 3c394bf..eb74bcd 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -589,6 +589,7 @@ static const struct branch_mode branch_modes[] = {
 	BRANCH_OPT("abort_tx", PERF_SAMPLE_BRANCH_ABORT_TX),
 	BRANCH_OPT("in_tx", PERF_SAMPLE_BRANCH_IN_TX),
 	BRANCH_OPT("no_tx", PERF_SAMPLE_BRANCH_NO_TX),
+	BRANCH_OPT("cond", PERF_SAMPLE_BRANCH_COND),
 	BRANCH_END
 };
-- 
1.7.11.7
[V5 1/4] perf: Add PERF_SAMPLE_BRANCH_COND
This patch introduces new branch filter PERF_SAMPLE_BRANCH_COND which will extend the existing perf ABI. Various architectures can provide this functionality either with HW filtering support (if present) or with SW filtering of captured branch instructions.

Signed-off-by: Anshuman Khandual
Reviewed-by: Stephane Eranian
---
 include/uapi/linux/perf_event.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 853bc1c..696f69b4 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -163,8 +163,9 @@ enum perf_branch_sample_type {
 	PERF_SAMPLE_BRANCH_ABORT_TX	= 1U << 7, /* transaction aborts */
 	PERF_SAMPLE_BRANCH_IN_TX	= 1U << 8, /* in transaction */
 	PERF_SAMPLE_BRANCH_NO_TX	= 1U << 9, /* not in transaction */
+	PERF_SAMPLE_BRANCH_COND		= 1U << 10, /* conditional branches */

-	PERF_SAMPLE_BRANCH_MAX		= 1U << 10, /* non-ABI */
+	PERF_SAMPLE_BRANCH_MAX		= 1U << 11, /* non-ABI */
 };

 #define PERF_SAMPLE_BRANCH_PLM_ALL \
-- 
1.7.11.7
[V5 0/4] perf: New conditional branch filter
Hello Arnaldo,

I had posted the V5 version of the PowerPC SW branch filter enablement patchset last month. Please find the patchset here at https://lkml.org/lkml/2014/2/5/79

The following patches (2,4,5,6 from the original V5 patchset) are the ones which change code in the generic kernel, perf tool and X86 perf. Basically this patchset adds one more branch filter for "conditional" branches. In X86 code, this new filter has been implemented with the help of the available SW filter X86_BR_JCC and LBR_JCC. We had some discussions in this regard before. Please review these changes and if it's okay, please merge them. Other patches in the series are powerpc specific and are being reviewed by Benjamin Herrenschmidt and Michael Ellerman. Let me know if you need more information.

[1] https://lkml.org/lkml/2013/5/22/51
[2] https://lkml.org/lkml/2013/8/30/10
[3] https://lkml.org/lkml/2013/10/16/75
[4] https://lkml.org/lkml/2013/12/4/168
[5] https://lkml.org/lkml/2014/2/5/79

Cc: Arnaldo Carvalho de Melo
Cc: Stephane Eranian
Cc: Andi Kleen
Cc: Ingo Molnar
Cc: Benjamin Herrenschmidt
Cc: Michael Ellerman
Cc: Peter Zijlstra

Anshuman Khandual (4):
  perf: Add PERF_SAMPLE_BRANCH_COND
  perf, tool: Conditional branch filter 'cond' added to perf record
  x86, perf: Add conditional branch filtering support
  perf, documentation: Description for conditional branch filter

 arch/x86/kernel/cpu/perf_event_intel_lbr.c | 5 +
 include/uapi/linux/perf_event.h            | 3 ++-
 tools/perf/Documentation/perf-record.txt   | 3 ++-
 tools/perf/builtin-record.c                | 1 +
 4 files changed, 10 insertions(+), 2 deletions(-)

-- 
1.7.11.7
[V5 4/4] perf, documentation: Description for conditional branch filter
Adding documentation support for conditional branch filter.

Signed-off-by: Anshuman Khandual
Reviewed-by: Stephane Eranian
---
 tools/perf/Documentation/perf-record.txt | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
index c71b0f3..d460049 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -184,9 +184,10 @@ following filters are defined:
 	- in_tx: only when the target is in a hardware transaction
 	- no_tx: only when the target is not in a hardware transaction
 	- abort_tx: only when the target is a hardware transaction abort
+	- cond: conditional branches
+
-The option requires at least one branch type among any, any_call, any_ret, ind_call.
+The option requires at least one branch type among any, any_call, any_ret, ind_call, cond.
 The privilege levels may be omitted, in which case, the privilege levels of the
 associated event are applied to the branch filter. Both kernel (k) and hypervisor
 (hv) privilege levels are subject to permissions. When sampling on multiple events, branch stack sampling
-- 
1.7.11.7
[V5 3/4] x86, perf: Add conditional branch filtering support
This patch adds conditional branch filtering support, enabling it for PERF_SAMPLE_BRANCH_COND in perf branch stack sampling framework by utilizing an available software filter X86_BR_JCC.

Signed-off-by: Anshuman Khandual
Reviewed-by: Stephane Eranian
---
 arch/x86/kernel/cpu/perf_event_intel_lbr.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/arch/x86/kernel/cpu/perf_event_intel_lbr.c b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
index d82d155..9dd2459 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_lbr.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
@@ -384,6 +384,9 @@ static void intel_pmu_setup_sw_lbr_filter(struct perf_event *event)
 	if (br_type & PERF_SAMPLE_BRANCH_NO_TX)
 		mask |= X86_BR_NO_TX;

+	if (br_type & PERF_SAMPLE_BRANCH_COND)
+		mask |= X86_BR_JCC;
+
 	/*
 	 * stash actual user request into reg, it may
 	 * be used by fixup code for some CPU
@@ -678,6 +681,7 @@ static const int nhm_lbr_sel_map[PERF_SAMPLE_BRANCH_MAX] = {
 	 * NHM/WSM erratum: must include IND_JMP to capture IND_CALL
 	 */
 	[PERF_SAMPLE_BRANCH_IND_CALL] = LBR_IND_CALL | LBR_IND_JMP,
+	[PERF_SAMPLE_BRANCH_COND] = LBR_JCC,
 };

 static const int snb_lbr_sel_map[PERF_SAMPLE_BRANCH_MAX] = {
@@ -689,6 +693,7 @@ static const int snb_lbr_sel_map[PERF_SAMPLE_BRANCH_MAX] = {
 	[PERF_SAMPLE_BRANCH_ANY_CALL] = LBR_REL_CALL | LBR_IND_CALL | LBR_FAR,
 	[PERF_SAMPLE_BRANCH_IND_CALL] = LBR_IND_CALL,
+	[PERF_SAMPLE_BRANCH_COND] = LBR_JCC,
 };

 /* core */
-- 
1.7.11.7
Re: [PATCH 2/3] powerpc, ptrace: Add new ptrace request macros for transactional memory
On 04/26/2014 05:12 AM, Pedro Alves wrote: > On 04/02/2014 08:02 AM, Anshuman Khandual wrote: >> This patch adds following new sets of ptrace request macros for transactional >> memory expanding the existing ptrace ABI on PowerPC. >> >> /* TM special purpose registers */ >> PTRACE_GETTM_SPRREGS >> PTRACE_SETTM_SPRREGS >> >> /* TM checkpointed GPR registers */ >> PTRACE_GETTM_CGPRREGS >> PTRACE_SETTM_CGPRREGS >> >> /* TM checkpointed FPR registers */ >> PTRACE_GETTM_CFPRREGS >> PTRACE_SETTM_CFPRREGS >> >> /* TM checkpointed VMX registers */ >> PTRACE_GETTM_CVMXREGS >> PTRACE_SETTM_CVMXREGS > > Urgh, we're _still_ adding specialized register specific calls? > Why aren't these exported as new register sets, accessible through > PTRACE_GETREGSET / PTRACE_SETREGSET? That's supposed to be the > Modern Way to do things. All these new register sets can be accessed through PTRACE_GETREGSET /SETREGSET requests with the new NT_PPC_* core note types added in the previous patch. PowerPC already has some register specific ptrace requests, so thought of adding some new requests for transactional memory purpose. But yes these are redundant and can be dropped.
[PATCH] ptrace: Fix PTRACE_GETREGSET/PTRACE_SETREGSET in code documentation
The current documentation is a bit misleading and does not explicitly specify that iov.len needs to be initialized, failing which the kernel may just ignore the ptrace request and never read from/write into the user specified buffer. This patch fixes the documentation.

Signed-off-by: Anshuman Khandual
---
 include/uapi/linux/ptrace.h | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/ptrace.h b/include/uapi/linux/ptrace.h
index cf1019e..e9d6b37 100644
--- a/include/uapi/linux/ptrace.h
+++ b/include/uapi/linux/ptrace.h
@@ -43,8 +43,12 @@
  *
  *	ret = ptrace(PTRACE_GETREGSET/PTRACE_SETREGSET, pid, NT_XXX_TYPE, &iov);
  *
- * On the successful completion, iov.len will be updated by the kernel,
- * specifying how much the kernel has written/read to/from the user's iov.buf.
+ * A non-zero value up to the max size of data expected to be written/read by the
+ * kernel in response to any NT_XXX_TYPE request type must be assigned to iov.len
+ * before initiating the ptrace call. If iov.len is 0, then the kernel will neither
+ * read from nor write into the user buffer specified. On successful completion,
+ * iov.len will be updated by the kernel, specifying how much the kernel has
+ * written/read to/from the user's iov.buf.
  */
 #define PTRACE_GETREGSET	0x4204
 #define PTRACE_SETREGSET	0x4205
-- 
1.7.11.7
Re: [PATCH 0/3] Add new ptrace request macros on PowerPC
On 04/02/2014 03:02 PM, Anshuman Khandual wrote: > On 04/02/2014 12:32 PM, Anshuman Khandual wrote: >> This patch series adds new ELF note sections which are used to >> create new ptrace request macros for various transactional memory and >> miscellaneous registers on PowerPC. Please find the test case exploiting >> the new ptrace request macros and it's results on a POWER8 system. >> >> RFC: https://lkml.org/lkml/2014/4/1/292 >> >> == Results == >> ---TM specific SPR-- >> TM TFHAR: 19dc >> TM TEXASR: de01ac01 >> TM TFIAR: c003f386 >> TM CH ORIG_MSR: 9005f032 >> TM CH TAR: 6 >> TM CH PPR: c >> TM CH DSCR: 1 >> ---TM checkpointed GPR- >> TM CH GPR[0]: 197c >> TM CH GPR[1]: 5 >> TM CH GPR[2]: 6 >> TM CH GPR[7]: 1 >> TM CH NIP: 19dc >> TM CH LINK: 197c >> TM CH CCR: 22000422 >> ---TM running GPR- >> TM RN GPR[0]: 197c >> TM RN GPR[1]: 7 >> TM RN GPR[2]: 8 >> TM RN GPR[7]: 5 >> TM RN NIP: 19fc >> TM RN LINK: 197c >> TM RN CCR: 2000422 >> ---TM running FPR- >> TM RN FPR[0]: 1002d3a3780 >> TM RN FPR[1]: 7 >> TM RN FPR[2]: 8 >> TM RN FPSCR: 0 >> ---TM checkpointed FPR- >> TM CH FPR[0]: 1002d3a3780 >> TM CH FPR[1]: 5 >> TM CH FPR[2]: 6 >> TM CH FPSCR: 0 >> ---Running miscellaneous registers--- > TM RN DSCR: 0 > > There is a problem in here which I forgot to mention. The running DSCR value > comes from thread->dscr component of the target process. While we are inside > the > transaction (which is the case here as we are stuck at "b ." instruction and > have not reached TEND) thread->dscr should have the running value of the DSCR > register at that point of time. Here we expect the DSCR value to be 5 instead > of 0 as shown in the output above. During the tests when I moved the "b ." > after > TEND, the thread->dscr gets the value of 5 while all check pointed reg values > are > thrown away. I believe there is some problem in the way thread->dscr context > is saved away inside the TM section. Will look into this problem further and > keep informed. 
The reason behind this inconsistent DSCR register value is the following commit, where the kernel reverts the DSCR register to a default value to avoid running with the user-set value for a long time, thus preventing any potential performance degradation. The same reason applies to the PPR register as well. So it's not a problem but expected behaviour.

commit e9bdc3d6143d1c4b8d8ce5231fc958268331f983
Author: Michael Neuling
Date:   Thu Sep 26 13:29:09 2013 +1000

    powerpc/tm: Switch out userspace PPR and DSCR sooner

    When we do a treclaim or trecheckpoint we end up running with userspace PPR and DSCR values. Currently we don't do anything special to avoid running with user values which could cause a severe performance degradation. This patch moves the PPR and DSCR save and restore around treclaim and trecheckpoint so that we run with user values for a much shorter period. More care is taken with the PPR as it's impact is greater than the DSCR. This is similar to user exceptions, where we run HTM_MEDIUM early to ensure that we don't run with a userspace PPR values in the kernel.
Re: [PATCH 0/3] Add new ptrace request macros on PowerPC
On 04/29/2014 12:36 PM, Michael Neuling wrote: > How is it causing the problem? As mentioned before, what I thought to be a problem is expected behaviour. So it's not a problem any more. The DSCR value inside the transaction will fall back to the default as the kernel won't let a user-specified value remain applied for a long time.
Re: [PATCH 0/3] Add new ptrace request macros on PowerPC
On 04/29/2014 01:52 PM, Michael Neuling wrote: > That's not what that patch does. It shouldn't make any user visible changes > to DSCR or PPR. It may not when it runs uninterrupted but after the tracee process has stopped, thread.dscr reflects the default DSCR value as mentioned before. This can be proved by changing the "dscr_default" value in arch/powerpc/sysfs.c file. > > Over syscall PPR and DSCR may change. Depending on your test case, that may > be your problem. I would guess when the tracee process stops for ptrace analysis, tm_reclaim or tm_recheckpoint path might be crossed which is causing this dscr_default value to go into thread_struct.
Re: [PATCH V2 2/3] powerpc, ptrace: Enable support for transactional memory register sets
On 05/15/2014 05:38 PM, Pedro Alves wrote: > On 05/15/2014 09:25 AM, Anshuman Khandual wrote: >> On 05/14/2014 04:45 PM, Pedro Alves wrote: >>> On 05/14/14 06:46, Anshuman Khandual wrote: >>>> On 05/13/2014 10:43 PM, Pedro Alves wrote: >>>>> On 05/05/14 08:54, Anshuman Khandual wrote: >>>>>> This patch enables get and set of transactional memory related register >>>>>> sets through PTRACE_GETREGSET/PTRACE_SETREGSET interface by implementing >>>>>> four new powerpc specific register sets i.e REGSET_TM_SPR, >>>>>> REGSET_TM_CGPR, >>>>>> REGSET_TM_CFPR, REGSET_CVMX support corresponding to these following new >>>>>> ELF core note types added previously in this regard. >>>>>> >>>>>> (1) NT_PPC_TM_SPR >>>>>> (2) NT_PPC_TM_CGPR >>>>>> (3) NT_PPC_TM_CFPR >>>>>> (4) NT_PPC_TM_CVMX >>>>> >>>>> Sorry that I couldn't tell this from the code, but, what does the >>>>> kernel return when the ptracer requests these registers and the >>>>> program is not in a transaction? Specifically I'm wondering whether >>>>> this follows the same semantics as the s390 port. >>>>> >>>> >>>> Right now, it still returns the saved state of the registers from thread >>>> struct. I had assumed that the user must know the state of the transaction >>>> before initiating the ptrace request. I guess its better to check for >>>> the transaction status before processing the request. In case if TM is not >>>> active on that thread, we should return -EINVAL. >>> >>> I think s390 returns ENODATA in that case. >>> >>> https://sourceware.org/ml/gdb-patches/2013-06/msg00273.html >>> >>> We'll want some way to tell whether the system actually >>> supports this. That could be ENODATA vs something-else (EINVAL >>> or perhaps better EIO for "request is invalid"). >> >> As Mickey has pointed out, the transaction memory support in the system can >> be >> checked from the HWCAP2 flags. So when the transaction is not active, we will >> return ENODATA instead for TM related ptrace regset requests. 
> > Returning ENODATA when the transaction is not active, like > s390 is great. Thank you. > > But I think it's worth it to consider what should the kernel > return when the machine doesn't have these registers at all. > > Sure, for this case we happen to have the hwcap flag. But in > general, I don't know whether we will always have a hwcap bit > for each register set that is added. Maybe we will, so that > the info ends up in core dumps. > > Still, I think it's worth to consider this case in the > general sense, irrespective of hwcap. > > That is, what should PTRACE_GETREGSET/PTRACE_SETREGSET return > when the machine doesn't have the registers at all. We shouldn't > need to consult something elsewhere (like hwcap) to determine > what ENODATA means. The kernel knows it right there. I think > s390 goofed here. > > Taking a look at x86, for example, we see: > > [REGSET_XSTATE] = { > .core_note_type = NT_X86_XSTATE, > .size = sizeof(u64), .align = sizeof(u64), > .active = xstateregs_active, .get = xstateregs_get, > .set = xstateregs_set > }, > > Note that it installs the ".active" hook. > > 24 /** > 25 * user_regset_active_fn - type of @active function in &struct user_regset > 26 * @target: thread being examined > 27 * @regset: regset being examined > 28 * > 29 * Return -%ENODEV if not available on the hardware found. > 30 * Return %0 if no interesting state in this thread. > 31 * Return >%0 number of @size units of interesting state. > 32 * Any get call fetching state beyond that number will > 33 * see the default initialization state for this data, > 34 * so a caller that knows what the default state is need > 35 * not copy it all out. > 36 * This call is optional; the pointer is %NULL if there > 37 * is no inexpensive check to yield a value < @n. > 38 */ > 39 typedef int user_regset_active_fn(struct task_struct *target, > 40 const struct user_regset *regset); > 41 > > Note the mention of ENODEV. 
> > I couldn't actually find any arch that currently returns -ENODEV in > the "active" hook. I see that binfmt_elf.c doesn't handle > regset->active
Re: [PATCH V2 2/3] powerpc, ptrace: Enable support for transactional memory register sets
On 05/19/2014 08:13 PM, Pedro Alves wrote:
> On 05/19/2014 12:46 PM, Anshuman Khandual wrote:
>
>>>> I couldn't actually find any arch that currently returns -ENODEV in
>>>> the "active" hook. I see that binfmt_elf.c doesn't handle
>>>> regset->active() returning < 0. Guess that may be why. Looks like
>>>> something that could be cleaned up, to me.
>>>>
>> Also it does not consider the return value of regset->active(t->task, regset)
>> (whose objective is to figure out whether we need to request regset->n number
>> of elements or less than that) in the subsequent call to the regset->get
>> function.
>
> Indeed.
>
> TBC, do you plan on fixing this? Otherwise ...

Sure, I am thinking of something like what is mentioned below. But I am still not sure how to use the -ENODEV return value from regset->active(). Right now, if a regset has the active hook and it returns anything but a positive value, the regset is ignored and control moves to the next regset in view. This prevents that thread core note type from being written to the core dump.

diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index aa3cb62..80672fb 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -1553,7 +1553,15 @@ static int fill_thread_core_info(struct elf_thread_core_info *t,
 		if (regset->core_note_type && regset->get &&
 		    (!regset->active || regset->active(t->task, regset))) {
 			int ret;
-			size_t size = regset->n * regset->size;
+			size_t size;
+
+			/* Request only the active elements in the regset */
+			if (!regset->active)
+				size = regset->n * regset->size;
+			else
+				size = regset->active(t->task, regset)
+					* regset->size;
+
 			void *data = kmalloc(size, GFP_KERNEL);
 			if (unlikely(!data))
 				return 0;

>
>> Now coming to the installation of the .active hooks for all the new regsets, it
>> should be pretty straightforward as well. Though it is optional and used for
>> elf_core_dump purposes only, it is worth adding them here. An example of an
>> active function should be something like this.
The function is inexpensive, as required.

>> +static int tm_spr_active(struct task_struct *target,
>> +			 const struct user_regset *regset)
>> +{
>> +	if (!cpu_has_feature(CPU_FTR_TM))
>> +		return -ENODEV;
>
> ... unfortunately this will do the wrong thing.

I am not sure whether I understand this correctly. Are you saying that it's wrong to return -ENODEV in this case, as above?

-- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2] ptrace: Clarify PTRACE_GETREGSET/PTRACE_SETREGSET, documentation in uapi header
On 05/14/2014 04:24 PM, Pedro Alves wrote: > On 05/14/14 08:10, Anshuman Khandual wrote: >> On 05/13/2014 11:39 PM, Pedro Alves wrote: >>> On 05/05/14 05:10, Anshuman Khandual wrote: >>>> On 05/01/2014 07:43 PM, Pedro Alves wrote: >>> OK, then this is what I suggest instead: > ... >>>> Shall I resend the patch with your proposed changes and your >>>> "Signed-off-by", >>>> moving myself to "Reported-by" ? >>> >>> No idea of the actual policy to follow. Feel free to do that if that's the >>> standard procedure. >> >> Even I am not sure about this, so to preserve the correct authorship, would >> you >> mind sending this patch ? > > Here you go. This is against current Linus'. Please take it from > here if necessary. Thanks, Pedro, for the patch. I would assume that the ptrace maintainer (Roland or Oleg, as mentioned in the MAINTAINERS file) will pick it up from here and merge it into mainline. Please do let me know if the process is different. Thank you.
Re: [PATCH 0/3] Add new ptrace request macros on PowerPC
On 04/30/2014 05:59 AM, Michael Neuling wrote: > Anshuman Khandual wrote: > >> On 04/29/2014 01:52 PM, Michael Neuling wrote: >>> That's not what that patch does. It shouldn't make any user visible changes >>> to DSCR or PPR. >> >> It may not when it runs uninterrupted but after the tracee process has >> stopped, thread.dscr reflects the default DSCR value as mentioned >> before. This can be proved by changing the "dscr_default" value in >> arch/powerpc/sysfs.c file. > > The intention with DSCR is that if the user changes the DSCR, the kernel > should always save/restore it. If you are seeing something else, then > that is a bug. Anton has a test case for this here: > > http://ozlabs.org/~anton/junkcode/dscr_explicit_test.c > > If that is failing, then there is a bug that we need to fix. Anton's above DSCR test passed. > The PPR is the same, except that the kernel can change it over a > syscall. > >>> Over syscall PPR and DSCR may change. > > Sorry, this should be only PPR. DSCR shouldn't change over a syscall, > at least that's the intention. > >>> Depending on your test case, that may >>> be your problem. >> >> I would guess when the tracee process stops for ptrace analysis, tm_reclaim or >> tm_recheckpoint path might be crossed which is causing this dscr_default value >> to go into thread_struct. > > That shouldn't happen. If that's happening, it's a bug. I do believe this is happening. Also, after reverting commit e9bdc3d6143d1c4b8d8ce5231, thread.dscr reflects the same value as thread.tm_dscr, which is the checkpointed DSCR register value from just before the transaction started. So even though the NIP has moved past the point where the user changes the DSCR inside the transaction, thread.dscr fails to capture that latest value. But thread.dscr must contain the latest user-changed value of the DSCR, which is definitely not happening here. So there is a problem we need to fix.
Re: [PATCH V2 2/3] powerpc, ptrace: Enable support for transactional memory register sets
On 05/13/2014 10:43 PM, Pedro Alves wrote: > On 05/05/14 08:54, Anshuman Khandual wrote: >> This patch enables get and set of transactional memory related register >> sets through PTRACE_GETREGSET/PTRACE_SETREGSET interface by implementing >> four new powerpc specific register sets i.e REGSET_TM_SPR, REGSET_TM_CGPR, >> REGSET_TM_CFPR, REGSET_CVMX support corresponding to these following new >> ELF core note types added previously in this regard. >> >> (1) NT_PPC_TM_SPR >> (2) NT_PPC_TM_CGPR >> (3) NT_PPC_TM_CFPR >> (4) NT_PPC_TM_CVMX > > Sorry that I couldn't tell this from the code, but, what does the > kernel return when the ptracer requests these registers and the > program is not in a transaction? Specifically I'm wondering whether > this follows the same semantics as the s390 port. > Right now, it still returns the saved state of the registers from the thread struct. I had assumed that the user must know the state of the transaction before initiating the ptrace request. I guess it's better to check the transaction status before processing the request. If TM is not active on that thread, we should return -EINVAL. I am not familiar with the s390 side of the code, but if we look at the s390_tdb_get function, it checks (regs->int_code & 0x200) before processing the request. Not sure what 0x200 signifies, though.
Re: [PATCH V2 2/3] powerpc, ptrace: Enable support for transactional memory register sets
On 05/13/2014 10:51 PM, Pedro Alves wrote: > I wonder whether people are getting Roland's address from? > > It's frequent that ptrace related patches end up CCed to > rol...@redhat.com, but, he's not been at Red Hat for a few years > now. Roland, do you still want to be CCed on ptrace-related > issues? If so, there's probably a script somewhere in the > kernel that needs updating. If not, well, it'd be good > if it were updated anyway. :-) > > It's a little annoying, as Red Hat's servers outright reject > email sent from a @redhat.com address if one tries to send > an email that includes a CC/FROM to a user that no longer > exists in the @redhat.com domain. Got the email address from some of the previous ptrace related commits.
Re: [PATCH] ptrace: Fix PTRACE_GETREGSET/PTRACE_SETREGSET in code documentation
On 05/13/2014 11:39 PM, Pedro Alves wrote: > On 05/05/14 05:10, Anshuman Khandual wrote: >> On 05/01/2014 07:43 PM, Pedro Alves wrote: >>> On 04/28/2014 12:00 PM, Anshuman Khandual wrote: >>>> The current documentation is bit misleading and does not explicitly >>>> specify that iov.len need to be initialized failing which kernel >>>> may just ignore the ptrace request and never read from/write into >>>> the user specified buffer. This patch fixes the documentation. >>> >>> Well, it kind of does, here: >>> >>> * struct iovec iov = { buf, len}; >> >> :) Thats not explicit enough. >> >>> >>>> @@ -43,8 +43,12 @@ >>>> * >>>> *ret = ptrace(PTRACE_GETREGSET/PTRACE_SETREGSET, pid, >>>> NT_XXX_TYPE, &iov); >>>> * >>>> - * On the successful completion, iov.len will be updated by the kernel, >>>> - * specifying how much the kernel has written/read to/from the user's >>>> iov.buf. >>>> + * A non-zero value upto the max size of data expected to be written/read >>>> by the >>>> + * kernel in response to any NT_XXX_TYPE request type must be assigned to >>>> iov.len >>>> + * before initiating the ptrace call. If iov.len is 0, then kernel will >>>> neither >>>> + * read from or write into the user buffer specified. On successful >>>> completion, >>>> + * iov.len will be updated by the kernel, specifying how much the kernel >>>> has >>>> + * written/read to/from the user's iov.buf. >>> >>> I really appreciate that you're trying to make this clearer, but I >>> find the new sentence very hard to read/reason. :-/ >>> >>> I suggest: >>> >>> * This interface usage is as follows: >>> - * struct iovec iov = { buf, len}; >>> + * struct iovec iov = { buf, len }; >>> * >>> * ret = ptrace(PTRACE_GETREGSET/PTRACE_SETREGSET, pid, NT_XXX_TYPE, >>> &iov); >>> * >>> - * On the successful completion, iov.len will be updated by the kernel, >>> - * specifying how much the kernel has written/read to/from the user's >>> iov.buf. >>> + * On entry, iov describes the buffer's address and length. 
The buffer's >>> + * length must be equal to or shorter than the size of the NT_XXX_TYPE >>> regset. >>> + * On successful completion, iov.len is updated by the kernel, specifying >>> how >>> + * much the kernel has written/read to/from the user's iov.buf. >>> >> >> Yeah, sounds better. I may add "If the length is zero, the kernel will >> neither read >> from or write into the buffer" > > Well, I think that much should be obvious. What's not obvious is > whether that is considered success or error (what is the return code?) > I suspect and expect success return if the regset type is known, and > error otherwise. So that could be used as a way to probe for support > for a given regset without using stack or heap space, if it ever matters. > The kernel never reads/writes beyond iov.len, so better say that, and > then it automatically gets the 0 case handled too, right? > >>> I'm not sure I understood what you're saying correctly, though. >>> Specifically, >>> I don't know whether the buffer's length must really be shorter than the >>> size of the NT_XXX_TYPE regset. >> >> No, it does not have to. From the code snippet below (ptrace_regset function) >> the buffer length has to be multiple of regset->size for the given >> NT_XXX_TYPE >> upto the max regset size for the user to see any valid data. > > Ah, I guess one could call it a bug. If the passed in > len is bigger than the whole register set size, then there seems > to be no point in validating whether the length is multiple of > a single register's size. That unnecessarily prevents coming up > with a register set in the future that has registers of > different sizes... > > But given that that's how things are today, I suppose we should > document it... > > The problem what I >> faced was when you use any iovec structure with the length parameter >> uninitialized, >> the kernel simply ignores and does not return anything. > > Ah. Well, saying "does not return anything" is quite confusing. 
It does > return something -- -EINVAL. > >> >> if (!regset || (kiov->iov_len % regset->si
Re: [PATCH V2 2/3] powerpc, ptrace: Enable support for transactional memory register sets
On 05/14/2014 04:45 PM, Pedro Alves wrote: > On 05/14/14 06:46, Anshuman Khandual wrote: >> On 05/13/2014 10:43 PM, Pedro Alves wrote: >>> On 05/05/14 08:54, Anshuman Khandual wrote: >>>> This patch enables get and set of transactional memory related register >>>> sets through PTRACE_GETREGSET/PTRACE_SETREGSET interface by implementing >>>> four new powerpc specific register sets i.e REGSET_TM_SPR, REGSET_TM_CGPR, >>>> REGSET_TM_CFPR, REGSET_CVMX support corresponding to these following new >>>> ELF core note types added previously in this regard. >>>> >>>>(1) NT_PPC_TM_SPR >>>>(2) NT_PPC_TM_CGPR >>>>(3) NT_PPC_TM_CFPR >>>>(4) NT_PPC_TM_CVMX >>> >>> Sorry that I couldn't tell this from the code, but, what does the >>> kernel return when the ptracer requests these registers and the >>> program is not in a transaction? Specifically I'm wondering whether >>> this follows the same semantics as the s390 port. >>> >> >> Right now, it still returns the saved state of the registers from thread >> struct. I had assumed that the user must know the state of the transaction >> before initiating the ptrace request. I guess its better to check for >> the transaction status before processing the request. In case if TM is not >> active on that thread, we should return -EINVAL. > > I think s390 returns ENODATA in that case. > > https://sourceware.org/ml/gdb-patches/2013-06/msg00273.html > > We'll want some way to tell whether the system actually > supports this. That could be ENODATA vs something-else (EINVAL > or perhaps better EIO for "request is invalid"). As Mickey has pointed out, the transaction memory support in the system can be checked from the HWCAP2 flags. So when the transaction is not active, we will return ENODATA instead for TM related ptrace regset requests. 
Re: [V5 0/4] perf: New conditional branch filter
On 03/07/2014 02:36 PM, Anshuman Khandual wrote: > Hello Arnaldo, > > I had posted the V5 version of the PowerPC SW branch filter enablement > patchset last month. Please find the patchset here at > > https://lkml.org/lkml/2014/2/5/79 > > The following patches (2,4,5,6 from the original V5 patchset) > are the ones which change code in the generic kernel, perf tool and X86 perf. > Basically this patchset adds one more branch filter for "conditional" branches. > In X86 code, this new filter has been implemented with the help of the available SW > filters X86_BR_JCC and LBR_JCC. We had some discussions in this regard before. > Please review these changes and if it's okay, please merge them. Other patches > in the series are powerpc specific and are being reviewed by Benjamin Herrenschmidt > and Michael Ellerman. Let me know if you need more information. > > [1] https://lkml.org/lkml/2013/5/22/51 [2] https://lkml.org/lkml/2013/8/30/10 [3] https://lkml.org/lkml/2013/10/16/75 [4] https://lkml.org/lkml/2013/12/4/168 [5] https://lkml.org/lkml/2014/2/5/79 Hey Arnaldo, do you have any comments or suggestions on this? I have not received any response on this proposed patch series yet. Thank you.
Re: [PATCH V3 0/3] Add new PowerPC specific ELF core notes
On 06/12/2014 02:39 PM, Anshuman Khandual wrote: > On 05/23/2014 08:45 PM, Anshuman Khandual wrote: >> This patch series adds five new ELF core note sections which can be >> used with existing ptrace request PTRACE_GETREGSET/SETREGSET for accessing >> various transactional memory and miscellaneous register sets on PowerPC >> platform. Please find a test program exploiting these new ELF core note >> types on a POWER8 system. >> >> RFC: https://lkml.org/lkml/2014/4/1/292 >> V1: https://lkml.org/lkml/2014/4/2/43 >> V2: https://lkml.org/lkml/2014/5/5/88 >> >> Changes in V3 >> = >> (1) Added two new error paths in every TM related get/set functions when >> regset >> support is not present on the system (ENODEV) or when the process does >> not >> have any transaction active (ENODATA) in the context >> >> (2) Installed the active hooks for all the newly added regset core note types >> >> Changes in V2 >> = >> (1) Removed all the power specific ptrace requests corresponding to new >> NT_PPC_* >> elf core note types. Now all the register sets can be accessed from >> ptrace >> through PTRACE_GETREGSET/PTRACE_SETREGSET using the individual NT_PPC* >> core >> note type instead >> (2) Fixed couple of attribute values for REGSET_TM_CGPR register set >> (3) Renamed flush_tmreg_to_thread as flush_tmregs_to_thread >> (4) Fixed 32 bit checkpointed GPR support >> (5) Changed commit messages accordingly >> >> Outstanding Issues >> == >> (1) Running DSCR register value inside a transaction does not seem to be >> saved >> at thread.dscr when the process stops for ptrace examination. > > Hey Ben, > > Any updates on this patch series ? Ben, Any updates on this patch series ?
Re: [PATCH 1/2] powerpc/powernv: include asm/smp.h to handle UP config
On 06/05/2014 08:51 PM, Shreyas B. Prabhu wrote: > Build throws following errors when CONFIG_SMP=n > arch/powerpc/platforms/powernv/setup.c: In function > ‘pnv_kexec_wait_secondaries_down’: > arch/powerpc/platforms/powernv/setup.c:179:4: error: implicit declaration of > function ‘get_hard_smp_processor_id’ > rc = opal_query_cpu_status(get_hard_smp_processor_id(i), > > The usage of get_hard_smp_processor_id() needs the declaration from > . The file setup.c includes , which in-turn > includes . However, includes > only on SMP configs and hence UP builds fail. > > Fix this by directly including in setup.c unconditionally. Can you please clean up the description in the commit message? Also, the first line of the commit message should mention that the patch fixes a UP-specific build failure.
Re: [PATCH 2/2] powerpc/powernv : Disable subcore for UP configs
On 06/05/2014 08:54 PM, Shreyas B. Prabhu wrote: > Build throws following errors when CONFIG_SMP=n > arch/powerpc/platforms/powernv/subcore.c: In function ‘cpu_update_split_mode’: > arch/powerpc/platforms/powernv/subcore.c:274:15: error: ‘setup_max_cpus’ > undeclared (first use in this function) > arch/powerpc/platforms/powernv/subcore.c:285:5: error: lvalue required as > left operand of assignment > > 'setup_max_cpus' variable is relevant only on SMP, so there is no point > working around it for UP. Furthermore, subcore.c itself is relevant only > on SMP and hence the better solution is to exclude subcore.c for UP builds. > > Signed-off-by: Shreyas B. Prabhu > --- > This patch applies on top of ben/powerpc.git/next branch > > arch/powerpc/platforms/powernv/Makefile | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/arch/powerpc/platforms/powernv/Makefile > b/arch/powerpc/platforms/powernv/Makefile > index 4ad0d34..636d206 100644 > --- a/arch/powerpc/platforms/powernv/Makefile > +++ b/arch/powerpc/platforms/powernv/Makefile > @@ -1,9 +1,9 @@ > obj-y+= setup.o opal-takeover.o opal-wrappers.o > opal.o opal-async.o > obj-y+= opal-rtc.o opal-nvram.o opal-lpc.o > opal-flash.o > obj-y+= rng.o opal-elog.o opal-dump.o > opal-sysparam.o opal-sensor.o > -obj-y+= opal-msglog.o subcore.o subcore-asm.o > +obj-y+= opal-msglog.o subcore-asm.o Can subcore-asm.o also be moved down here as well? > -obj-$(CONFIG_SMP)+= smp.o > +obj-$(CONFIG_SMP)+= smp.o subcore.o
Re: [PATCH v2] ptrace: Clarify PTRACE_GETREGSET/PTRACE_SETREGSET, documentation in uapi header
On 05/14/2014 04:24 PM, Pedro Alves wrote: > On 05/14/14 08:10, Anshuman Khandual wrote: >> On 05/13/2014 11:39 PM, Pedro Alves wrote: >>> On 05/05/14 05:10, Anshuman Khandual wrote: >>>> On 05/01/2014 07:43 PM, Pedro Alves wrote: >>> OK, then this is what I suggest instead: > ... >>>> Shall I resend the patch with the your proposed changes and your >>>> "Signed-off-by" and >>>> moving myself as "Reported-by" ? >>> >>> No idea of the actual policy to follow. Feel free to do that if that's the >>> standard procedure. >> >> Even I am not sure about this, so to preserve the correct authorship, would >> you >> mind sending this patch ? > > Here you go. This is against current Linus'. Please take it from > here if necessary. > > 8<-- > From 1237f5ac5896f3910f66df83a5093bb548006188 Mon Sep 17 00:00:00 2001 > From: Pedro Alves > Date: Wed, 14 May 2014 11:05:07 +0100 > Subject: [PATCH] ptrace: Clarify PTRACE_GETREGSET/PTRACE_SETREGSET > documentation in uapi header > > The current comments don't explicitly state in plain words that > iov.len must be set to the buffer's length prior to the ptrace call. > A user might get confused and leave that uninitialized. > > In the ptrace_regset function (snippet below) we see that the buffer > length has to be a multiple of the slot/register size for the given > NT_XXX_TYPE: > > if (!regset || (kiov->iov_len % regset->size) != 0) > return -EINVAL; > > Note regset->size is the size of each slot/register in the set, not > the size of the whole set. > > And then, we see here: > > kiov->iov_len = min(kiov->iov_len, > (__kernel_size_t) (regset->n * regset->size)); > > that the kernel takes care of capping the requested length to the size > of the whole regset. 
> > Signed-off-by: Pedro Alves > Reported-by: Anshuman Khandual > --- > include/uapi/linux/ptrace.h | 11 --- > 1 file changed, 8 insertions(+), 3 deletions(-) > > diff --git a/include/uapi/linux/ptrace.h b/include/uapi/linux/ptrace.h > index cf1019e..30836b9 100644 > --- a/include/uapi/linux/ptrace.h > +++ b/include/uapi/linux/ptrace.h > @@ -39,12 +39,17 @@ > * payload are exactly the same layout. > * > * This interface usage is as follows: > - * struct iovec iov = { buf, len}; > + * struct iovec iov = { buf, len }; > * > * ret = ptrace(PTRACE_GETREGSET/PTRACE_SETREGSET, pid, NT_XXX_TYPE, &iov); > * > - * On the successful completion, iov.len will be updated by the kernel, > - * specifying how much the kernel has written/read to/from the user's > iov.buf. > + * On entry, iov describes the buffer's address and length. The buffer's > length > + * must be a multiple of the size of a single register in the register set. > The > + * kernel never reads or writes more than iov.len, and caps the buffer > length to > + * the register set's size. In other words, the kernel reads or writes > + * min(iov.len, regset size). On successful completion, iov.len is updated > by > + * the kernel, specifying how much the kernel has read from / written to the > + * user's iov.buf. > */ > #define PTRACE_GETREGSET 0x4204 > #define PTRACE_SETREGSET 0x4205 Hey Peter/Oleg, the above patch is a documentation fix which we discussed some time back. Could you please review it and consider merging it? Thank you.
Re: [PATCH V3 0/3] Add new PowerPC specific ELF core notes
On 05/23/2014 08:45 PM, Anshuman Khandual wrote: > This patch series adds five new ELF core note sections which can be > used with existing ptrace request PTRACE_GETREGSET/SETREGSET for accessing > various transactional memory and miscellaneous register sets on PowerPC > platform. Please find a test program exploiting these new ELF core note > types on a POWER8 system. > > RFC: https://lkml.org/lkml/2014/4/1/292 > V1: https://lkml.org/lkml/2014/4/2/43 > V2: https://lkml.org/lkml/2014/5/5/88 > > Changes in V3 > = > (1) Added two new error paths in every TM related get/set functions when > regset > support is not present on the system (ENODEV) or when the process does not > have any transaction active (ENODATA) in the context > > (2) Installed the active hooks for all the newly added regset core note types > > Changes in V2 > = > (1) Removed all the power specific ptrace requests corresponding to new > NT_PPC_* > elf core note types. Now all the register sets can be accessed from ptrace > through PTRACE_GETREGSET/PTRACE_SETREGSET using the individual NT_PPC* > core > note type instead > (2) Fixed couple of attribute values for REGSET_TM_CGPR register set > (3) Renamed flush_tmreg_to_thread as flush_tmregs_to_thread > (4) Fixed 32 bit checkpointed GPR support > (5) Changed commit messages accordingly > > Outstanding Issues > == > (1) Running DSCR register value inside a transaction does not seem to be saved > at thread.dscr when the process stops for ptrace examination. Hey Ben, Any updates on this patch series ?
[RFC] powerpc, ptrace: Add few more ptrace request macros
This patch adds few more ptrace request macros expanding the existing capability. These ptrace requests macros can be classified into two categories. (1) Transactional memory /* TM special purpose registers */ PTRACE_GETTM_SPRREGS PTRACE_SETTM_SPRREGS /* Checkpointed GPR registers */ PTRACE_GETTM_CGPRREGS PTRACE_SETTM_CGPRREGS /* Checkpointed FPR registers */ PTRACE_GETTM_CFPRREGS PTRACE_SETTM_CFPRREGS /* Checkpointed VMX registers */ PTRACE_GETTM_CVMXREGS PTRACE_SETTM_CVMXREGS (2) Miscellaneous /* TAR, PPR, DSCR registers */ PTRACE_GETMSCREGS PTRACE_SETMSCREGS This patch also adds mutliple new generic ELF core note sections in this regard which can be listed as follows. NT_PPC_TM_SPR /* Transactional memory specific registers */ NT_PPC_TM_CGPR /* Transactional memory checkpointed GPR */ NT_PPC_TM_CFPR /* Transactional memory checkpointed FPR */ NT_PPC_TM_CVMX /* Transactional memory checkpointed VMX */ NT_PPC_MISC /* Miscellaneous registers */ Signed-off-by: Anshuman Khandual --- arch/powerpc/include/asm/switch_to.h | 8 + arch/powerpc/include/uapi/asm/ptrace.h | 61 +++ arch/powerpc/kernel/process.c | 24 ++ arch/powerpc/kernel/ptrace.c | 658 +++-- include/uapi/linux/elf.h | 5 + 5 files changed, 729 insertions(+), 27 deletions(-) diff --git a/arch/powerpc/include/asm/switch_to.h b/arch/powerpc/include/asm/switch_to.h index 0e83e7d..73e2601 100644 --- a/arch/powerpc/include/asm/switch_to.h +++ b/arch/powerpc/include/asm/switch_to.h @@ -80,6 +80,14 @@ static inline void flush_spe_to_thread(struct task_struct *t) } #endif +#ifdef CONFIG_PPC_TRANSACTIONAL_MEM +extern void flush_tmreg_to_thread(struct task_struct *); +#else +static inline void flush_tmreg_to_thread(struct task_struct *t) +{ +} +#endif + static inline void clear_task_ebb(struct task_struct *t) { #ifdef CONFIG_PPC_BOOK3S_64 diff --git a/arch/powerpc/include/uapi/asm/ptrace.h b/arch/powerpc/include/uapi/asm/ptrace.h index 77d2ed3..fd962d6 100644 --- a/arch/powerpc/include/uapi/asm/ptrace.h +++ 
b/arch/powerpc/include/uapi/asm/ptrace.h @@ -190,6 +190,67 @@ struct pt_regs { #define PPC_PTRACE_SETHWDEBUG 0x88 #define PPC_PTRACE_DELHWDEBUG 0x87 +/* Transactional memory registers */ + +/* + * SPR + * + * struct data { + * u64 tm_tfhar; + * u64 tm_texasr; + * u64 tm_tfiar; + * unsigned long tm_orig_msr; + * u64 tm_tar; + * u64 tm_ppr; + * u64 tm_dscr; + * }; + */ +#define PTRACE_GETTM_SPRREGS 0x70 +#define PTRACE_SETTM_SPRREGS 0x71 + +/* + * Checkpointed GPR + * + * struct data { + * struct pt_regs ckpt_regs; + * }; + */ +#define PTRACE_GETTM_CGPRREGS 0x72 +#define PTRACE_SETTM_CGPRREGS 0x73 + +/* + * Checkpointed FPR + * + * struct data { + * u64 fpr[32]; + * u64 fpscr; + * }; + */ +#define PTRACE_GETTM_CFPRREGS 0x74 +#define PTRACE_SETTM_CFPRREGS 0x75 + +/* + * Checkpointed VMX + * + * struct data { + * vector128 vr[32]; + * vector128 vscr; + * unsigned long vrsave; + *}; + */ +#define PTRACE_GETTM_CVMXREGS 0x76 +#define PTRACE_SETTM_CVMXREGS 0x77 + +/* Miscellaneous registers */ +#define PTRACE_GETMSCREGS 0x78 +#define PTRACE_SETMSCREGS 0x79 + +/* + * XXX: A note to application developers. The existing data layout + * of the above four ptrace requests can change when new registers + * are available for each category in forthcoming processors. + */ + #ifndef __ASSEMBLY__ struct ppc_debug_info { diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c index af064d2..e5dfd8e 100644 --- a/arch/powerpc/kernel/process.c +++ b/arch/powerpc/kernel/process.c @@ -673,6 +673,30 @@ static inline void __switch_to_tm(struct task_struct *prev) } } +void flush_tmreg_to_thread(struct task_struct *tsk) +{ + /* +* If task is not current, it should have been flushed +* already to it's thread_struct during __switch_to(). 
+*/ + if (tsk != current) + return; + + preempt_disable(); + if (tsk->thread.regs) { + /* +* If we are still current, the TM state need to +* be flushed to thread_struct as it will be still +* present in the current cpu +*/ + if (MSR_TM_ACTIVE(tsk->thread.regs->msr)) { + __switch_to_tm(tsk); + tm_recheckpoint_new_task(tsk); + } + } + preempt_enable(); +} + /* * This is called if we are on the way out to userspace and the * TIF_RESTORE_TM flag is set. It checks if we need to reload diff --git a/arch/powerpc/kernel/ptrace.c b/arch/powerpc/kernel/
Fwd: [V6 00/11] perf: New conditional branch filter
Hello Peter/Ingo, Would you please consider reviewing the first four patches in this patch series which changes the generic perf kernel and perf tools code. Andi Kleen and Stephane Eranian have already reviewed these changes. The rest of the patch series is related to powerpc and being reviewed by Michael Ellerman/Ben. Regards Anshuman Original Message Subject: [V6 00/11] perf: New conditional branch filter Date: Mon, 5 May 2014 14:39:02 +0530 From: Anshuman Khandual To: linuxppc-...@ozlabs.org, linux-kernel@vger.kernel.org CC: mi...@neuling.org, a...@linux.intel.com, eran...@google.com, mich...@ellerman.id.au, a...@ghostprotocols.net, suka...@linux.vnet.ibm.com, mi...@kernel.org This patchset is the re-spin of the original branch stack sampling patchset which introduced new PERF_SAMPLE_BRANCH_COND branch filter. This patchset also enables SW based branch filtering support for book3s powerpc platforms which have PMU HW backed branch stack sampling support. Summary of code changes in this patchset: (1) Introduces a new PERF_SAMPLE_BRANCH_COND branch filter (2) Add the "cond" branch filter options in the "perf record" tool (3) Enable PERF_SAMPLE_BRANCH_COND in X86 platforms (4) Enable PERF_SAMPLE_BRANCH_COND in POWER8 platform (5) Update the documentation regarding "perf record" tool (6) Add some new powerpc instruction analysis functions in code-patching library (7) Enable SW based branch filter support for powerpc book3s (8) Changed BHRB configuration in POWER8 to accommodate SW branch filters With this new SW enablement, the branch filter support for book3s platforms have been extended to include all these combinations discussed below with a sample test application program (included here). 
Changes in V2
=============
(1) Enabled PPC64 SW branch filtering support
(2) Incorporated changes required for all previous comments

Changes in V3
=============
(1) Split the SW branch filter enablement into multiple patches
(2) Added PMU neutral SW branch filtering code, PMU specific HW branch filtering code
(3) Added new instruction analysis functionality into the powerpc code-patching library
(4) Changed names for some of the functions
(5) Fixed a couple of spelling mistakes
(6) Changed code documentation in multiple places

Changes in V4
=============
(1) Changed the commit message for patch (01/10)
(2) Changed patch (02/10) to accommodate review comments from Michael Ellerman
(3) Rebased the patchset against the latest Linus tree

Changes in V5
=============
(1) Added a precursor patch to clean up the indentation problem in power_pmu_bhrb_read
(2) Added a precursor patch to re-arrange the P8 PMU BHRB filter config for clarity
(3) Merged the previous 10th patch into the 8th patch
(4) Moved SW based branch analysis code from core perf into the code-patching library as suggested by Michael
(5) Simplified the logic in the branch analysis library
(6) Fixed some ambiguities in documentation at various places
(7) Added some more in-code documentation blocks at various places
(8) Renamed some local variables and function names
(9) Fixed some indentation and white space errors in the code
(10) Implemented almost all the review comments and suggestions made by Michael Ellerman on the V4 patchset
(11) Enabled privilege mode SW branch filter
(12) Simplified and generalized the SW implemented conditional branch filter
(13) PERF_SAMPLE_BRANCH_COND filter is now supported only through the SW implementation
(14) Adjusted other patches to deal with the above changes

Changes in V6
=============
(1) Rebased the patchset against the master
(2) Added "Reviewed-by: Andi Kleen" to the first four patches in the series, which change the generic or X86 perf code.
[https://lkml.org/lkml/2014/4/7/130]

HW implemented branch filters
=============================

(1) perf record -j any_call -e branch-misses:u ./cprog

# Overhead  Command  Source Shared Object  Source Symbol       Target Shared Object  Target Symbol
# ........  .......  ....................  ..............      ....................  .............
#
    7.85%   cprog    cprog  [.] sw_3_1     cprog  [.] success_3_1_2
    5.66%   cprog    cprog  [.] sw_3_1     cprog  [.] sw_3_1_2
    5.65%   cprog    cprog  [.] hw_1_1     cprog  [.] symbol1
    5.42%   cprog    cprog  [.] sw_3_1     cprog  [.] sw_3_1_3
    5.40%   cprog    cprog  [.] callme     cprog  [.] hw_1_1
    5.40%   cprog    cprog  [.] sw_3_1     cprog  [.] success_3_1_1
    5.40%   cprog    cprog  [.] sw_3_1     cprog  [.] sw_3_1_1
    5.39%   cprog    cprog
[V6 01/11] perf: Add PERF_SAMPLE_BRANCH_COND
This patch introduces the new branch filter PERF_SAMPLE_BRANCH_COND which will extend the existing perf ABI. Various architectures can provide this functionality either with HW filtering support (if present) or with SW filtering of captured branch instructions.

Signed-off-by: Anshuman Khandual
Reviewed-by: Stephane Eranian
Reviewed-by: Andi Kleen
---
 include/uapi/linux/perf_event.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 853bc1c..696f69b4 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -163,8 +163,9 @@ enum perf_branch_sample_type {
 	PERF_SAMPLE_BRANCH_ABORT_TX	= 1U << 7, /* transaction aborts */
 	PERF_SAMPLE_BRANCH_IN_TX	= 1U << 8, /* in transaction */
 	PERF_SAMPLE_BRANCH_NO_TX	= 1U << 9, /* not in transaction */
+	PERF_SAMPLE_BRANCH_COND		= 1U << 10, /* conditional branches */
 
-	PERF_SAMPLE_BRANCH_MAX		= 1U << 10, /* non-ABI */
+	PERF_SAMPLE_BRANCH_MAX		= 1U << 11, /* non-ABI */
 };
 
 #define PERF_SAMPLE_BRANCH_PLM_ALL \
-- 
1.7.11.7
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
[V6 04/11] perf, documentation: Description for conditional branch filter
Adding documentation support for conditional branch filter.

Signed-off-by: Anshuman Khandual
Reviewed-by: Stephane Eranian
Reviewed-by: Andi Kleen
---
 tools/perf/Documentation/perf-record.txt | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
index c71b0f3..d460049 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -184,9 +184,10 @@ following filters are defined:
 	- in_tx: only when the target is in a hardware transaction
 	- no_tx: only when the target is not in a hardware transaction
 	- abort_tx: only when the target is a hardware transaction abort
+	- cond: conditional branches
 
-The option requires at least one branch type among any, any_call, any_ret, ind_call.
+The option requires at least one branch type among any, any_call, any_ret, ind_call, cond.
 The privilege levels may be omitted, in which case, the privilege
 levels of the associated event are applied to the branch filter. Both
 kernel (k) and hypervisor (hv) privilege levels are subject to
 permissions. When sampling on multiple events, branch stack sampling
-- 
1.7.11.7
[V6 06/11] powerpc, perf: Re-arrange PMU based branch filter processing in POWER8
This patch does some code re-arrangements to make it clear that the function ignores any separate privilege level branch filter request and does not support any combination of HW PMU branch filters.

Signed-off-by: Anshuman Khandual
---
 arch/powerpc/perf/power8-pmu.c | 21 +++--
 1 file changed, 7 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/perf/power8-pmu.c b/arch/powerpc/perf/power8-pmu.c
index fe2763b..13f47f5 100644
--- a/arch/powerpc/perf/power8-pmu.c
+++ b/arch/powerpc/perf/power8-pmu.c
@@ -635,8 +635,6 @@ static int power8_generic_events[] = {
 
 static u64 power8_bhrb_filter_map(u64 branch_sample_type)
 {
-	u64 pmu_bhrb_filter = 0;
-
 	/* BHRB and regular PMU events share the same privilege state
 	 * filter configuration. BHRB is always recorded along with a
 	 * regular PMU event. As the privilege state filter is handled
@@ -644,20 +642,15 @@ static u64 power8_bhrb_filter_map(u64 branch_sample_type)
 	 * PMU event, we ignore any separate BHRB specific request.
 	 */
 
-	/* No branch filter requested */
-	if (branch_sample_type & PERF_SAMPLE_BRANCH_ANY)
-		return pmu_bhrb_filter;
-
-	/* Invalid branch filter options - HW does not support */
-	if (branch_sample_type & PERF_SAMPLE_BRANCH_ANY_RETURN)
-		return -1;
+	/* Ignore user, kernel, hv bits */
+	branch_sample_type &= ~PERF_SAMPLE_BRANCH_PLM_ALL;
 
-	if (branch_sample_type & PERF_SAMPLE_BRANCH_IND_CALL)
-		return -1;
+	/* No branch filter requested */
+	if (branch_sample_type == PERF_SAMPLE_BRANCH_ANY)
+		return 0;
 
-	if (branch_sample_type & PERF_SAMPLE_BRANCH_ANY_CALL) {
-		pmu_bhrb_filter |= POWER8_MMCRA_IFM1;
-		return pmu_bhrb_filter;
+	if (branch_sample_type == PERF_SAMPLE_BRANCH_ANY_CALL) {
+		return POWER8_MMCRA_IFM1;
 	}
 
 	/* Every thing else is unsupported */
-- 
1.7.11.7
[V6 00/11] perf: New conditional branch filter
ymbol1
    2.47%   cprog    cprog  [.] sw_3_1_1   cprog  [.] sw_3_1
    2.47%   cprog    cprog  [.] sw_3_1     cprog  [.] sw_3_1_1
    2.47%   cprog    cprog  [.] callme     cprog  [.] hw_1_1
    2.47%   cprog    cprog  [.] callme     cprog  [.] sw_3_1
    2.47%   cprog    cprog  [.] hw_1_2     cprog  [.] symbol2
    2.47%   cprog    cprog  [.] hw_2_1     cprog  [.] address1
    2.47%   cprog    cprog  [.] back1      cprog  [.] callme
    2.47%   cprog    cprog  [.] sw_3_1_3   cprog  [.] sw_3_1
    2.47%   cprog    cprog  [.] sw_3_1     cprog  [.] sw_3_1_3
    2.47%   cprog    cprog  [.] sw_3_1     cprog  [.] callme
    2.47%   cprog    cprog  [.] callme     cprog  [.] hw_1_2
    2.47%   cprog    cprog  [.] callme     cprog  [.] sw_4_2
    2.46%   cprog    cprog  [.] sw_3_1_2   cprog  [.] sw_3_1
    2.46%   cprog    cprog  [.] sw_3_1     cprog  [.] sw_3_1_2
    1.57%   cprog    cprog  [.] success_3_1_2  cprog  [.] sw_3_1
    1.57%   cprog    cprog  [.] sw_3_1     cprog  [.] success_3_1_2
    1.57%   cprog    cprog  [.] hw_1_1     cprog  [.] callme
    1.56%   cprog    cprog  [.] hw_2_2     cprog  [.] address2
    1.56%   cprog    cprog  [.] back2      cprog  [.] callme
    1.56%   cprog    cprog  [.] sw_3_2     cprog  [.] callme
    1.56%   cprog    cprog  [.] callme     cprog  [.] sw_3_2
    1.41%   cprog    cprog  [.] success_3_1_1  cprog  [.] sw_3_1
    1.41%   cprog    cprog  [.] sw_3_1     cprog  [.] success_3_1_1
    1.40%   cprog    cprog  [.] sw_4_1     cprog  [.] callme
    1.39%   cprog    cprog  [.] hw_1_2     cprog  [.] callme
    1.39%   cprog    cprog  [.] sw_3_1     cprog  [.] success_3_1_3
    1.39%   cprog    cprog  [.] callme     cprog  [.] main
    0.14%   cprog    [unknown]       [.] 0xf7d72328      [unknown]      [.] 0xf7d72320
    0.03%   cprog    [unknown]       [k]                 cprog          [k] callme
    0.01%   cprog    libc-2.11.2.so  [.] _IO_doallocbuf  libc-2.11.2.so [.] _IO_doallocbuf
    0.01%   cprog    libc-2.11.2.so  [.] printf          cprog          [.] main
    0.01%   cprog    libc-2.11.2.so  [.] _IO_doallocbuf  libc-2.11.2.so [.] _IO_file_doallocate
    0.01%   cprog    ld-2.11.2.so    [.] malloc          [unknown]      [.] 0xf7d8b380
    0.01%   cprog    cprog           [.] main            [unknown]      [.] 0x0fe7f63c
    0.01%   cprog    [unknown]       [.] 0xf7d8b388      ld-2.11.2.so   [.] __libc_memalign
    0.01%   cprog    [unknown]       [.]                 ld-2.11.2.so   [.] malloc

Please refer to the V4 version of the patchset to learn about the sample test case and its makefile.
Anshuman Khandual (11):
  perf: Add PERF_SAMPLE_BRANCH_COND
  perf, tool: Conditional branch filter 'cond' added to perf record
  x86, perf: Add conditional branch filtering support
  perf, documentation: Description for conditional branch filter
  powerpc, perf: Re-arrange BHRB processing
  powerpc, perf: Re-arrange PMU based branch filter processing in POWER8
  powerpc, perf: Change the name of HW PMU branch filter tracking variable
  powerpc, lib: Add new branch analysis support functions
  powerpc, perf: Enable SW filtering in branch stack sampling framework
  power8, perf: Adapt BHRB PMU configuration to work with SW filters
  powerpc, perf: Enable privilege mode SW branch filters

 arch/powerpc/include/asm/code-patching.h     | 16 ++
 arch/powerpc/include/asm/perf_event_server.h |  6 +-
 arch/powerpc/lib/code-patching.c             | 80 +++
 arch/powerpc/perf/core-book3s.c
[V6 11/11] powerpc, perf: Enable privilege mode SW branch filters
This patch enables privilege mode SW branch filters. It also modifies the POWER8 PMU branch filter configuration so that the privilege mode branch filter implemented as part of the base PMU event configuration is reflected in the bhrb filter mask. As a result, the SW will skip and not try to process the privilege mode branch filters itself.

Signed-off-by: Anshuman Khandual
---
 arch/powerpc/perf/core-book3s.c | 53 +++--
 arch/powerpc/perf/power8-pmu.c  | 13 --
 2 files changed, 52 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index a94cc43..297cddb 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -26,6 +26,9 @@
 #define BHRB_PREDICTION	0x0000000000000001
 #define BHRB_EA		0xFFFFFFFFFFFFFFFCUL
 
+#define POWER_ADDR_USER		0
+#define POWER_ADDR_KERNEL	1
+
 struct cpu_hw_events {
 	int n_events;
 	int n_percpu;
@@ -450,10 +453,10 @@ static bool check_instruction(unsigned int *addr, u64 sw_filter)
  * Access the instruction contained in the address and check
  * whether it complies with the applicable SW branch filters.
  */
-static bool keep_branch(u64 from, u64 sw_filter)
+static bool keep_branch(u64 from, u64 to, u64 sw_filter)
 {
 	unsigned int instr;
-	bool ret;
+	bool to_plm, ret, flag;
 
 	/*
 	 * The "from" branch for every branch record has to go
@@ -463,6 +466,37 @@ static bool keep_branch(u64 from, u64 sw_filter)
 	if (sw_filter == 0)
 		return true;
 
+	to_plm = is_kernel_addr(to) ? POWER_ADDR_KERNEL : POWER_ADDR_USER;
+
+	/*
+	 * Applying privilege mode SW branch filters first on the
+	 * 'to' address makes an AND semantic with the SW generic
+	 * branch filters (OR with each other) being applied on the
+	 * from address thereafter.
+	 */
+
+	/* Ignore PERF_SAMPLE_BRANCH_HV */
+	sw_filter &= ~PERF_SAMPLE_BRANCH_HV;
+
+	/* Privilege mode branch filters for "TO" address */
+	if (sw_filter & PERF_SAMPLE_BRANCH_PLM_ALL) {
+		flag = false;
+
+		if (sw_filter & PERF_SAMPLE_BRANCH_USER) {
+			if(to_plm == POWER_ADDR_USER)
+				flag = true;
+		}
+
+		if (sw_filter & PERF_SAMPLE_BRANCH_KERNEL) {
+			if(to_plm == POWER_ADDR_KERNEL)
+				flag = true;
+		}
+
+		if (!flag)
+			return false;
+	}
+
+	/* Generic branch filters for "FROM" address */
 	if (is_kernel_addr(from)) {
 		return check_instruction((unsigned int *) from, sw_filter);
 	} else {
@@ -501,15 +535,6 @@ static int all_filters_covered(u64 branch_sample_type, u64 bhrb_filter)
 		if (!(branch_sample_type & x))
 			continue;
 
 		/*
-		 * Privilege filter requests have been already
-		 * taken care during the base PMU configuration.
-		 */
-		if ((x == PERF_SAMPLE_BRANCH_USER)
-			|| (x == PERF_SAMPLE_BRANCH_KERNEL)
-			|| (x == PERF_SAMPLE_BRANCH_HV))
-			continue;
-
-		/*
 		 * Requested filter not available either
 		 * in PMU or in SW.
 		 */
@@ -520,7 +545,10 @@ static int all_filters_covered(u64 branch_sample_type, u64 bhrb_filter)
 }
 
 /* SW implemented branch filters */
-static unsigned int power_sw_filter[] = { PERF_SAMPLE_BRANCH_ANY_CALL,
+static unsigned int power_sw_filter[] = { PERF_SAMPLE_BRANCH_USER,
+					PERF_SAMPLE_BRANCH_KERNEL,
+					PERF_SAMPLE_BRANCH_HV,
+					PERF_SAMPLE_BRANCH_ANY_CALL,
 					PERF_SAMPLE_BRANCH_COND,
 					PERF_SAMPLE_BRANCH_ANY_RETURN,
 					PERF_SAMPLE_BRANCH_IND_CALL };
@@ -624,6 +652,7 @@ void power_pmu_bhrb_read(struct cpu_hw_events *cpuhw)
 
 		/* Apply SW branch filters and drop the entry if required */
 		if (!keep_branch(cpuhw->bhrb_entries[u_index].from,
+				 cpuhw->bhrb_entries[u_index].to,
 				 cpuhw->bhrb_sw_filter))
 			u_index--;
 		u_index++;

diff --git a/arch/powerpc/perf/power8-pmu.c b/arch/powerpc/perf/power8-pmu.c
index 4743bde..b6e21da 100644
--- a/arch/powerpc/perf/power8-pmu.c
+++ b/arch/powerpc/perf/power8-pmu.c
@@ -649,9 +649,19 @@ static u64 power8_bhrb_filter_map(u64 branch_sample_type, u64 *bhrb_filter)
[V6 05/11] powerpc, perf: Re-arrange BHRB processing
This patch cleans up some existing indentation problems and re-organizes the BHRB processing code with a helper function named 'update_branch_entry', making it more readable. This patch does not change any functionality.

Signed-off-by: Anshuman Khandual
---
 arch/powerpc/perf/core-book3s.c | 102 
 1 file changed, 52 insertions(+), 50 deletions(-)

diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index 4520c93..66bea54 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -402,11 +402,21 @@ static __u64 power_pmu_bhrb_to(u64 addr)
 	return target - (unsigned long)&instr + addr;
 }
 
+/* Update individual branch entry */
+void update_branch_entry(struct cpu_hw_events *cpuhw, int u_index, u64 from, u64 to, int pred)
+{
+	cpuhw->bhrb_entries[u_index].from = from;
+	cpuhw->bhrb_entries[u_index].to = to;
+	cpuhw->bhrb_entries[u_index].mispred = pred;
+	cpuhw->bhrb_entries[u_index].predicted = ~pred;
+	return;
+}
+
 /* Processing BHRB entries */
 void power_pmu_bhrb_read(struct cpu_hw_events *cpuhw)
 {
 	u64 val;
-	u64 addr;
+	u64 addr, tmp;
 	int r_index, u_index, pred;
 
 	r_index = 0;
@@ -417,62 +427,54 @@ void power_pmu_bhrb_read(struct cpu_hw_events *cpuhw)
 		if (!val)
 			/* Terminal marker: End of valid BHRB entries */
 			break;
-		else {
-			addr = val & BHRB_EA;
-			pred = val & BHRB_PREDICTION;
-
-			if (!addr)
-				/* invalid entry */
-				continue;
+		addr = val & BHRB_EA;
+		pred = val & BHRB_PREDICTION;
 
-			/* Branches are read most recent first (ie. mfbhrb 0 is
-			 * the most recent branch).
-			 * There are two types of valid entries:
-			 * 1) a target entry which is the to address of a
-			 *    computed goto like a blr,bctr,btar. The next
-			 *    entry read from the bhrb will be branch
-			 *    corresponding to this target (ie. the actual
-			 *    blr/bctr/btar instruction).
-			 * 2) a from address which is an actual branch. If a
-			 *    target entry proceeds this, then this is the
-			 *    matching branch for that target.
If this is not
-			 *    following a target entry, then this is a branch
-			 *    where the target is given as an immediate field
-			 *    in the instruction (ie. an i or b form branch).
-			 *    In this case we need to read the instruction from
-			 *    memory to determine the target/to address.
+		if (!addr)
+			/* invalid entry */
+			continue;
+
+		/* Branches are read most recent first (ie. mfbhrb 0 is
+		 * the most recent branch).
+		 * There are two types of valid entries:
+		 * 1) a target entry which is the to address of a
+		 *    computed goto like a blr,bctr,btar. The next
+		 *    entry read from the bhrb will be branch
+		 *    corresponding to this target (ie. the actual
+		 *    blr/bctr/btar instruction).
+		 * 2) a from address which is an actual branch. If a
+		 *    target entry proceeds this, then this is the
+		 *    matching branch for that target. If this is not
+		 *    following a target entry, then this is a branch
+		 *    where the target is given as an immediate field
+		 *    in the instruction (ie. an i or b form branch).
+		 *    In this case we need to read the instruction from
+		 *    memory to determine the target/to address.
+		 */
+		if (val & BHRB_TARGET) {
+			/* Target branches use two entries
+			 * (ie. computed gotos/XL form)
+			 */
+			tmp = addr;
+			/* Get from address in next entry */
+			val = read_bhrb(r_index++);
+			addr = val & BHRB_EA;
 			if (val & BHRB_TARGET) {
-				/* Target branches use two entries
-				 * (ie. computed gotos/XL form)
-				 */
-				cpuhw->bhrb_entries[u_index].to = addr;
-				cpuhw->bhrb_entries[u_index].mispred = pred;
-
[V6 07/11] powerpc, perf: Change the name of HW PMU branch filter tracking variable
This patch simply changes the name of the variable 'bhrb_filter' to 'bhrb_hw_filter', in order to make room for one more variable, added in a subsequent patch, which will track SW filters in the generic powerpc book3s code. This patch does not change any functionality.

Signed-off-by: Anshuman Khandual
---
 arch/powerpc/perf/core-book3s.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index 66bea54..1d7e909 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -47,7 +47,7 @@ struct cpu_hw_events {
 	int n_txn_start;
 
 	/* BHRB bits */
-	u64 bhrb_filter;	/* BHRB HW branch filter */
+	u64 bhrb_hw_filter;	/* BHRB HW branch filter */
 	int bhrb_users;
 	void *bhrb_context;
 	struct perf_branch_stack bhrb_stack;
@@ -1298,7 +1298,7 @@ static void power_pmu_enable(struct pmu *pmu)
 	mb();
 
 	if (cpuhw->bhrb_users)
-		ppmu->config_bhrb(cpuhw->bhrb_filter);
+		ppmu->config_bhrb(cpuhw->bhrb_hw_filter);
 
 	write_mmcr0(cpuhw, mmcr0);
 
@@ -1405,7 +1405,7 @@ nocheck:
 out:
 	if (has_branch_stack(event)) {
 		power_pmu_bhrb_enable(event);
-		cpuhw->bhrb_filter = ppmu->bhrb_filter_map(
+		cpuhw->bhrb_hw_filter = ppmu->bhrb_filter_map(
 					event->attr.branch_sample_type);
 	}
 
@@ -1788,10 +1788,10 @@ static int power_pmu_event_init(struct perf_event *event)
 	err = power_check_constraints(cpuhw, events, cflags, n + 1);
 
 	if (has_branch_stack(event)) {
-		cpuhw->bhrb_filter = ppmu->bhrb_filter_map(
+		cpuhw->bhrb_hw_filter = ppmu->bhrb_filter_map(
 					event->attr.branch_sample_type);
 
-		if(cpuhw->bhrb_filter == -1)
+		if(cpuhw->bhrb_hw_filter == -1)
 			return -EOPNOTSUPP;
 	}
-- 
1.7.11.7
[V6 08/11] powerpc, lib: Add new branch analysis support functions
Generic powerpc branch analysis support is added to the code-patching library; the subsequent patch uses it for SW based filtering of branch records in perf.

Signed-off-by: Anshuman Khandual
---
 arch/powerpc/include/asm/code-patching.h | 16 +++
 arch/powerpc/lib/code-patching.c         | 80 
 2 files changed, 96 insertions(+)

diff --git a/arch/powerpc/include/asm/code-patching.h b/arch/powerpc/include/asm/code-patching.h
index 97e02f9..39919d4 100644
--- a/arch/powerpc/include/asm/code-patching.h
+++ b/arch/powerpc/include/asm/code-patching.h
@@ -22,6 +22,16 @@
 #define BRANCH_SET_LINK	0x1
 #define BRANCH_ABSOLUTE	0x2
 
+#define XL_FORM_LR	0x4C000020
+#define XL_FORM_CTR	0x4C000420
+#define XL_FORM_TAR	0x4C000460
+
+#define BO_ALWAYS	0x02800000
+#define BO_CTR		0x02000000
+#define BO_CRBI_OFF	0x00800000
+#define BO_CRBI_ON	0x01800000
+#define BO_CRBI_HINT	0x00400000
+
 unsigned int create_branch(const unsigned int *addr,
 			   unsigned long target, int flags);
 unsigned int create_cond_branch(const unsigned int *addr,
@@ -56,4 +66,10 @@ static inline unsigned long ppc_function_entry(void *func)
 #endif
 }
 
+/* Perf branch filters */
+bool instr_is_return_branch(unsigned int instr);
+bool instr_is_conditional_branch(unsigned int instr);
+bool instr_is_func_call(unsigned int instr);
+bool instr_is_indirect_func_call(unsigned int instr);
+
 #endif /* _ASM_POWERPC_CODE_PATCHING_H */
diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index d5edbeb..a06f8b3 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -77,6 +77,7 @@ static unsigned int branch_opcode(unsigned int instr)
 	return (instr >> 26) & 0x3F;
 }
 
+/* Forms of branch instruction */
 static int instr_is_branch_iform(unsigned int instr)
 {
 	return branch_opcode(instr) == 18;
@@ -87,6 +88,85 @@ static int instr_is_branch_bform(unsigned int instr)
 	return branch_opcode(instr) == 16;
 }
 
+static int instr_is_branch_xlform(unsigned int instr)
+{
+	return branch_opcode(instr) == 19;
+}
+
+/* Classification of XL-form instruction */
+static int is_xlform_lr(unsigned int instr)
+{
+	return (instr & XL_FORM_LR) == XL_FORM_LR;
+}
+
+/* BO field analysis (B-form or XL-form) */
+static int is_bo_always(unsigned int instr)
+{
+	return (instr & BO_ALWAYS) == BO_ALWAYS;
+}
+
+/* Link bit is set */
+static int is_branch_link_set(unsigned int instr)
+{
+	return (instr & BRANCH_SET_LINK) == BRANCH_SET_LINK;
+}
+
+/*
+ * Generic software implemented branch filters used
+ * by perf branch stack sampling when PMU does not
+ * process them for some reason.
+ */
+
+/* PERF_SAMPLE_BRANCH_ANY_RETURN */
+bool instr_is_return_branch(unsigned int instr)
+{
+	/*
+	 * Conditional and unconditional branch to LR register
+	 * without setting the link register.
+	 */
+	if (is_xlform_lr(instr) && !is_branch_link_set(instr))
+		return true;
+
+	return false;
+}
+
+/* PERF_SAMPLE_BRANCH_COND */
+bool instr_is_conditional_branch(unsigned int instr)
+{
+	/* I-form instruction - excluded */
+	if (instr_is_branch_iform(instr))
+		return false;
+
+	/* B-form or XL-form instruction */
+	if (instr_is_branch_bform(instr) || instr_is_branch_xlform(instr)) {
+
+		/* Not branch always */
+		if (!is_bo_always(instr))
+			return true;
+	}
+	return false;
+}
+
+/* PERF_SAMPLE_BRANCH_ANY_CALL */
+bool instr_is_func_call(unsigned int instr)
+{
+	/* LR should be set */
+	if (is_branch_link_set(instr))
+		return true;
+
+	return false;
+}
+
+/* PERF_SAMPLE_BRANCH_IND_CALL */
+bool instr_is_indirect_func_call(unsigned int instr)
+{
+	/* XL-form instruction with LR set */
+	if (instr_is_branch_xlform(instr) && is_branch_link_set(instr))
+		return true;
+
+	return false;
+}
+
 int instr_is_relative_branch(unsigned int instr)
 {
 	if (instr & BRANCH_ABSOLUTE)
-- 
1.7.11.7
[V6 03/11] x86, perf: Add conditional branch filtering support
This patch adds conditional branch filtering support, enabling it for PERF_SAMPLE_BRANCH_COND in perf branch stack sampling framework by utilizing an available software filter X86_BR_JCC.

Signed-off-by: Anshuman Khandual
Reviewed-by: Stephane Eranian
Reviewed-by: Andi Kleen
---
 arch/x86/kernel/cpu/perf_event_intel_lbr.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/arch/x86/kernel/cpu/perf_event_intel_lbr.c b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
index d82d155..9dd2459 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_lbr.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
@@ -384,6 +384,9 @@ static void intel_pmu_setup_sw_lbr_filter(struct perf_event *event)
 	if (br_type & PERF_SAMPLE_BRANCH_NO_TX)
 		mask |= X86_BR_NO_TX;
 
+	if (br_type & PERF_SAMPLE_BRANCH_COND)
+		mask |= X86_BR_JCC;
+
 	/*
 	 * stash actual user request into reg, it may
 	 * be used by fixup code for some CPU
@@ -678,6 +681,7 @@ static const int nhm_lbr_sel_map[PERF_SAMPLE_BRANCH_MAX] = {
 	 * NHM/WSM erratum: must include IND_JMP to capture IND_CALL
 	 */
 	[PERF_SAMPLE_BRANCH_IND_CALL] = LBR_IND_CALL | LBR_IND_JMP,
+	[PERF_SAMPLE_BRANCH_COND]     = LBR_JCC,
 };
 
 static const int snb_lbr_sel_map[PERF_SAMPLE_BRANCH_MAX] = {
@@ -689,6 +693,7 @@ static const int snb_lbr_sel_map[PERF_SAMPLE_BRANCH_MAX] = {
 	[PERF_SAMPLE_BRANCH_ANY_CALL]	= LBR_REL_CALL | LBR_IND_CALL
 					| LBR_FAR,
 	[PERF_SAMPLE_BRANCH_IND_CALL]	= LBR_IND_CALL,
+	[PERF_SAMPLE_BRANCH_COND]	= LBR_JCC,
 };
 
 /* core */
-- 
1.7.11.7
[V6 09/11] powerpc, perf: Enable SW filtering in branch stack sampling framework
This patch enables SW based post processing of BHRB captured branches in order to meet more user defined branch filtration criteria in the perf branch stack sampling framework. These changes increase the number of branch filters and their valid combinations on any powerpc64 server platform with BHRB support. A summary of the code changes:

(1) struct cpu_hw_events

Introduced two new variables to track the various filter values and masks:

    (a) bhrb_sw_filter: tracks SW implemented branch filter flags
    (b) bhrb_filter:    tracks both (SW and HW) branch filter flags

(2) Event creation

The kernel will figure out the supported BHRB branch filters through a PMU callback 'bhrb_filter_map'. This function will find out how many of the requested branch filters can be supported in the PMU HW. It will not try to invalidate any branch filter combinations, and event creation will not error out because of a lack of HW based branch filters. Meanwhile it will track the overall supported branch filters in the 'bhrb_filter' variable. Once the PMU callback returns, the kernel will process the user branch filter request against the available SW filters (bhrb_sw_filter_map) while looking at 'bhrb_filter'. During this phase, all the branch filters still pending from the user requested list will have to be supported in SW, failing which the event creation will error out.

(3) SW branch filter

During the BHRB data capture inside the PMU interrupt context, each of the captured 'perf_branch_entry.from' addresses will be checked for compliance with the applicable SW branch filters. If the entry does not conform to the filter requirements, it will be discarded from the final perf branch stack buffer.

(4) Supported SW based branch filters

    (a) PERF_SAMPLE_BRANCH_ANY_RETURN
    (b) PERF_SAMPLE_BRANCH_IND_CALL
    (c) PERF_SAMPLE_BRANCH_ANY_CALL
    (d) PERF_SAMPLE_BRANCH_COND

Please refer to the patch to understand the classification of instructions into these branch filter categories.
(5) Multiple branch filter semantics

The Book3S server implementation follows the same OR semantics (as implemented in x86) while dealing with multiple branch filters at any point of time. SW branch filter analysis is carried out on the data set captured in the PMU HW, so the resulting set of data (after applying the SW filters) will inherently be an AND with the HW captured set. Hence any combination of HW and SW branch filters would be invalid. HW based branch filters are more efficient and faster compared to SW implemented branch filters. So at first the PMU should decide whether it can support all the requested branch filters itself or not. In case it can support all the branch filters in an OR manner, we don't apply any SW branch filter on top of the HW captured set (which is the final set). This preserves the OR semantics of multiple branch filters as required. But in case the PMU cannot support all the requested branch filters in an OR manner, it should not apply any of its filters and should leave it up to the SW to handle them all. It is the PMU code's responsibility to uphold this protocol so as to conform to the overall OR semantics of the perf branch stack sampling framework.

Signed-off-by: Anshuman Khandual
---
 arch/powerpc/include/asm/perf_event_server.h |   6 +-
 arch/powerpc/perf/core-book3s.c              | 188 ++-
 arch/powerpc/perf/power8-pmu.c               |   2 +-
 3 files changed, 187 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/include/asm/perf_event_server.h b/arch/powerpc/include/asm/perf_event_server.h
index 9ed73714..93a9a8a 100644
--- a/arch/powerpc/include/asm/perf_event_server.h
+++ b/arch/powerpc/include/asm/perf_event_server.h
@@ -19,6 +19,10 @@
 #define MAX_EVENT_ALTERNATIVES	8
 #define MAX_LIMITED_HWCOUNTERS	2
 
+#define for_each_branch_sample_type(x) \
+		for ((x) = PERF_SAMPLE_BRANCH_USER; \
+		     (x) < PERF_SAMPLE_BRANCH_MAX; (x) <<= 1)
+
 /*
  * This struct provides the constants and functions needed to
  * describe the PMU on a particular POWER-family CPU.
@@ -35,7 +39,7 @@ struct power_pmu {
 				unsigned long *valp);
 	int		(*get_alternatives)(u64 event_id, unsigned int flags,
 				u64 alt[]);
-	u64		(*bhrb_filter_map)(u64 branch_sample_type);
+	u64		(*bhrb_filter_map)(u64 branch_sample_type, u64 *bhrb_filter);
 	void		(*config_bhrb)(u64 pmu_bhrb_filter);
 	void		(*disable_pmc)(unsigned int pmc, unsigned long mmcr[]);
 	int		(*limited_pmc_event)(u64 event_id);
[V6 10/11] power8, perf: Adapt BHRB PMU configuration to work with SW filters
The powerpc kernel now supports SW based branch filters for book3s systems, with some specific requirements while dealing with HW supported branch filters, in order to achieve the overall OR semantics prevailing in the perf branch stack sampling framework. This patch adapts the BHRB branch filter configuration to meet those protocols. The POWER8 PMU can handle only one HW based branch filter request at any point of time; for all other combinations the PMU will pass them on to the SW.

Signed-off-by: Anshuman Khandual
---
 arch/powerpc/perf/power8-pmu.c | 50 --
 1 file changed, 43 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/perf/power8-pmu.c b/arch/powerpc/perf/power8-pmu.c
index 699b1dd..4743bde 100644
--- a/arch/powerpc/perf/power8-pmu.c
+++ b/arch/powerpc/perf/power8-pmu.c
@@ -635,6 +635,16 @@ static int power8_generic_events[] = {
 
 static u64 power8_bhrb_filter_map(u64 branch_sample_type, u64 *bhrb_filter)
 {
+	u64 x, pmu_bhrb_filter;
+
+	pmu_bhrb_filter = 0;
+	*bhrb_filter = 0;
+
+	/* No branch filter requested */
+	if (branch_sample_type & PERF_SAMPLE_BRANCH_ANY) {
+		*bhrb_filter = PERF_SAMPLE_BRANCH_ANY;
+		return pmu_bhrb_filter;
+	}
+
 	/* BHRB and regular PMU events share the same privilege state
 	 * filter configuration. BHRB is always recorded along with a
 	 * regular PMU event. As the privilege state filter is handled
@@ -645,16 +655,42 @@ static u64 power8_bhrb_filter_map(u64 branch_sample_type, u64 *bhrb_filter)
 
 	/* Ignore user, kernel, hv bits */
 	branch_sample_type &= ~PERF_SAMPLE_BRANCH_PLM_ALL;
 
-	/* No branch filter requested */
-	if (branch_sample_type == PERF_SAMPLE_BRANCH_ANY)
-		return 0;
+	/*
+	 * P8 does not support ORing of PMU HW branch filters. Hence
+	 * if multiple branch filters are requested which includes filters
+	 * supported in PMU, still go ahead and clear the PMU based HW branch
+	 * filter component as in this case all the filters will be processed
+	 * in SW.
+	 */
 
-	if (branch_sample_type == PERF_SAMPLE_BRANCH_ANY_CALL) {
-		return POWER8_MMCRA_IFM1;
+	for_each_branch_sample_type(x) {
+		/* Ignore privilege branch filters */
+		if ((x == PERF_SAMPLE_BRANCH_USER)
+			|| (x == PERF_SAMPLE_BRANCH_KERNEL)
+			|| (x == PERF_SAMPLE_BRANCH_HV))
+			continue;
+
+		if (!(branch_sample_type & x))
+			continue;
+
+		/* Supported individual PMU branch filters */
+		if (branch_sample_type & PERF_SAMPLE_BRANCH_ANY_CALL) {
+			branch_sample_type &= ~PERF_SAMPLE_BRANCH_ANY_CALL;
+			if (branch_sample_type) {
+				/* Multiple branch filters will be processed in SW */
+				pmu_bhrb_filter = 0;
+				*bhrb_filter = 0;
+				return pmu_bhrb_filter;
+			} else {
+				/* Individual branch filter will be processed in PMU */
+				pmu_bhrb_filter |= POWER8_MMCRA_IFM1;
+				*bhrb_filter |= PERF_SAMPLE_BRANCH_ANY_CALL;
+				return pmu_bhrb_filter;
+			}
+		}
 	}
 
-	/* Every thing else is unsupported */
-	return -1;
+	return pmu_bhrb_filter;
 }
 
 static void power8_config_bhrb(u64 pmu_bhrb_filter)
-- 
1.7.11.7
[V6 02/11] perf, tool: Conditional branch filter 'cond' added to perf record
Adding perf record support for the new branch stack filter criterion PERF_SAMPLE_BRANCH_COND.

Signed-off-by: Anshuman Khandual
Reviewed-by: Stephane Eranian
Reviewed-by: Andi Kleen
---
 tools/perf/builtin-record.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 8ce62ef..dfe6b9d 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -583,6 +583,7 @@ static const struct branch_mode branch_modes[] = {
 	BRANCH_OPT("abort_tx", PERF_SAMPLE_BRANCH_ABORT_TX),
 	BRANCH_OPT("in_tx", PERF_SAMPLE_BRANCH_IN_TX),
 	BRANCH_OPT("no_tx", PERF_SAMPLE_BRANCH_NO_TX),
+	BRANCH_OPT("cond", PERF_SAMPLE_BRANCH_COND),
 	BRANCH_END
 };
--
1.7.11.7
Re: Fwd: [V6 00/11] perf: New conditional branch filter
On 05/21/2014 02:53 PM, Peter Zijlstra wrote:
> On Wed, May 21, 2014 at 02:41:58PM +0530, Anshuman Khandual wrote:
>> Hello Peter/Ingo,
>>
>> Would you please consider reviewing the first four patches in this
>> patch series which change the generic perf kernel and perf tools
>> code. Andi Kleen and Stephane Eranian have already reviewed these
>> changes. The rest of the patch series is related to powerpc and is
>> being reviewed by Michael Ellerman/Ben.
>>
>
> If they land in my inbox I might have a look.
>

Sent.
Re: [V6 01/11] perf: Add PERF_SAMPLE_BRANCH_COND
On 05/21/2014 05:00 PM, Peter Zijlstra wrote:
> On Wed, May 21, 2014 at 03:29:46PM +0530, Anshuman Khandual wrote:
>> This patch introduces new branch filter PERF_SAMPLE_BRANCH_COND which
>> will extend the existing perf ABI. Various architectures can provide
>> this functionality with either with HW filtering support (if present)
>> or with SW filtering of captured branch instructions.
>
> The Changelog fails to mention what _this_ functionality is.
>

Peter,

Hope this new change log below makes more sense.

---
commit af75191bb7ad36cba7d75c2741c93dfbdaf09da3
Author: Anshuman Khandual
Date:   Mon Jul 22 12:22:27 2013 +0530

    perf: Add new conditional branch filter PERF_SAMPLE_BRANCH_COND

    This patch introduces new branch filter PERF_SAMPLE_BRANCH_COND which
    will extend the existing perf ABI. This will filter branches which are
    conditional. Various architectures can provide this functionality
    either with HW filtering support (if present) or with SW filtering of
    captured branch instructions.

    Signed-off-by: Anshuman Khandual
    Reviewed-by: Stephane Eranian
    Reviewed-by: Andi Kleen

diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 853bc1c..696f69b4 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -163,8 +163,9 @@ enum perf_branch_sample_type {
 	PERF_SAMPLE_BRANCH_ABORT_TX	= 1U << 7, /* transaction aborts */
 	PERF_SAMPLE_BRANCH_IN_TX	= 1U << 8, /* in transaction */
 	PERF_SAMPLE_BRANCH_NO_TX	= 1U << 9, /* not in transaction */
+	PERF_SAMPLE_BRANCH_COND		= 1U << 10, /* conditional branches */

-	PERF_SAMPLE_BRANCH_MAX		= 1U << 10, /* non-ABI */
+	PERF_SAMPLE_BRANCH_MAX		= 1U << 11, /* non-ABI */
 };

 #define PERF_SAMPLE_BRANCH_PLM_ALL \
Re: Fwd: [V6 00/11] perf: New conditional branch filter
On 05/21/2014 05:31 PM, Peter Zijlstra wrote:
> On Wed, May 21, 2014 at 04:09:55PM +0530, Anshuman Khandual wrote:
>> On 05/21/2014 02:53 PM, Peter Zijlstra wrote:
>>> On Wed, May 21, 2014 at 02:41:58PM +0530, Anshuman Khandual wrote:
>>>> Hello Peter/Ingo,
>>>>
>>>> Would you please consider reviewing the first four patches in this
>>>> patch series which change the generic perf kernel and perf tools
>>>> code. Andi Kleen and Stephane Eranian have already reviewed these
>>>> changes. The rest of the patch series is related to powerpc and is
>>>> being reviewed by Michael Ellerman/Ben.
>>>>
>>>
>>> If they land in my inbox I might have a look.
>>>
>>
>> Sent.
>
> Thanks, they look fine to me, although 1/x can use a slightly longer
> changelog, making it explicit it's a filter for conditional branches.
>
> How do people want this routed? Should I take all patches through tip,
> or do I ask Ingo to create a special perf/cond branch which includes the
> first 4 patches which can be merged into whatever ppc branch and the
> rest then go on top in the ppc tree?
>

Peter,

Thanks for considering the patchset.
Re: [PATCH V2 2/3] powerpc, ptrace: Enable support for transactional memory register sets
On 05/20/2014 04:03 PM, Pedro Alves wrote:
> On 05/20/2014 09:14 AM, Anshuman Khandual wrote:
>> On 05/19/2014 08:13 PM, Pedro Alves wrote:
>>> On 05/19/2014 12:46 PM, Anshuman Khandual wrote:
>>>
>>>>>> I couldn't actually find any arch that currently returns -ENODEV in
>>>>>> the "active" hook. I see that binfmt_elf.c doesn't handle
>>>>>> regset->active() returning < 0. Guess that may be why. Looks like
>>>>>> something that could be cleaned up, to me.
>>>>>>
>>>> Also it does not consider the return value of regset->active(t->task,
>>>> regset) (whose objective is to figure out whether we need to request
>>>> regset->n number of elements or less than that) in the subsequent call
>>>> to the regset->get function.
>>>
>>> Indeed.
>>>
>>> TBC, do you plan on fixing this? Otherwise ...
>>
>> Sure, thinking something like this as mentioned below. But still not sure
>> how to use the return type of -ENODEV from the function regset->active().
>> Right now if any regset does have the active hook and it returns anything
>> but a positive value, it will be ignored and the control moves to the
>> next regset in view. This prevents the thread core note type being
>> written to the core dump.
>
> Looks to me that that's exactly what should happen for -ENODEV too. The
> regset should be ignored. If regset->active() returns -ENODEV, then the
> machine doesn't have the registers at all, so what makes sense to me is
> to not write the corresponding core note in the dump. IOW, on such a
> machine, the kernel generates a core exactly like if the support for
> these registers that don't make sense for this machine wasn't compiled
> in at all. And generates a core exactly like an older kernel that didn't
> know about that regset (which is fine for that same machine) yet.
>

All of this happens right now even without specifically checking for the
return type of -ENODEV and just checking for a positive value. I guess
that's the reason they had omitted -ENODEV in the first place.

>>
>> diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
>> index aa3cb62..80672fb 100644
>> --- a/fs/binfmt_elf.c
>> +++ b/fs/binfmt_elf.c
>> @@ -1553,7 +1553,15 @@ static int fill_thread_core_info(struct elf_thread_core_info *t,
>>  	if (regset->core_note_type && regset->get &&
>>  	    (!regset->active || regset->active(t->task, regset))) {
>>  		int ret;
>
> So, here, this ?
>
> 	(!regset->active || regset->active(t->task, regset) > 0)) {
>
>
>> -		size_t size = regset->n * regset->size;
>> +		size_t size;
>> +
>> +		/* Request only the active elements in the regset */
>> +		if (!regset->active)
>> +			size = regset->n * regset->size;
>> +		else
>> +			size = regset->active(t->task, regset) * regset->size;
>> +
>
>
> I wonder if it wouldn't be cleaner to add a function like:
>
> int
> regset_active (tast *task, regseg *regset)
> {
>    if (!regset->active)
>      return regset->n * regset->size;
>    else
>      return regset->active(task, regset);
> }
>
> And then use it like
>
>    if (regset->core_note_type && regset->get) {
>      int size = regset_active (t->task, regset);
>
>      if (size > 0) {
>        ...
>      }
>

Yeah this makes sense.

> Though at this point, we don't actually make use of the distinction
> between -ENODEV vs 0. Guess that's what we should be thinking about.
> Seems like there are some details that need to be sorted out, and some
> verification that consumers aren't broken by outputting smaller notes --
> e.g., ia64 makes me wonder about that.

I agree.

> Maybe we should leave this for another day, and have tm_spr_active
> return 0 instead of -ENODEV when the machine doesn't have the hardware,
> or not install that hook at all. Seems like the effect will be the same,
> as the note isn't output if ->get fails.

Agree.
Re: [V6 01/11] perf: Add PERF_SAMPLE_BRANCH_COND
On 05/22/2014 12:31 PM, Peter Zijlstra wrote:
> On Thu, May 22, 2014 at 09:18:54AM +0530, Anshuman Khandual wrote:
>
>> Hope this new change log below makes more sense.
>
> Yep reads a whole lot better. Thanks.
>

Will resend the first four patches with the new commit messages without
changing the overall version of the patchset. Hope that's okay.
[PATCH V6 2/4] perf, tool: Conditional branch filter 'cond' added to perf record
Adding perf record support for the new branch stack filter criterion PERF_SAMPLE_BRANCH_COND.

Signed-off-by: Anshuman Khandual
Reviewed-by: Stephane Eranian
Reviewed-by: Andi Kleen
---
 tools/perf/builtin-record.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 8ce62ef..dfe6b9d 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -583,6 +583,7 @@ static const struct branch_mode branch_modes[] = {
 	BRANCH_OPT("abort_tx", PERF_SAMPLE_BRANCH_ABORT_TX),
 	BRANCH_OPT("in_tx", PERF_SAMPLE_BRANCH_IN_TX),
 	BRANCH_OPT("no_tx", PERF_SAMPLE_BRANCH_NO_TX),
+	BRANCH_OPT("cond", PERF_SAMPLE_BRANCH_COND),
 	BRANCH_END
 };
--
1.7.11.7