[RFC V2 0/3] mm/hotplug: Pre-validate the address range with platform
This series adds a mechanism allowing platforms to weigh in and prevalidate
an incoming address range before proceeding further with memory hotplug.
This helps prevent potential platform errors for the given address range
down the hotplug call chain, which would inevitably fail the hotplug itself.
This mechanism was suggested by David Hildenbrand during another discussion,
with respect to a memory hotplug fix on the arm64 platform.

https://lore.kernel.org/linux-arm-kernel/1600332402-30123-1-git-send-email-anshuman.khand...@arm.com/

This mechanism focuses on the addressability aspect and not the [sub]section
alignment aspect. Hence check_hotplug_memory_range() and check_pfn_span()
have been left unchanged. Wondering if all these can still be unified in an
expanded memhp_range_allowed() check, which could be called from multiple
memory hot add and remove paths.

This series applies on v5.10-rc6 and has been lightly tested on arm64, but
is looking for some early feedback here.

Changes in RFC V2: Incorporated all review feedback from David.
- Added an additional range check in __segment_load() on s390 which was lost
- Changed is_private init in pagemap_range()
- Moved the framework into mm/memory_hotplug.c
- Made arch_get_addressable_range() a __weak function
- Renamed arch_get_addressable_range() as arch_get_mappable_range()
- Callback arch_get_mappable_range() only handles ranges requiring linear mapping
- Merged multiple memhp_range_allowed() checks in register_memory_resource()
- Replaced WARN() with pr_warn() in memhp_range_allowed()
- Replaced error return code ERANGE with E2BIG

Changes in RFC V1:

https://lore.kernel.org/linux-mm/1606098529-7907-1-git-send-email-anshuman.khand...@arm.com/

Cc: Heiko Carstens
Cc: Vasily Gorbik
Cc: Catalin Marinas
Cc: Will Deacon
Cc: Ard Biesheuvel
Cc: Mark Rutland
Cc: David Hildenbrand
Cc: Andrew Morton
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-s...@vger.kernel.org
Cc: linux...@kvack.org
Cc: linux-kernel@vger.kernel.org

Anshuman Khandual (3):
  mm/hotplug: Prevalidate the address range being added with platform
  arm64/mm: Define arch_get_mappable_range()
  s390/mm: Define arch_get_mappable_range()

 arch/arm64/mm/mmu.c            | 14 +++
 arch/s390/mm/extmem.c          |  5 +++
 arch/s390/mm/vmem.c            | 13 --
 include/linux/memory_hotplug.h |  2 +
 mm/memory_hotplug.c            | 77 +-
 mm/memremap.c                  |  6 ++-
 6 files changed, 84 insertions(+), 33 deletions(-)

-- 
2.20.1
[RFC V2 1/3] mm/hotplug: Prevalidate the address range being added with platform
This introduces memhp_range_allowed() which can be called in various memory
hotplug paths to prevalidate, with the platform, the address range being
added. memhp_range_allowed() then calls memhp_get_pluggable_range(), which
provides the applicable address range depending on whether a linear mapping
is required or not. For ranges that do require a linear mapping, it calls a
new arch callback arch_get_mappable_range() which the platform can override.
The new callback thus gives the platform an opportunity to configure
acceptable memory hotplug address ranges in case there are constraints.
This mechanism will help prevent platform specific errors deep down during
hotplug calls. This also drops the now redundant
check_hotplug_memory_addressable() check in __add_pages().

Cc: David Hildenbrand
Cc: Andrew Morton
Cc: linux...@kvack.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Anshuman Khandual
---
 include/linux/memory_hotplug.h |  2 +
 mm/memory_hotplug.c            | 77 +-
 mm/memremap.c                  |  6 ++-
 3 files changed, 64 insertions(+), 21 deletions(-)

diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 551093b74596..047a711ab76a 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -70,6 +70,8 @@ typedef int __bitwise mhp_t;
  */
 #define MEMHP_MERGE_RESOURCE ((__force mhp_t)BIT(0))
 
+bool memhp_range_allowed(u64 start, u64 size, bool need_mapping);
+
 /*
  * Extended parameters for memory hotplug:
  * altmap: alternative allocator for memmap array (optional)
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 63b2e46b6555..9dd9db01985d 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -107,6 +107,9 @@ static struct resource *register_memory_resource(u64 start, u64 size,
 	if (strcmp(resource_name, "System RAM"))
 		flags |= IORESOURCE_SYSRAM_DRIVER_MANAGED;
 
+	if (!memhp_range_allowed(start, size, 1))
+		return ERR_PTR(-E2BIG);
+
 	/*
 	 * Make sure value parsed from 'mem=' only restricts memory adding
 	 * while booting, so that memory hotplug won't be impacted. Please
@@ -284,22 +287,6 @@ static int check_pfn_span(unsigned long pfn, unsigned long nr_pages,
 	return 0;
 }
 
-static int check_hotplug_memory_addressable(unsigned long pfn,
-					    unsigned long nr_pages)
-{
-	const u64 max_addr = PFN_PHYS(pfn + nr_pages) - 1;
-
-	if (max_addr >> MAX_PHYSMEM_BITS) {
-		const u64 max_allowed = (1ull << (MAX_PHYSMEM_BITS + 1)) - 1;
-		WARN(1,
-		     "Hotplugged memory exceeds maximum addressable address, range=%#llx-%#llx, maximum=%#llx\n",
-		     (u64)PFN_PHYS(pfn), max_addr, max_allowed);
-		return -E2BIG;
-	}
-
-	return 0;
-}
-
 /*
  * Reasonably generic function for adding memory. It is
  * expected that archs that support memory hotplug will
@@ -317,10 +304,6 @@ int __ref __add_pages(int nid, unsigned long pfn, unsigned long nr_pages,
 	if (WARN_ON_ONCE(!params->pgprot.pgprot))
 		return -EINVAL;
 
-	err = check_hotplug_memory_addressable(pfn, nr_pages);
-	if (err)
-		return err;
-
 	if (altmap) {
 		/*
 		 * Validate altmap is within bounds of the total request
@@ -1824,4 +1807,58 @@ int offline_and_remove_memory(int nid, u64 start, u64 size)
 	return rc;
 }
 EXPORT_SYMBOL_GPL(offline_and_remove_memory);
+
+/*
+ * Platforms should define arch_get_mappable_range() that provides
+ * maximum possible addressable physical memory range for which the
+ * linear mapping could be created. The platform returned address
+ * range must adhere to these following semantics.
+ *
+ * - range.start <= range.end
+ * - Range includes both end points [range.start..range.end]
+ *
+ * There is also a fallback definition provided here, allowing the
+ * entire possible physical address range in case any platform does
+ * not define arch_get_mappable_range().
+ */
+struct range __weak arch_get_mappable_range(void)
+{
+	struct range memhp_range = {
+		.start = 0UL,
+		.end = -1ULL,
+	};
+	return memhp_range;
+}
+
+static inline struct range memhp_get_pluggable_range(bool need_mapping)
+{
+	const u64 max_phys = (1ULL << (MAX_PHYSMEM_BITS + 1)) - 1;
+	struct range memhp_range;
+
+	if (need_mapping) {
+		memhp_range = arch_get_mappable_range();
+		if (memhp_range.start > max_phys) {
+			memhp_range.start = 0;
+			memhp_range.end = 0;
+		}
+		memhp_range.end = min_t(u64, memhp_range.end, max_phys);
+	} else {
+		memhp_range.start = 0;
+		memhp_range.end = max_phys;
+	}
+
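The clamping behaviour of memhp_get_pluggable_range() can be sanity-checked outside the kernel. Below is a minimal userspace sketch of the same logic, with min_t() expanded into an explicit clamp; the MAX_PHYSMEM_BITS value of 46 is an assumption (it is configuration dependent), and arch_get_mappable_range() here stands in for the kernel's __weak fallback.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* MAX_PHYSMEM_BITS is configuration dependent; 46 is only an assumption. */
#define MAX_PHYSMEM_BITS 46

struct range { uint64_t start, end; };

/* Stand-in for the __weak kernel fallback: the whole physical space. */
static struct range arch_get_mappable_range(void)
{
	struct range r = { .start = 0ULL, .end = ~0ULL };
	return r;
}

static struct range memhp_get_pluggable_range(bool need_mapping)
{
	const uint64_t max_phys = (1ULL << (MAX_PHYSMEM_BITS + 1)) - 1;
	struct range r;

	if (need_mapping) {
		r = arch_get_mappable_range();
		if (r.start > max_phys) {	/* arch range lies entirely above the limit */
			r.start = 0;
			r.end = 0;
		}
		if (r.end > max_phys)		/* min_t(u64, r.end, max_phys) expanded */
			r.end = max_phys;
	} else {
		r.start = 0;
		r.end = max_phys;
	}
	return r;
}

static bool memhp_range_allowed(uint64_t start, uint64_t size, bool need_mapping)
{
	struct range r = memhp_get_pluggable_range(need_mapping);
	uint64_t end = start + size;

	/* start < end also rejects a zero size and a start + size overflow */
	return (start < end) && (start >= r.start) && ((end - 1) <= r.end);
}
```

Note how the single `start < end` comparison covers both the zero-size and the unsigned-overflow cases, which is why the kernel version needs no separate overflow check.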
[RFC V2 3/3] s390/mm: Define arch_get_mappable_range()
This overrides arch_get_mappable_range() on the s390 platform and drops the
now redundant similar check in vmem_add_mapping(). It compensates by adding
a new check in __segment_load(), to preserve the existing functionality.

Cc: Heiko Carstens
Cc: Vasily Gorbik
Cc: David Hildenbrand
Cc: linux-s...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Anshuman Khandual
---
 arch/s390/mm/extmem.c |  5 +
 arch/s390/mm/vmem.c   | 13 +
 2 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/arch/s390/mm/extmem.c b/arch/s390/mm/extmem.c
index 5060956b8e7d..cc055a78f7b6 100644
--- a/arch/s390/mm/extmem.c
+++ b/arch/s390/mm/extmem.c
@@ -337,6 +337,11 @@ __segment_load (char *name, int do_nonshared, unsigned long *addr, unsigned long
 		goto out_free_resource;
 	}
 
+	if (seg->end + 1 > VMEM_MAX_PHYS || seg->end + 1 < seg->start_addr) {
+		rc = -ERANGE;
+		goto out_resource;
+	}
+
 	rc = vmem_add_mapping(seg->start_addr, seg->end - seg->start_addr + 1);
 	if (rc)
 		goto out_resource;
diff --git a/arch/s390/mm/vmem.c b/arch/s390/mm/vmem.c
index b239f2ba93b0..06dddcc0ce06 100644
--- a/arch/s390/mm/vmem.c
+++ b/arch/s390/mm/vmem.c
@@ -532,14 +532,19 @@ void vmem_remove_mapping(unsigned long start, unsigned long size)
 	mutex_unlock(&vmem_mutex);
}
 
+struct range arch_get_mappable_range(void)
+{
+	struct range memhp_range;
+
+	memhp_range.start = 0;
+	memhp_range.end = VMEM_MAX_PHYS;
+	return memhp_range;
+}
+
 int vmem_add_mapping(unsigned long start, unsigned long size)
 {
 	int ret;
 
-	if (start + size > VMEM_MAX_PHYS ||
-	    start + size < start)
-		return -ERANGE;
-
 	mutex_lock(&vmem_mutex);
 	ret = vmem_add_range(start, size);
 	if (ret)
-- 
2.20.1
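The two-sided comparison added to __segment_load() is worth a closer look: the second clause exists purely to catch unsigned wrap-around when `seg->end` is the very top of the address space. A small userspace sketch (the VMEM_MAX_PHYS value here is invented for illustration):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* VMEM_MAX_PHYS is platform derived on s390; this value is made up. */
#define VMEM_MAX_PHYS (1ULL << 42)

/* Mirrors the check added to __segment_load(): the segment must end at or
 * below VMEM_MAX_PHYS, and "end + 1 < start_addr" catches the unsigned
 * wrap when end == ~0, where end + 1 silently becomes 0.
 */
static bool segment_range_ok(uint64_t start_addr, uint64_t end)
{
	return !(end + 1 > VMEM_MAX_PHYS || end + 1 < start_addr);
}
```

Without the second clause, a segment ending at `~0UL` would compute `end + 1 == 0`, pass the first comparison, and slip through.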
[RFC V2 2/3] arm64/mm: Define arch_get_mappable_range()
This overrides arch_get_mappable_range() on the arm64 platform, which will
be used with the recently added generic framework. It drops
inside_linear_region() and the subsequent check in arch_add_memory(), which
are no longer required.

Cc: Catalin Marinas
Cc: Will Deacon
Cc: Ard Biesheuvel
Cc: Mark Rutland
Cc: David Hildenbrand
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Anshuman Khandual
---
 arch/arm64/mm/mmu.c | 14 ++
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index ca692a815731..49ec8f2838f2 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1444,16 +1444,19 @@ static void __remove_pgd_mapping(pgd_t *pgdir, unsigned long start, u64 size)
 	free_empty_tables(start, end, PAGE_OFFSET, PAGE_END);
 }
 
-static bool inside_linear_region(u64 start, u64 size)
+struct range arch_get_mappable_range(void)
 {
+	struct range memhp_range;
+
 	/*
 	 * Linear mapping region is the range [PAGE_OFFSET..(PAGE_END - 1)]
 	 * accommodating both its ends but excluding PAGE_END. Max physical
 	 * range which can be mapped inside this linear mapping range, must
 	 * also be derived from its end points.
 	 */
-	return start >= __pa(_PAGE_OFFSET(vabits_actual)) &&
-	       (start + size - 1) <= __pa(PAGE_END - 1);
+	memhp_range.start = __pa(_PAGE_OFFSET(vabits_actual));
+	memhp_range.end = __pa(PAGE_END - 1);
+	return memhp_range;
 }
 
 int arch_add_memory(int nid, u64 start, u64 size,
@@ -1461,11 +1464,6 @@ int arch_add_memory(int nid, u64 start, u64 size,
 {
 	int ret, flags = 0;
 
-	if (!inside_linear_region(start, size)) {
-		pr_err("[%llx %llx] is outside linear mapping region\n", start, start + size);
-		return -EINVAL;
-	}
-
 	if (rodata_full || debug_pagealloc_enabled())
 		flags = NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS;
-- 
2.20.1
Re: [PATCH 1/2] mm/debug_vm_pgtable/basic: Add validation for dirtiness after write protect
On 11/27/20 3:14 PM, Catalin Marinas wrote:
> On Fri, Nov 27, 2020 at 09:22:24AM +0100, Christophe Leroy wrote:
>> Le 27/11/2020 à 06:06, Anshuman Khandual a écrit :
>>> This adds validation tests for dirtiness after write protect conversion for
>>> each page table level. This is important for platforms such as arm64 that
>>> remove the hardware dirty bit while making the entry a write protected one.
>>> This also fixes pxx_wrprotect() related typos in the documentation file.
>>
>>> diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
>>> index c05d9dcf7891..a5be11210597 100644
>>> --- a/mm/debug_vm_pgtable.c
>>> +++ b/mm/debug_vm_pgtable.c
>>> @@ -70,6 +70,7 @@ static void __init pte_basic_tests(unsigned long pfn, pgprot_t prot)
>>>  	WARN_ON(pte_young(pte_mkold(pte_mkyoung(pte))));
>>>  	WARN_ON(pte_dirty(pte_mkclean(pte_mkdirty(pte))));
>>>  	WARN_ON(pte_write(pte_wrprotect(pte_mkwrite(pte))));
>>> +	WARN_ON(pte_dirty(pte_wrprotect(pte)));
>>
>> Wondering what you are testing here exactly.
>>
>> Do you expect that if the PTE has the dirty bit, it gets cleared by
>> pte_wrprotect()?
>>
>> Powerpc doesn't do that, it only clears the RW bit but the dirty bit
>> remains if it is set, until you call pte_mkclean() explicitly.
>
> Arm64 has an unusual way of setting a hardware dirty "bit", it actually
> clears the PTE_RDONLY bit. The pte_wrprotect() sets the PTE_RDONLY bit
> back and we can lose the dirty information. Will found this and posted
> patches to fix the arm64 pte_wrprotect() to set a software PTE_DIRTY if
> !PTE_RDONLY (we do this for ptep_set_wrprotect() already). My concern
> was that we may inadvertently make a fresh/clean pte dirty with such a
> change, hence the suggestion for the test.
>
> That said, I think we also need a test in the other direction,
> pte_wrprotect() should preserve any dirty information:
>
> 	WARN_ON(!pte_dirty(pte_wrprotect(pte_mkdirty(pte))));

This seems like a generic enough principle which all platforms should
adhere to.
But the proposed test WARN_ON(pte_dirty(pte_wrprotect(pte))) might fail on
some platforms, if the page table entry came in as a dirty one and
pte_wrprotect() is not expected to alter the dirty state. Instead, should
we just add the following two tests, which would ensure that
pte_wrprotect() never alters the dirty state of a page table entry?

	WARN_ON(!pte_dirty(pte_wrprotect(pte_mkdirty(pte))));
	WARN_ON(pte_dirty(pte_wrprotect(pte_mkclean(pte))));

>
> If pte_mkwrite() makes a pte truly writable and potentially dirty, we
> could also add a test as below. However, I think that's valid for arm64;
> other architectures with a separate hardware dirty bit would fail this:
>
> 	WARN_ON(!pte_dirty(pte_wrprotect(pte_mkwrite(pte))));

Right.
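The semantics under discussion can be modelled in a few lines. Below is a toy userspace model of an arm64-style pte, assuming the fixed pte_wrprotect() behaviour Catalin describes (latching hardware dirtiness into a software bit); the bit positions are invented and are not the real arm64 layout:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model of an arm64-style pte; bit positions are invented. Hardware
 * expresses "writable and dirty" by clearing PTE_RDONLY, while PTE_DIRTY
 * is the software dirty bit.
 */
#define PTE_RDONLY	(1ULL << 0)
#define PTE_DIRTY	(1ULL << 1)
#define PTE_WRITE	(1ULL << 2)

typedef uint64_t pte_t;

static bool pte_dirty(pte_t pte)
{
	/* dirty if either the software bit is set or hardware cleared RDONLY */
	return (pte & PTE_DIRTY) || !(pte & PTE_RDONLY);
}

static pte_t pte_mkdirty(pte_t pte) { return pte & ~PTE_RDONLY; }
static pte_t pte_mkclean(pte_t pte) { return (pte | PTE_RDONLY) & ~PTE_DIRTY; }

/* The fixed behaviour under discussion: latch hardware dirtiness into the
 * software bit before setting PTE_RDONLY, so write protecting an entry
 * never loses dirty information and never invents it for a clean entry.
 */
static pte_t pte_wrprotect(pte_t pte)
{
	if (!(pte & PTE_RDONLY))
		pte |= PTE_DIRTY;
	return (pte | PTE_RDONLY) & ~PTE_WRITE;
}
```

With these definitions, both proposed WARN_ON conditions hold: wrprotecting a dirtied entry keeps it dirty, and wrprotecting a cleaned entry keeps it clean.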
[RFC 0/3] mm/hotplug: Pre-validate the address range with platform
This series adds a mechanism allowing platforms to weigh in and prevalidate
an incoming address range before proceeding further with memory hotplug.
This helps prevent potential platform errors for the given address range
down the hotplug call chain, which would inevitably fail the hotplug itself.
This mechanism was suggested by David Hildenbrand during another discussion,
with respect to a memory hotplug fix on the arm64 platform.

https://lore.kernel.org/linux-arm-kernel/1600332402-30123-1-git-send-email-anshuman.khand...@arm.com/

This mechanism focuses on the addressability aspect and not the [sub]section
alignment aspect. Hence check_hotplug_memory_range() and check_pfn_span()
have been left unchanged. Wondering if all these can still be unified in an
expanded memhp_range_allowed() check, which could be called from multiple
memory hot add and remove paths.

This series applies on v5.10-rc5 and has been lightly tested on arm64, but
is looking for some early feedback here.

Cc: Heiko Carstens
Cc: Vasily Gorbik
Cc: Catalin Marinas
Cc: Will Deacon
Cc: Ard Biesheuvel
Cc: Mark Rutland
Cc: David Hildenbrand
Cc: Andrew Morton
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-s...@vger.kernel.org
Cc: linux...@kvack.org
Cc: linux-kernel@vger.kernel.org

Anshuman Khandual (3):
  mm/hotplug: Pre-validate the address range with platform
  arm64/mm: Define arch_get_addressable_range()
  s390/mm: Define arch_get_addressable_range()

 arch/arm64/include/asm/memory.h |  3 ++
 arch/arm64/mm/mmu.c             | 19 ++--
 arch/s390/include/asm/mmu.h     |  2 ++
 arch/s390/mm/vmem.c             | 16 ---
 include/linux/memory_hotplug.h  | 51 +
 mm/memory_hotplug.c             | 29 ++-
 mm/memremap.c                   |  9 +-
 7 files changed, 96 insertions(+), 33 deletions(-)

-- 
2.20.1
[RFC 3/3] s390/mm: Define arch_get_addressable_range()
This overrides arch_get_addressable_range() on the s390 platform and drops
the now redundant similar check in vmem_add_mapping().

Cc: Heiko Carstens
Cc: Vasily Gorbik
Cc: David Hildenbrand
Cc: linux-s...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Anshuman Khandual
---
 arch/s390/include/asm/mmu.h |  2 ++
 arch/s390/mm/vmem.c         | 16 
 2 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/arch/s390/include/asm/mmu.h b/arch/s390/include/asm/mmu.h
index e12ff0f29d1a..f92d3926b188 100644
--- a/arch/s390/include/asm/mmu.h
+++ b/arch/s390/include/asm/mmu.h
@@ -55,4 +55,6 @@ static inline int tprot(unsigned long addr)
 	return rc;
 }
 
+#define arch_get_addressable_range arch_get_addressable_range
+struct range arch_get_addressable_range(bool need_mapping);
 #endif
diff --git a/arch/s390/mm/vmem.c b/arch/s390/mm/vmem.c
index b239f2ba93b0..e03ad0ed13a7 100644
--- a/arch/s390/mm/vmem.c
+++ b/arch/s390/mm/vmem.c
@@ -532,14 +532,22 @@ void vmem_remove_mapping(unsigned long start, unsigned long size)
 	mutex_unlock(&vmem_mutex);
 }
 
+struct range arch_get_addressable_range(bool need_mapping)
+{
+	struct range memhp_range;
+
+	memhp_range.start = 0;
+	if (need_mapping)
+		memhp_range.end = VMEM_MAX_PHYS;
+	else
+		memhp_range.end = (1ULL << (MAX_PHYSMEM_BITS + 1)) - 1;
+	return memhp_range;
+}
+
 int vmem_add_mapping(unsigned long start, unsigned long size)
 {
 	int ret;
 
-	if (start + size > VMEM_MAX_PHYS ||
-	    start + size < start)
-		return -ERANGE;
-
 	mutex_lock(&vmem_mutex);
 	ret = vmem_add_range(start, size);
 	if (ret)
-- 
2.20.1
[RFC 2/3] arm64/mm: Define arch_get_addressable_range()
This overrides arch_get_addressable_range() on the arm64 platform, which
will be used with the recently added generic framework. It drops
inside_linear_region() and the subsequent check in arch_add_memory(), which
are no longer required.

Cc: Catalin Marinas
Cc: Will Deacon
Cc: Ard Biesheuvel
Cc: Mark Rutland
Cc: David Hildenbrand
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Anshuman Khandual
---
 arch/arm64/include/asm/memory.h |  3 +++
 arch/arm64/mm/mmu.c             | 19 +++
 2 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
index cd61239bae8c..0ef7948eb58c 100644
--- a/arch/arm64/include/asm/memory.h
+++ b/arch/arm64/include/asm/memory.h
@@ -328,6 +328,9 @@ static inline void *phys_to_virt(phys_addr_t x)
 })
 
 void dump_mem_limit(void);
+
+#define arch_get_addressable_range arch_get_addressable_range
+struct range arch_get_addressable_range(bool need_mapping);
 #endif /* !ASSEMBLY */
 
 /*
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index ca692a815731..a6433caf337f 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1444,16 +1444,24 @@ static void __remove_pgd_mapping(pgd_t *pgdir, unsigned long start, u64 size)
 	free_empty_tables(start, end, PAGE_OFFSET, PAGE_END);
 }
 
-static bool inside_linear_region(u64 start, u64 size)
+struct range arch_get_addressable_range(bool need_mapping)
 {
+	struct range memhp_range;
+
 	/*
 	 * Linear mapping region is the range [PAGE_OFFSET..(PAGE_END - 1)]
 	 * accommodating both its ends but excluding PAGE_END. Max physical
 	 * range which can be mapped inside this linear mapping range, must
 	 * also be derived from its end points.
 	 */
-	return start >= __pa(_PAGE_OFFSET(vabits_actual)) &&
-	       (start + size - 1) <= __pa(PAGE_END - 1);
+	if (need_mapping) {
+		memhp_range.start = __pa(_PAGE_OFFSET(vabits_actual));
+		memhp_range.end = __pa(PAGE_END - 1);
+	} else {
+		memhp_range.start = 0;
+		memhp_range.end = (1ULL << (MAX_PHYSMEM_BITS + 1)) - 1;
+	}
+	return memhp_range;
 }
 
 int arch_add_memory(int nid, u64 start, u64 size,
@@ -1461,11 +1469,6 @@ int arch_add_memory(int nid, u64 start, u64 size,
 {
 	int ret, flags = 0;
 
-	if (!inside_linear_region(start, size)) {
-		pr_err("[%llx %llx] is outside linear mapping region\n", start, start + size);
-		return -EINVAL;
-	}
-
 	if (rodata_full || debug_pagealloc_enabled())
 		flags = NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS;
-- 
2.20.1
[RFC 1/3] mm/hotplug: Pre-validate the address range with platform
This introduces memhp_range_allowed() which gets called in various hotplug
paths. memhp_range_allowed() then calls arch_get_addressable_range() via
memhp_get_pluggable_range(). This callback can be defined by the platform
to provide the addressable physical range, depending on whether a kernel
linear mapping is required or not. This mechanism will prevent platform
specific errors deep down during hotplug calls. While here, it also drops
the now redundant check_hotplug_memory_addressable() check in __add_pages().

Cc: David Hildenbrand
Cc: Andrew Morton
Cc: linux...@kvack.org
Cc: linux-kernel@vger.kernel.org
Suggested-by: David Hildenbrand
Signed-off-by: Anshuman Khandual
---
 include/linux/memory_hotplug.h | 51 ++
 mm/memory_hotplug.c            | 29 ++-
 mm/memremap.c                  |  9 +-
 3 files changed, 68 insertions(+), 21 deletions(-)

diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 551093b74596..2018c0201672 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -6,6 +6,7 @@
 #include
 #include
 #include
+#include
 
 struct page;
 struct zone;
@@ -70,6 +71,56 @@ typedef int __bitwise mhp_t;
  */
 #define MEMHP_MERGE_RESOURCE ((__force mhp_t)BIT(0))
 
+/*
+ * Platforms should define arch_get_addressable_range() which provides
+ * addressable physical memory range depending upon whether the linear
+ * mapping is required or not. Returned address range must follow
+ *
+ * 1. range.start <= range.end
+ * 2. Must include both end points i.e range.start and range.end
+ *
+ * Nonetheless there is a fallback definition provided here providing
+ * maximum possible addressable physical range, irrespective of the
+ * linear mapping requirements.
+ */
+#ifndef arch_get_addressable_range
+static inline struct range arch_get_addressable_range(bool need_mapping)
+{
+	struct range memhp_range = {
+		.start = 0UL,
+		.end = -1ULL,
+	};
+	return memhp_range;
+}
+#endif
+
+static inline struct range memhp_get_pluggable_range(bool need_mapping)
+{
+	const u64 max_phys = (1ULL << (MAX_PHYSMEM_BITS + 1)) - 1;
+	struct range memhp_range = arch_get_addressable_range(need_mapping);
+
+	if (memhp_range.start > max_phys) {
+		memhp_range.start = 0;
+		memhp_range.end = 0;
+	}
+	memhp_range.end = min_t(u64, memhp_range.end, max_phys);
+	return memhp_range;
+}
+
+static inline bool memhp_range_allowed(u64 start, u64 size, bool need_mapping)
+{
+	struct range memhp_range = memhp_get_pluggable_range(need_mapping);
+	u64 end = start + size;
+
+	if ((start < end) && (start >= memhp_range.start) &&
+	    ((end - 1) <= memhp_range.end))
+		return true;
+
+	WARN(1, "Hotplug memory [%#llx-%#llx] exceeds maximum addressable range [%#llx-%#llx]\n",
+	     start, end, memhp_range.start, memhp_range.end);
+	return false;
+}
+
 /*
  * Extended parameters for memory hotplug:
  * altmap: alternative allocator for memmap array (optional)
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 63b2e46b6555..9efb6c8558ed 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -284,22 +284,6 @@ static int check_pfn_span(unsigned long pfn, unsigned long nr_pages,
 	return 0;
 }
 
-static int check_hotplug_memory_addressable(unsigned long pfn,
-					    unsigned long nr_pages)
-{
-	const u64 max_addr = PFN_PHYS(pfn + nr_pages) - 1;
-
-	if (max_addr >> MAX_PHYSMEM_BITS) {
-		const u64 max_allowed = (1ull << (MAX_PHYSMEM_BITS + 1)) - 1;
-		WARN(1,
-		     "Hotplugged memory exceeds maximum addressable address, range=%#llx-%#llx, maximum=%#llx\n",
-		     (u64)PFN_PHYS(pfn), max_addr, max_allowed);
-		return -E2BIG;
-	}
-
-	return 0;
-}
-
 /*
  * Reasonably generic function for adding memory. It is
  * expected that archs that support memory hotplug will
@@ -317,10 +301,6 @@ int __ref __add_pages(int nid, unsigned long pfn, unsigned long nr_pages,
 	if (WARN_ON_ONCE(!params->pgprot.pgprot))
 		return -EINVAL;
 
-	err = check_hotplug_memory_addressable(pfn, nr_pages);
-	if (err)
-		return err;
-
 	if (altmap) {
 		/*
 		 * Validate altmap is within bounds of the total request
@@ -1109,6 +1089,9 @@ int __ref __add_memory(int nid, u64 start, u64 size, mhp_t mhp_flags)
 	struct resource *res;
 	int ret;
 
+	if (!memhp_range_allowed(start, size, 1))
+		return -ERANGE;
+
 	res = register_memory_resource(start, size, "System RAM");
 	if (IS_ERR(res))
 		return PTR_ERR(res);
@@ -1123,6 +1106,9 @@ int add_memory(int nid, u64 start,
Re: [RFC 00/11] arm64: coresight: Enable ETE and TRBE
Hello Tingwei,

On 11/14/20 10:47 AM, Tingwei Zhang wrote:
> Hi Anshuman,
>
> On Tue, Nov 10, 2020 at 08:44:58PM +0800, Anshuman Khandual wrote:
>> This series enables future IP trace features Embedded Trace Extension (ETE)
>> and Trace Buffer Extension (TRBE). This series depends on the ETM system
>> register instruction support series [0] and the v8.4 Self hosted tracing
>> support series (Jonathan Zhou) [1]. The tree is available here [2] for
>> quick access.
>>
>> ETE is the PE (CPU) trace unit for CPUs, implementing future architecture
>> extensions. ETE overlaps with the ETMv4 architecture, with additions to
>> support the newer architecture features and some restrictions on the
>> supported features w.r.t ETMv4. The ETE support is added by extending the
>> ETMv4 driver to recognise the ETE and handle the features as exposed by
>> the TRCIDRx registers. ETE only supports system instruction access from
>> the host CPU. The ETE could be integrated with a TRBE (see below), or with
>> the legacy CoreSight trace bus (e.g, ETRs). Thus the ETE follows the same
>> firmware description as the ETMs and requires a node per instance.
>>
>> Trace Buffer Extension (TRBE) implements a per CPU trace buffer, which is
>> accessible via the system registers and can be combined with the ETE to
>> provide a 1x1 configuration of source & sink. TRBE is being represented
>> here as a CoreSight sink. The primary reason is that the ETE source could
>> work with other traditional CoreSight sink devices. As TRBE captures the
>> trace data which is produced by ETE, it cannot work alone.
>>
>> The TRBE representation here has some distinct deviations from a
>> traditional CoreSight sink device. The CoreSight path between ETE and TRBE
>> is not built during boot by looking at respective DT or ACPI entries.
>> Instead, TRBE gets checked on each available CPU and, when found, gets
>> connected with the respective ETE source device on the same CPU, after
>> altering its outward connections. The ETE-TRBE path connection lasts only
>> while the CPU is online. But the ETE-TRBE coupling/decoupling method
>> implemented here is not optimal and would be reworked later on.
>
> Only perf mode is supported for TRBE in the current patches. Will you
> consider supporting sysfs mode as well in following patch sets?

Yes, either in subsequent versions or later on, after first getting the
perf based functionality enabled. Nonetheless, sysfs is also on the todo
list as mentioned in the cover letter.

- Anshuman
Re: [RFC 00/11] arm64: coresight: Enable ETE and TRBE
Hello Mike,

On 11/16/20 8:30 PM, Mike Leach wrote:
> Hi Anshuman,
>
> I've not looked in detail at this set yet, but having skimmed through
> it I do have an initial question about the handling of wrapped data
> buffers.
>
> With the ETR/ETB we found an issue with the way perf concatenated data
> captured from the hardware buffer into a single contiguous data block.
> The issue occurs when a wrapped buffer appears after another buffer in
> the data file. In a typical session perf would stop trace and copy the
> hardware buffer multiple times into the auxtrace buffer.

The hardware buffer and the perf aux trace buffer are the same for TRBE,
and hence there is no actual copy involved. Trace data gets pushed into
the user space via perf_aux_output_end(), either via etm_event_stop() or
via the IRQ handler i.e arm_trbe_irq_handler(). Data transfer to user
space happens via updates to the perf aux buffer indices i.e head, tail,
wakeup. But logically, they will appear as a stream of records to the
user space while parsing the perf.data file.

> e.g.
>
> For ETR/ETB we have a fixed length hardware data buffer - and no way
> of detecting buffer wraps using interrupts as the tracing is in
> progress.

TRBE has an interrupt. Hence there will be an opportunity to insert any
additional packets, if required, to demarcate the pre and post IRQ trace
data streams.

> If the buffer is not full at the point that perf transfers it then the
> data will look like this:-
>
> 1)
>
> easy to decode, we can see the async at the start of the data - which
> would be the async issued at the start of trace.

Just curious, what makes the tracer generate the trace packet? Is there
an explicit instruction, or is that just how the tracer starts when
enabled?

> If the buffer wraps we see this:-
>
> 2)
>
> Again no real issue, the decoder will skip to the async and trace from
> there - we lose the unsynced data.

Could you please elaborate more on the difference between synced and
unsynced trace data?

> Now the problem occurs when multiple transfers of data occur. We can
> see the following appearing as contiguous trace in the auxtrace
> buffer:-
>
> 3) < async>

So there is a wrap around event between the two captures? Are there any
other situations where this might happen?

> Now the decoder cannot spot the point that the synced data from the
> first capture ends, and the unsynced data from the second capture
> begins.

Got it.

> This means it will continue to decode into the unsynced data - which
> will result in incorrect trace / outright errors. To get round this
> for ETR/ETB the driver will insert barrier packets into the datafile
> if a wrap event is detected.

But you mentioned there are no IRQs on ETR/ETB. So how is the wrap event
even detected?

> 4) data>
>
> This has the effect of resetting the decoder into the unsynced state
> so that the invalid trace is not decoded. This is a workaround we have
> to do to handle the limitations of the ETR / ETB trace hardware.

Got it.

> For TRBE we do have interrupts, so it should be possible to prevent
> the buffer wrapping in most cases - but I did see in the code that
> there are handlers for the TRBE buffer wrap management event. Are
> there other factors in play that will prevent data pattern 3) from
> appearing in the auxtrace buffer ?

On TRBE, the buffer wrapping cannot happen without generating an IRQ. I
would assume that ETE will then start again with a data packet first when
the handler returns. Otherwise we might also have to insert a similar
barrier packet for the user space tool to reset. As trace data should not
get lost during a wrap event, ETE should complete the packet after the
handler returns, hence the aux buffer should still have a logically
contiguous stream to decode. I am not sure right now, but will look into
this.

- Anshuman
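The ETR/ETB workaround Mike describes can be sketched abstractly: once a fixed size buffer has wrapped, the driver prepends a barrier marker so the decoder resynchronises instead of decoding stale bytes as a continuation of the previous capture. All names, sizes, and the single-byte "barrier" below are invented for illustration only; real CoreSight barrier packets and drain paths look nothing like this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BUF_SIZE 8
#define BARRIER  0xFF	/* stand-in for a CoreSight barrier packet */

struct sink {
	uint8_t  buf[BUF_SIZE];
	uint64_t written;	/* total bytes ever produced by the source */
};

static void sink_write(struct sink *s, uint8_t byte)
{
	s->buf[s->written % BUF_SIZE] = byte;
	s->written++;
}

/* Drain for perf: copy out the surviving bytes in production order and
 * prepend a barrier if older bytes were overwritten (a wrap happened).
 * Returns the number of bytes placed in out[] (at most BUF_SIZE + 1).
 */
static int sink_drain(const struct sink *s, uint8_t *out)
{
	uint64_t avail = s->written < BUF_SIZE ? s->written : BUF_SIZE;
	uint64_t first = s->written - avail;	/* oldest surviving byte */
	int n = 0;

	if (first > 0)		/* wrap detected: decoder must resynchronise */
		out[n++] = BARRIER;
	for (uint64_t i = first; i < s->written; i++)
		out[n++] = s->buf[i % BUF_SIZE];
	return n;
}
```

The point of the barrier is exactly Mike's pattern 3): without it, the drained bytes of a wrapped capture would be indistinguishable from a continuation of the previous, fully synced capture.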
Re: [RFC 07/11] coresight: sink: Add TRBE driver
On 11/14/20 11:08 AM, Tingwei Zhang wrote: > Hi Anshuman, > > On Tue, Nov 10, 2020 at 08:45:05PM +0800, Anshuman Khandual wrote: >> Trace Buffer Extension (TRBE) implements a trace buffer per CPU which is >> accessible via the system registers. The TRBE supports different addressing >> modes including CPU virtual address and buffer modes including the circular >> buffer mode. The TRBE buffer is addressed by a base pointer (TRBBASER_EL1), >> an write pointer (TRBPTR_EL1) and a limit pointer (TRBLIMITR_EL1). But the >> access to the trace buffer could be prohibited by a higher exception level >> (EL3 or EL2), indicated by TRBIDR_EL1.P. The TRBE can also generate a CPU >> private interrupt (PPI) on address translation errors and when the buffer >> is full. Overall implementation here is inspired from the Arm SPE driver. >> >> Signed-off-by: Anshuman Khandual >> --- >> Documentation/trace/coresight/coresight-trbe.rst | 36 ++ >> arch/arm64/include/asm/sysreg.h | 2 + >> drivers/hwtracing/coresight/Kconfig | 11 + >> drivers/hwtracing/coresight/Makefile | 1 + >> drivers/hwtracing/coresight/coresight-trbe.c | 766 >> +++ >> drivers/hwtracing/coresight/coresight-trbe.h | 525 >> 6 files changed, 1341 insertions(+) >> create mode 100644 Documentation/trace/coresight/coresight-trbe.rst >> create mode 100644 drivers/hwtracing/coresight/coresight-trbe.c >> create mode 100644 drivers/hwtracing/coresight/coresight-trbe.h >> >> diff --git a/Documentation/trace/coresight/coresight-trbe.rst >> b/Documentation/trace/coresight/coresight-trbe.rst >> new file mode 100644 >> index 000..4320a8b >> --- /dev/null >> +++ b/Documentation/trace/coresight/coresight-trbe.rst >> @@ -0,0 +1,36 @@ >> +.. SPDX-License-Identifier: GPL-2.0 >> + >> +== >> +Trace Buffer Extension (TRBE). 
>> +== >> + >> +:Author: Anshuman Khandual >> +:Date: November 2020 >> + >> +Hardware Description >> + >> + >> +Trace Buffer Extension (TRBE) is a percpu hardware which captures in system >> +memory, CPU traces generated from a corresponding percpu tracing unit. This >> +gets plugged in as a coresight sink device because the corresponding trace >> +genarators (ETE), are plugged in as source device. >> + >> +Sysfs files and directories >> +--- >> + >> +The TRBE devices appear on the existing coresight bus alongside the other >> +coresight devices:: >> + >> +>$ ls /sys/bus/coresight/devices >> +trbe0 trbe1 trbe2 trbe3 >> + >> +The ``trbe`` named TRBEs are associated with a CPU.:: >> + >> +>$ ls /sys/bus/coresight/devices/trbe0/ >> +irq align dbm >> + >> +*Key file items are:-* >> + * ``irq``: TRBE maintenance interrupt number >> + * ``align``: TRBE write pointer alignment >> + * ``dbm``: TRBE updates memory with access and dirty flags >> + >> diff --git a/arch/arm64/include/asm/sysreg.h >> b/arch/arm64/include/asm/sysreg.h >> index 14cb156..61136f6 100644 >> --- a/arch/arm64/include/asm/sysreg.h >> +++ b/arch/arm64/include/asm/sysreg.h >> @@ -97,6 +97,7 @@ >> #define SET_PSTATE_UAO(x) __emit_inst(0xd500401f | PSTATE_UAO | >> ((!!x) << >> PSTATE_Imm_shift)) >> #define SET_PSTATE_SSBS(x) __emit_inst(0xd500401f | PSTATE_SSBS | >> ((!!x) >> << PSTATE_Imm_shift)) >> #define SET_PSTATE_TCO(x) __emit_inst(0xd500401f | PSTATE_TCO | >> ((!!x) << >> PSTATE_Imm_shift)) >> +#define TSB_CSYNC __emit_inst(0xd503225f) >> >> #define __SYS_BARRIER_INSN(CRm, op2, Rt) \ >> __emit_inst(0xd500 | sys_insn(0, 3, 3, (CRm), (op2)) | ((Rt) & >> 0x1f)) >> @@ -865,6 +866,7 @@ >> #define ID_AA64MMFR2_CNP_SHIFT 0 >> >> /* id_aa64dfr0 */ >> +#define ID_AA64DFR0_TRBE_SHIFT 44 >> #define ID_AA64DFR0_TRACE_FILT_SHIFT40 >> #define ID_AA64DFR0_DOUBLELOCK_SHIFT36 >> #define ID_AA64DFR0_PMSVER_SHIFT32 >> diff --git a/drivers/hwtracing/coresight/Kconfig >> b/drivers/hwtracing/coresight/Kconfig >> index 
c119824..0f5e101 100644 >> --- a/drivers/hwtracing/coresight/Kconfig >> +++ b/drivers/hwtracing/coresight/Kconfig >> @@ -156,6 +156,17 @@ config CORESIGHT_CTI >>To compile this driver as a module, choose M here: the >>module will be called coresight-cti. >> >> +config CORESIGHT_TRBE >> +bool "Trace Buffer Extension (TRBE) driver" > > Can you consider to support TRBE as loadable module since all coresight > drivers support loadable module now. Reworking the TRBE driver and making it a loadable module is part of the plan. - Anshuman
Re: [RFC 10/11] coresgith: etm-perf: Connect TRBE sink with ETE source
On 11/12/20 3:01 PM, Suzuki K Poulose wrote: > Hi Anshuman, > On 11/10/20 12:45 PM, Anshuman Khandual wrote: >> Unlike traditional sink devices, individual TRBE instances are not detected >> via DT or ACPI nodes. Instead TRBE instances are detected during CPU online >> process. Hence a path connecting ETE and TRBE on a given CPU would not have >> been established until then. This adds two coresight helpers that will help >> modify outward connections from a source device to establish and terminate >> path to a given sink device. But this method might not be optimal and would >> be reworked later. >> >> Signed-off-by: Anshuman Khandual > > Instead of this, could we come up something like a percpu_sink concept ? That > way, the TRBE driver could register the percpu_sink for the corresponding CPU > and we don't have to worry about the order in which the ETE will be probed > on a hotplugged CPU. (i.e, if the TRBE is probed before the ETE, the following > approach would fail to register the sink). Right, it won't work. We already have a per-cpu csdev sink. The current mechanism expects all ETEs to have been established, with the TRBEs just getting plugged in during their init while probing each individual CPU. During CPU hotplug in or out, a TRBE-ETE link gets created or destroyed. But it assumes that an ETE is always present for the TRBE to get plugged into or torn down from. The csdev for the TRBE sink also gets released during the CPU hot remove path. Are you suggesting that there should be a static percpu csdev array defined for all potential TRBEs, so that the ETE-TRBE links can be permanently established, given that the ETEs are permanent and never really go away on a CPU hot remove event (my assumption)? TRBE csdevs would then just get enabled or disabled without really being destroyed during CPU hotplug, so that the corresponding TRBE-ETE connection remains in place. > > And the default sink can be initialized when the ETE instance first starts > looking for it.
IIUC def_sink is the sink which will be selected by default for a source device while creating a path, in case there is no clear preference from the user. In the simple case, ETE's default sink should be fixed (its TRBE), and hence assigning that during the connection expansion procedure does make sense. But it can be more complex: the 'default' sink for an ETE can be scenario specific and may not always be its TRBE. Expanding connections fits a scenario where the ETE is present with all its other traditional sinks, and TRBE is the one which comes in or goes out with the CPU. If the ETE also comes in and goes out with individual CPU hotplug, which is ideally preferred, we would also need to 1. Coordinate with TRBE bring-up and connection creation to avoid races 2. Rediscover the traditional sinks which were attached to the ETE before - go back, rescan the DT/ACPI entries for sinks with which a path can be established, etc. Basically there are three choices here 1. ETE is permanent; the TRBE and the ETE-TRBE path get created or destroyed with hotplug (current proposal) 2. ETE/TRBE/ETE-TRBE path are all permanent; ETE and TRBE get enabled or disabled with hotplug 3. ETE, TRBE and the ETE-TRBE path all get created, enabled and destroyed with hotplug in sync - Anshuman
Re: [RFC 09/11] coresight: etm-perf: Disable the path before capturing the trace data
On 11/12/20 2:57 PM, Suzuki K Poulose wrote: > On 11/10/20 12:45 PM, Anshuman Khandual wrote: >> perf handle structure needs to be shared with the TRBE IRQ handler for >> capturing trace data and restarting the handle. There is a probability >> of an undefined reference based crash when etm event is being stopped >> while a TRBE IRQ also getting processed. This happens due the release >> of perf handle via perf_aux_output_end(). This stops the sinks via the >> link before releasing the handle, which will ensure that a simultaneous >> TRBE IRQ could not happen. >> >> Signed-off-by: Anshuman Khandual >> --- >> This might cause problem with traditional sink devices which can be >> operated in both sysfs and perf mode. This needs to be addressed >> correctly. One option would be to move the update_buffer callback >> into the respective sink devices. e.g, disable(). >> >> drivers/hwtracing/coresight/coresight-etm-perf.c | 2 ++ >> 1 file changed, 2 insertions(+) >> >> diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c >> b/drivers/hwtracing/coresight/coresight-etm-perf.c >> index 534e205..1a37991 100644 >> --- a/drivers/hwtracing/coresight/coresight-etm-perf.c >> +++ b/drivers/hwtracing/coresight/coresight-etm-perf.c >> @@ -429,7 +429,9 @@ static void etm_event_stop(struct perf_event *event, int >> mode) >> size = sink_ops(sink)->update_buffer(sink, handle, >> event_data->snk_config); >> + coresight_disable_path(path); >> perf_aux_output_end(handle, size); >> + return; >> } > > As you mentioned, this is not ideal where another session could be triggered > on > the sink from a different ETM (not for per-CPU sink) in a different mode > before > you collect the buffer. I believe the best option is to leave the > update_buffer() to disable_hw. This would need to pass on the "handle" to the > disable_path. Passing 'handle' into coresight_ops_sink->disable() would enable pushing updated trace data into perf aux buffer. 
But do you propose to drop the update_buffer() callback completely, or just move it into the disable() callback (along with the PERF_EF_UPDATE mode check) for all individual sinks for now? Maybe it can be dropped completely later. > > That way the races can be handled inside the sinks. Also, this aligns the > perf mode of the sinks with that of the sysfs mode. I did not get that; could you please elaborate?
Re: [PATCH RFC/RFT v3 6/9] powerpc: move cacheinfo sysfs to generic cacheinfo infrastructure
On 03/10/2014 04:42 PM, Sudeep Holla wrote: > Hi Anshuman, > > On 07/03/14 06:14, Anshuman Khandual wrote: >> On 03/07/2014 09:36 AM, Anshuman Khandual wrote: >>> On 02/19/2014 09:36 PM, Sudeep Holla wrote: >>>> From: Sudeep Holla >>>> >>>> This patch removes the redundant sysfs cacheinfo code by making use of >>>> the newly introduced generic cacheinfo infrastructure. >>>> >>>> Signed-off-by: Sudeep Holla >>>> Cc: Benjamin Herrenschmidt >>>> Cc: Paul Mackerras >>>> Cc: linuxppc-...@lists.ozlabs.org >>>> --- >>>> arch/powerpc/kernel/cacheinfo.c | 831 >>>> ++-- >>>> arch/powerpc/kernel/cacheinfo.h | 8 - >>>> arch/powerpc/kernel/sysfs.c | 4 - >>>> 3 files changed, 109 insertions(+), 734 deletions(-) >>>> delete mode 100644 arch/powerpc/kernel/cacheinfo.h >>>> >>>> diff --git a/arch/powerpc/kernel/cacheinfo.c >>>> b/arch/powerpc/kernel/cacheinfo.c >>>> index 2912b87..05b7580 100644 >>>> --- a/arch/powerpc/kernel/cacheinfo.c >>>> +++ b/arch/powerpc/kernel/cacheinfo.c >>>> @@ -10,38 +10,10 @@ >>>>* 2 as published by the Free Software Foundation. >>>>*/ >>>> >>>> +#include >>>> #include >>>> -#include >>>> #include >>>> -#include >>>> -#include >>>> -#include >>>> #include >>>> -#include >>>> -#include >>>> -#include >>>> - >>>> -#include "cacheinfo.h" >>>> - >>>> -/* per-cpu object for tracking: >>>> - * - a "cache" kobject for the top-level directory >>>> - * - a list of "index" objects representing the cpu's local cache >>>> hierarchy >>>> - */ >>>> -struct cache_dir { >>>> -struct kobject *kobj; /* bare (not embedded) kobject for cache >>>> - * directory */ >>>> -struct cache_index_dir *index; /* list of index objects */ >>>> -}; >>>> - >>>> -/* "index" object: each cpu's cache directory has an index >>>> - * subdirectory corresponding to a cache object associated with the >>>> - * cpu. This object's lifetime is managed via the embedded kobject. 
>>>> - */ >>>> -struct cache_index_dir { >>>> -struct kobject kobj; >>>> -struct cache_index_dir *next; /* next index in parent directory */ >>>> -struct cache *cache; >>>> -}; >>>> >>>> /* Template for determining which OF properties to query for a given >>>>* cache type */ >>>> @@ -60,11 +32,6 @@ struct cache_type_info { >>>> const char *nr_sets_prop; >>>> }; >>>> >>>> -/* These are used to index the cache_type_info array. */ >>>> -#define CACHE_TYPE_UNIFIED 0 >>>> -#define CACHE_TYPE_INSTRUCTION 1 >>>> -#define CACHE_TYPE_DATA2 >>>> - >>>> static const struct cache_type_info cache_type_info[] = { >>>> { >>>> /* PowerPC Processor binding says the [di]-cache-* >>>> @@ -77,246 +44,115 @@ static const struct cache_type_info >>>> cache_type_info[] = { >>>> .nr_sets_prop= "d-cache-sets", >>>> }, >>>> { >>>> -.name= "Instruction", >>>> -.size_prop = "i-cache-size", >>>> -.line_size_props = { "i-cache-line-size", >>>> - "i-cache-block-size", }, >>>> -.nr_sets_prop= "i-cache-sets", >>>> -}, >>>> -{ >>>> .name= "Data", >>>> .size_prop = "d-cache-size", >>>> .line_size_props = { "d-cache-line-size", >>>>"d-cache-block-size", }, >>>> .nr_sets_prop= "d-cache-sets", >>>> }, >>>> +{ >>>> +.name= "Instruction", >>>> +.size_prop = "i-cache-size", >>>> +.line_size_props = { "i-cache-line-size", >>>> +
Re: [PATCH 0/8] Add support for PowerPC Hypervisor supplied performance counters
On 01/22/2014 07:02 AM, Michael Ellerman wrote: > On Thu, 2014-01-16 at 15:53 -0800, Cody P Schafer wrote: >> These patches add basic pmus for 2 powerpc hypervisor interfaces to obtain >> performance counters: gpci ("get performance counter info") and 24x7. >> >> The counters supplied by these interfaces are continually counting and never >> need to be (and cannot be) disabled or enabled. They additionally do not >> generate any interrupts. This makes them in some regards similar to software >> counters, and as a result their implimentation shares some common code (which >> an initial patch exposes) with the sw counters. > > Hi Cody, > > Can you please add some more explanation of this series. > > In particular why do we need two new PMUs, and how do they relate to each > other? > > And can you add an example of how I'd actually use them using perf. > Yeah, agreed. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] powerpc, ptrace: Add few more ptrace request macros
On 04/02/2014 06:13 AM, Michael Neuling wrote: > Anshuman Khandual wrote: >> > This patch adds few more ptrace request macros expanding >> > the existing capability. These ptrace requests macros can >> > be classified into two categories. > Why is this only an RFC? I am looking for comments, suggestions and concerns from people. But it looks like the patch is a bit too big to review at once. > Also, please share the test case that you wrote for this. I will split the patch into multiple components, add the test case and send it out again.
[PATCH 3/3] powerpc, ptrace: Add new ptrace request macro for miscellaneous registers
This patch adds following new set of ptrace request macros for miscellaneous registers expanding the existing ptrace ABI on PowerPC. /* Miscellaneous registers */ PTRACE_GETMSCREGS PTRACE_SETMSCREGS Signed-off-by: Anshuman Khandual --- arch/powerpc/include/uapi/asm/ptrace.h | 10 arch/powerpc/kernel/ptrace.c | 91 +- 2 files changed, 100 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/include/uapi/asm/ptrace.h b/arch/powerpc/include/uapi/asm/ptrace.h index 1a12c36..bce1055 100644 --- a/arch/powerpc/include/uapi/asm/ptrace.h +++ b/arch/powerpc/include/uapi/asm/ptrace.h @@ -241,6 +241,16 @@ struct pt_regs { #define PTRACE_GETTM_CVMXREGS 0x76 #define PTRACE_SETTM_CVMXREGS 0x77 +/* Miscellaneous registers */ +#define PTRACE_GETMSCREGS 0x78 +#define PTRACE_SETMSCREGS 0x79 + +/* + * XXX: A note to application developers. The existing data layout + * of the above four ptrace requests can change when new registers + * are available for each category in forthcoming processors. + */ + #ifndef __ASSEMBLY__ struct ppc_debug_info { diff --git a/arch/powerpc/kernel/ptrace.c b/arch/powerpc/kernel/ptrace.c index 9fbcb6a..2893958 100644 --- a/arch/powerpc/kernel/ptrace.c +++ b/arch/powerpc/kernel/ptrace.c @@ -1054,6 +1054,76 @@ static int tm_cvmx_set(struct task_struct *target, const struct user_regset *reg #endif /* CONFIG_PPC_TRANSACTIONAL_MEM */ /* + * Miscellaneous Registers + * + * struct { + * unsigned long dscr; + * unsigned long ppr; + * unsigned long tar; + * }; + */ +static int misc_get(struct task_struct *target, const struct user_regset *regset, + unsigned int pos, unsigned int count, + void *kbuf, void __user *ubuf) +{ + int ret; + + /* DSCR register */ + ret = user_regset_copyout(&pos, &count, &kbuf, &ubuf, + &target->thread.dscr, 0, + sizeof(unsigned long)); + + BUILD_BUG_ON(offsetof(struct thread_struct, dscr) + sizeof(unsigned long) + + sizeof(unsigned long) != offsetof(struct thread_struct, ppr)); + + /* PPR register */ + if (!ret) + ret = 
user_regset_copyout(&pos, &count, &kbuf, &ubuf, + &target->thread.ppr, sizeof(unsigned long), + 2 * sizeof(unsigned long)); + + BUILD_BUG_ON(offsetof(struct thread_struct, ppr) + sizeof(unsigned long) + != offsetof(struct thread_struct, tar)); + /* TAR register */ + if (!ret) + ret = user_regset_copyout(&pos, &count, &kbuf, &ubuf, + &target->thread.tar, 2 * sizeof(unsigned long), + 3 * sizeof(unsigned long)); + return ret; +} + +static int misc_set(struct task_struct *target, const struct user_regset *regset, + unsigned int pos, unsigned int count, + const void *kbuf, const void __user *ubuf) +{ + int ret; + + /* DSCR register */ + ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf, + &target->thread.dscr, 0, + sizeof(unsigned long)); + + BUILD_BUG_ON(offsetof(struct thread_struct, dscr) + sizeof(unsigned long) + + sizeof(unsigned long) != offsetof(struct thread_struct, ppr)); + + /* PPR register */ + if (!ret) + ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf, + &target->thread.ppr, sizeof(unsigned long), + 2 * sizeof(unsigned long)); + + BUILD_BUG_ON(offsetof(struct thread_struct, ppr) + sizeof(unsigned long) + != offsetof(struct thread_struct, tar)); + + /* TAR register */ + if (!ret) + ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf, + &target->thread.tar, 2 * sizeof(unsigned long), + 3 * sizeof(unsigned long)); + return ret; +} + +/* * These are our native regset flavors. */ enum powerpc_regset { @@ -1072,8 +1142,9 @@ enum powerpc_regset { REGSET_TM_SPR, /* TM specific SPR */ REGSET_TM_CGPR, /* TM checkpointed GPR */ REGSET_TM_CFPR, /* TM checkpointed FPR */ - REGSET_TM_CVMX /* TM checkpointed VMX */ + REGSET_TM_CVMX, /* TM checkpointed VMX */ #endif + REGSET_MISC /* Miscellaneous */ }; static const struct user_regset
[PATCH 1/3] elf: Add some new PowerPC specific note sections
This patch adds four new note sections for transactional memory and one note section for some miscellaneous registers. This addition of new ELF note sections extends the existing ELF ABI without affecting it in any manner. Signed-off-by: Anshuman Khandual --- include/uapi/linux/elf.h | 5 + 1 file changed, 5 insertions(+) diff --git a/include/uapi/linux/elf.h b/include/uapi/linux/elf.h index ef6103b..bd59452 100644 --- a/include/uapi/linux/elf.h +++ b/include/uapi/linux/elf.h @@ -379,6 +379,11 @@ typedef struct elf64_shdr { #define NT_PPC_VMX 0x100 /* PowerPC Altivec/VMX registers */ #define NT_PPC_SPE 0x101 /* PowerPC SPE/EVR registers */ #define NT_PPC_VSX 0x102 /* PowerPC VSX registers */ +#define NT_PPC_TM_SPR 0x103 /* PowerPC transactional memory special registers */ +#define NT_PPC_TM_CGPR 0x104 /* PowerPC transactional memory checkpointed GPR */ +#define NT_PPC_TM_CFPR 0x105 /* PowerPC transactional memory checkpointed FPR */ +#define NT_PPC_TM_CVMX 0x106 /* PowerPC transactional memory checkpointed VMX */ +#define NT_PPC_MISC0x107 /* PowerPC miscellaneous registers */ #define NT_386_TLS 0x200 /* i386 TLS slots (struct user_desc) */ #define NT_386_IOPERM 0x201 /* x86 io permission bitmap (1=deny) */ #define NT_X86_XSTATE 0x202 /* x86 extended state using xsave */ -- 1.7.11.7
[PATCH 2/3] powerpc, ptrace: Add new ptrace request macros for transactional memory
This patch adds following new sets of ptrace request macros for transactional memory expanding the existing ptrace ABI on PowerPC. /* TM special purpose registers */ PTRACE_GETTM_SPRREGS PTRACE_SETTM_SPRREGS /* TM checkpointed GPR registers */ PTRACE_GETTM_CGPRREGS PTRACE_SETTM_CGPRREGS /* TM checkpointed FPR registers */ PTRACE_GETTM_CFPRREGS PTRACE_SETTM_CFPRREGS /* TM checkpointed VMX registers */ PTRACE_GETTM_CVMXREGS PTRACE_SETTM_CVMXREGS Signed-off-by: Anshuman Khandual --- arch/powerpc/include/asm/switch_to.h | 8 + arch/powerpc/include/uapi/asm/ptrace.h | 51 +++ arch/powerpc/kernel/process.c | 24 ++ arch/powerpc/kernel/ptrace.c | 570 +++-- 4 files changed, 625 insertions(+), 28 deletions(-) diff --git a/arch/powerpc/include/asm/switch_to.h b/arch/powerpc/include/asm/switch_to.h index 0e83e7d..22095e2 100644 --- a/arch/powerpc/include/asm/switch_to.h +++ b/arch/powerpc/include/asm/switch_to.h @@ -80,6 +80,14 @@ static inline void flush_spe_to_thread(struct task_struct *t) } #endif +#ifdef CONFIG_PPC_TRANSACTIONAL_MEM +extern void flush_tmreg_to_thread(struct task_struct *); +#else +static inline void flush_tmreg_to_thread(struct task_struct *t) +{ +} +#endif /* CONFIG_PPC_TRANSACTIONAL_MEM */ + static inline void clear_task_ebb(struct task_struct *t) { #ifdef CONFIG_PPC_BOOK3S_64 diff --git a/arch/powerpc/include/uapi/asm/ptrace.h b/arch/powerpc/include/uapi/asm/ptrace.h index 77d2ed3..1a12c36 100644 --- a/arch/powerpc/include/uapi/asm/ptrace.h +++ b/arch/powerpc/include/uapi/asm/ptrace.h @@ -190,6 +190,57 @@ struct pt_regs { #define PPC_PTRACE_SETHWDEBUG 0x88 #define PPC_PTRACE_DELHWDEBUG 0x87 +/* Transactional memory */ + +/* + * TM specific SPR + * + * struct data { + * u64 tm_tfhar; + * u64 tm_texasr; + * u64 tm_tfiar; + * unsigned long tm_orig_msr; + * u64 tm_tar; + * u64 tm_ppr; + * u64 tm_dscr; + * }; + */ +#define PTRACE_GETTM_SPRREGS 0x70 +#define PTRACE_SETTM_SPRREGS 0x71 + +/* + * TM Checkpointed GPR + * + * struct data { + * struct pt_regs 
ckpt_regs; + * }; + */ +#define PTRACE_GETTM_CGPRREGS 0x72 +#define PTRACE_SETTM_CGPRREGS 0x73 + +/* + * TM Checkpointed FPR + * + * struct data { + * u64 fpr[32]; + * u64 fpscr; + * }; + */ +#define PTRACE_GETTM_CFPRREGS 0x74 +#define PTRACE_SETTM_CFPRREGS 0x75 + +/* + * TM Checkpointed VMX + * + * struct data { + * vector128 vr[32]; + * vector128 vscr; + * unsigned long vrsave; + *}; + */ +#define PTRACE_GETTM_CVMXREGS 0x76 +#define PTRACE_SETTM_CVMXREGS 0x77 + #ifndef __ASSEMBLY__ struct ppc_debug_info { diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c index af064d2..230a0ee 100644 --- a/arch/powerpc/kernel/process.c +++ b/arch/powerpc/kernel/process.c @@ -673,6 +673,30 @@ static inline void __switch_to_tm(struct task_struct *prev) } } +void flush_tmreg_to_thread(struct task_struct *tsk) +{ + /* +* If task is not current, it should have been flushed +* already to it's thread_struct during __switch_to(). +*/ + if (tsk != current) + return; + + preempt_disable(); + if (tsk->thread.regs) { + /* +* If we are still current, the TM state need to +* be flushed to thread_struct as it will be still +* present in the current cpu +*/ + if (MSR_TM_ACTIVE(tsk->thread.regs->msr)) { + __switch_to_tm(tsk); + tm_recheckpoint_new_task(tsk); + } + } + preempt_enable(); +} + /* * This is called if we are on the way out to userspace and the * TIF_RESTORE_TM flag is set. It checks if we need to reload diff --git a/arch/powerpc/kernel/ptrace.c b/arch/powerpc/kernel/ptrace.c index 2e3d2bf..9fbcb6a 100644 --- a/arch/powerpc/kernel/ptrace.c +++ b/arch/powerpc/kernel/ptrace.c @@ -357,6 +357,17 @@ static int gpr_set(struct task_struct *target, const struct user_regset *regset, return ret; } +/* + * When any transaction is active, "thread_struct->transact_fp" holds + * the current running value of all FPR registers and "thread_struct-> + * fp_state" holds the last checkpointed FPR registers state for the + * current transaction. 
+ * + * struct data { + * u64 fpr[32]; + * u64 fpscr; + * }; + */ static int fpr_get(struct task_struct *target, const struct user_regset *regset, unsigned int pos, unsigned int count, void *kbuf, void __user *ubuf) @@ -365,21 +376,41 @@ static int fpr_get(struct task_struct *target, const struct user_regset *regset, u64 buf[33]; i
[PATCH 0/3] Add new ptrace request macros on PowerPC
FPR[0]: %lx\n", fpr->fpr[0]); printf("TM RN FPR[1]: %lx\n", fpr->fpr[1]); printf("TM RN FPR[2]: %lx\n", fpr->fpr[2]); printf("TM RN FPSCR: %lx\n", fpr->fpscr); /* TM checkpointed FPR */ ret = ptrace(PTRACE_GETTM_CFPRREGS, child, NULL, fpr1); if (ret == -1) { printf("PTRACE_GETTM_CFPRREGS failed: %s\n", strerror(errno)); exit(-1); } printf("---TM checkpointed FPR-\n"); printf("TM CH FPR[0]: %lx\n", fpr1->fpr[0]); printf("TM CH FPR[1]: %lx\n", fpr1->fpr[1]); printf("TM CH FPR[2]: %lx\n", fpr1->fpr[2]); printf("TM CH FPSCR: %lx\n", fpr1->fpscr); /* Misc debug */ ret = ptrace(PTRACE_GETMSCREGS, child, NULL, dbr1); if (ret == -1) { printf("PTRACE_GETMSCREGS failed: %s\n", strerror(errno)); exit(-1); } printf("---Running miscellaneous registers---\n"); printf("TM RN DSCR: %lx\n", dbr1->dscr); printf("TM RN PPR: %lx\n", dbr1->ppr); printf("TM RN TAR: %lx\n", dbr1->tar); ret = ptrace(PTRACE_DETACH, child, NULL, NULL); if (ret == -1) { printf("PTRACE_DETACH failed: %s\n", strerror(errno)); exit(-1); } } while (0); } return 0; } = Anshuman Khandual (3): elf: Add some new PowerPC specific note sections powerpc, ptrace: Add new ptrace request macros for transactional memory powerpc, ptrace: Add new ptrace request macro for miscellaneous registers arch/powerpc/include/asm/switch_to.h | 8 + arch/powerpc/include/uapi/asm/ptrace.h | 61 +++ arch/powerpc/kernel/process.c | 24 ++ arch/powerpc/kernel/ptrace.c | 659 +++-- include/uapi/linux/elf.h | 5 + 5 files changed, 729 insertions(+), 28 deletions(-) -- 1.7.11.7
Re: [PATCH 0/3] Add new ptrace request macros on PowerPC
On 04/02/2014 12:32 PM, Anshuman Khandual wrote: > This patch series adds new ELF note sections which are used to > create new ptrace request macros for various transactional memory and > miscellaneous registers on PowerPC. Please find the test case exploiting > the new ptrace request macros and it's results on a POWER8 system. > > RFC: https://lkml.org/lkml/2014/4/1/292 > > == Results == > ---TM specific SPR-- > TM TFHAR: 19dc > TM TEXASR: de01ac01 > TM TFIAR: c003f386 > TM CH ORIG_MSR: 9005f032 > TM CH TAR: 6 > TM CH PPR: c > TM CH DSCR: 1 > ---TM checkpointed GPR- > TM CH GPR[0]: 197c > TM CH GPR[1]: 5 > TM CH GPR[2]: 6 > TM CH GPR[7]: 1 > TM CH NIP: 19dc > TM CH LINK: 197c > TM CH CCR: 22000422 > ---TM running GPR- > TM RN GPR[0]: 197c > TM RN GPR[1]: 7 > TM RN GPR[2]: 8 > TM RN GPR[7]: 5 > TM RN NIP: 19fc > TM RN LINK: 197c > TM RN CCR: 2000422 > ---TM running FPR- > TM RN FPR[0]: 1002d3a3780 > TM RN FPR[1]: 7 > TM RN FPR[2]: 8 > TM RN FPSCR: 0 > ---TM checkpointed FPR- > TM CH FPR[0]: 1002d3a3780 > TM CH FPR[1]: 5 > TM CH FPR[2]: 6 > TM CH FPSCR: 0 > ---Running miscellaneous registers--- > TM RN DSCR: 0 There is a problem here which I forgot to mention. The running DSCR value comes from the thread->dscr component of the target process. While we are inside the transaction (which is the case here, as we are stuck at the "b ." instruction and have not reached TEND), thread->dscr should hold the running value of the DSCR register at that point in time. So we expect the DSCR value to be 5 instead of the 0 shown in the output above. During the tests, when I moved the "b ." after TEND, thread->dscr got the value of 5 while all checkpointed register values were thrown away. I believe there is some problem in the way the thread->dscr context is saved away inside the TM section. I will look into this further and keep you informed.
Re: [V5 0/4] perf: New conditional branch filter
On 03/25/2014 09:46 PM, Andi Kleen wrote: >> Hey Arnaldo, >> >> Do you have any comments or suggestions on this ? Have not received any >> response on these proposed patch series yet. Thank you. > > I read it earlier and it looks good to me. Hey Andi, Can I add your "Reviewed-by" or "Acked-by" to all these four patches?
Re: [V3 01/10] perf: New conditional branch filter criteria in branch stack sampling
On 11/26/2013 11:36 AM, m...@ellerman.id.au wrote: > Ideally your commit subject would contain a verb, preferably in the present > tense. > > I think simply "perf: Add PERF_SAMPLE_BRANCH_COND" would be clearer. Sure, will change it. > > On Wed, 2013-16-10 at 06:56:48 UTC, Anshuman Khandual wrote: >> POWER8 PMU based BHRB supports filtering for conditional branches. >> This patch introduces new branch filter PERF_SAMPLE_BRANCH_COND which >> will extend the existing perf ABI. Other architectures can provide >> this functionality with either HW filtering support (if present) or >> with SW filtering of instructions. >> >> Signed-off-by: Anshuman Khandual >> Reviewed-by: Stephane Eranian >> --- >> include/uapi/linux/perf_event.h | 3 ++- >> 1 file changed, 2 insertions(+), 1 deletion(-) >> >> diff --git a/include/uapi/linux/perf_event.h >> b/include/uapi/linux/perf_event.h >> index 0b1df41..5da52b6 100644 >> --- a/include/uapi/linux/perf_event.h >> +++ b/include/uapi/linux/perf_event.h >> @@ -160,8 +160,9 @@ enum perf_branch_sample_type { >> PERF_SAMPLE_BRANCH_ABORT_TX = 1U << 7, /* transaction aborts */ >> PERF_SAMPLE_BRANCH_IN_TX= 1U << 8, /* in transaction */ >> PERF_SAMPLE_BRANCH_NO_TX= 1U << 9, /* not in transaction */ >> +PERF_SAMPLE_BRANCH_COND = 1U << 10, /* conditional branches */ >> >> -PERF_SAMPLE_BRANCH_MAX = 1U << 10, /* non-ABI */ >> +PERF_SAMPLE_BRANCH_MAX = 1U << 11, /* non-ABI */ >> }; > > This no longer applies against Linus' tree, you'll need to rebase it. Okay
Re: [V3 02/10] powerpc, perf: Enable conditional branch filter for POWER8
On 11/26/2013 11:36 AM, m...@ellerman.id.au wrote: > On Wed, 2013-16-10 at 06:56:49 UTC, Anshuman Khandual wrote: >> Enables conditional branch filter support for POWER8 >> utilizing MMCRA register based filter and also invalidates >> a BHRB branch filter combination involving conditional >> branches. >> >> Signed-off-by: Anshuman Khandual >> --- >> arch/powerpc/perf/power8-pmu.c | 10 ++ >> 1 file changed, 10 insertions(+) >> >> diff --git a/arch/powerpc/perf/power8-pmu.c b/arch/powerpc/perf/power8-pmu.c >> index 2ee4a70..6e28587 100644 >> --- a/arch/powerpc/perf/power8-pmu.c >> +++ b/arch/powerpc/perf/power8-pmu.c >> @@ -580,11 +580,21 @@ static u64 power8_bhrb_filter_map(u64 >> branch_sample_type) >> if (branch_sample_type & PERF_SAMPLE_BRANCH_IND_CALL) >> return -1; >> >> +/* Invalid branch filter combination - HW does not support */ >> +if ((branch_sample_type & PERF_SAMPLE_BRANCH_ANY_CALL) && >> +(branch_sample_type & PERF_SAMPLE_BRANCH_COND)) >> +return -1; > > What this doesn't make obvious is that the hardware doesn't support any > combinations. It just happens that these are the only two possibilities we > allow, and so this is the only combination we have to disallow. 
> >> if (branch_sample_type & PERF_SAMPLE_BRANCH_ANY_CALL) { >> pmu_bhrb_filter |= POWER8_MMCRA_IFM1; >> return pmu_bhrb_filter; >> } >> >> +if (branch_sample_type & PERF_SAMPLE_BRANCH_COND) { >> +pmu_bhrb_filter |= POWER8_MMCRA_IFM3; >> +return pmu_bhrb_filter; >> +} >> + >> /* Every thing else is unsupported */ >> return -1; >> } > > I think it would be clearer if we actually checked for the possibilities we > allow and let everything else fall through, eg: > > /* Ignore user/kernel/hv bits */ > branch_sample_type &= ~PERF_SAMPLE_BRANCH_PLM_ALL; > > if (branch_sample_type == PERF_SAMPLE_BRANCH_ANY) > return 0; > > if (branch_sample_type == PERF_SAMPLE_BRANCH_ANY_CALL) > return POWER8_MMCRA_IFM1; > > if (branch_sample_type == PERF_SAMPLE_BRANCH_COND) > return POWER8_MMCRA_IFM3; > > return -1; > Please look at the 9th patch ("power8, perf: Change BHRB branch filter configuration"). All these issues are taken care of in that patch. It makes clear that no combination of HW BHRB filters is supported in the PMU, and hence it zeroes out the HW filter component and processes all of those filters in SW.
[V3 04/10] x86, perf: Add conditional branch filtering support
This patch adds conditional branch filtering support, enabling it for PERF_SAMPLE_BRANCH_COND in the perf branch stack sampling framework by utilizing the available software filter X86_BR_JCC. Signed-off-by: Anshuman Khandual Reviewed-by: Stephane Eranian --- arch/x86/kernel/cpu/perf_event_intel_lbr.c | 5 + 1 file changed, 5 insertions(+) diff --git a/arch/x86/kernel/cpu/perf_event_intel_lbr.c b/arch/x86/kernel/cpu/perf_event_intel_lbr.c index d5be06a..9723773 100644 --- a/arch/x86/kernel/cpu/perf_event_intel_lbr.c +++ b/arch/x86/kernel/cpu/perf_event_intel_lbr.c @@ -371,6 +371,9 @@ static void intel_pmu_setup_sw_lbr_filter(struct perf_event *event) if (br_type & PERF_SAMPLE_BRANCH_NO_TX) mask |= X86_BR_NO_TX; + if (br_type & PERF_SAMPLE_BRANCH_COND) + mask |= X86_BR_JCC; + /* * stash actual user request into reg, it may * be used by fixup code for some CPU @@ -665,6 +668,7 @@ static const int nhm_lbr_sel_map[PERF_SAMPLE_BRANCH_MAX] = { * NHM/WSM erratum: must include IND_JMP to capture IND_CALL */ [PERF_SAMPLE_BRANCH_IND_CALL] = LBR_IND_CALL | LBR_IND_JMP, + [PERF_SAMPLE_BRANCH_COND] = LBR_JCC, }; static const int snb_lbr_sel_map[PERF_SAMPLE_BRANCH_MAX] = { @@ -676,6 +680,7 @@ static const int snb_lbr_sel_map[PERF_SAMPLE_BRANCH_MAX] = { [PERF_SAMPLE_BRANCH_ANY_CALL] = LBR_REL_CALL | LBR_IND_CALL | LBR_FAR, [PERF_SAMPLE_BRANCH_IND_CALL] = LBR_IND_CALL, + [PERF_SAMPLE_BRANCH_COND] = LBR_JCC, }; /* core */ -- 1.7.11.7
[V3 07/10] powerpc, lib: Add new branch instruction analysis support functions
Generic powerpc branch instruction analysis support is added to the code patching library, which will help the subsequent patch on SW based filtering of branch records in perf. This patch also converts and exports some of the existing local static functions through the header file to be used elsewhere. Signed-off-by: Anshuman Khandual --- arch/powerpc/include/asm/code-patching.h | 30 ++ arch/powerpc/lib/code-patching.c | 54 ++-- 2 files changed, 82 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/include/asm/code-patching.h b/arch/powerpc/include/asm/code-patching.h index a6f8c7a..8bab417 100644 --- a/arch/powerpc/include/asm/code-patching.h +++ b/arch/powerpc/include/asm/code-patching.h @@ -22,6 +22,36 @@ #define BRANCH_SET_LINK 0x1 #define BRANCH_ABSOLUTE 0x2 +#define XL_FORM_LR 0x4C000020 +#define XL_FORM_CTR 0x4C000420 +#define XL_FORM_TAR 0x4C000460 + +#define BO_ALWAYS 0x0280 +#define BO_CTR 0x0200 +#define BO_CRBI_OFF 0x0080 +#define BO_CRBI_ON 0x0180 +#define BO_CRBI_HINT 0x0040 + /* Forms of branch instruction */ +int instr_is_branch_iform(unsigned int instr); +int instr_is_branch_bform(unsigned int instr); +int instr_is_branch_xlform(unsigned int instr); + +/* Classification of XL-form instruction */ +int is_xlform_lr(unsigned int instr); +int is_xlform_ctr(unsigned int instr); +int is_xlform_tar(unsigned int instr); + +/* Branch instruction is a call */ +int is_branch_link_set(unsigned int instr); + +/* BO field analysis (B-form or XL-form) */ +int is_bo_always(unsigned int instr); +int is_bo_ctr(unsigned int instr); +int is_bo_crbi_off(unsigned int instr); +int is_bo_crbi_on(unsigned int instr); +int is_bo_crbi_hint(unsigned int instr); + unsigned int create_branch(const unsigned int *addr, unsigned long target, int flags); unsigned int create_cond_branch(const unsigned int *addr, diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c index 17e5b23..cb62bd8 100644 ---
b/arch/powerpc/lib/code-patching.c @@ -77,16 +77,66 @@ static unsigned int branch_opcode(unsigned int instr) return (instr >> 26) & 0x3F; } -static int instr_is_branch_iform(unsigned int instr) +int instr_is_branch_iform(unsigned int instr) { return branch_opcode(instr) == 18; } -static int instr_is_branch_bform(unsigned int instr) +int instr_is_branch_bform(unsigned int instr) { return branch_opcode(instr) == 16; } +int instr_is_branch_xlform(unsigned int instr) +{ + return branch_opcode(instr) == 19; +} + +int is_xlform_lr(unsigned int instr) +{ + return (instr & XL_FORM_LR) == XL_FORM_LR; +} + +int is_xlform_ctr(unsigned int instr) +{ + return (instr & XL_FORM_CTR) == XL_FORM_CTR; +} + +int is_xlform_tar(unsigned int instr) +{ + return (instr & XL_FORM_TAR) == XL_FORM_TAR; +} + +int is_branch_link_set(unsigned int instr) +{ + return (instr & BRANCH_SET_LINK) == BRANCH_SET_LINK; +} + +int is_bo_always(unsigned int instr) +{ + return (instr & BO_ALWAYS) == BO_ALWAYS; +} + +int is_bo_ctr(unsigned int instr) +{ + return (instr & BO_CTR) == BO_CTR; +} + +int is_bo_crbi_off(unsigned int instr) +{ + return (instr & BO_CRBI_OFF) == BO_CRBI_OFF; +} + +int is_bo_crbi_on(unsigned int instr) +{ + return (instr & BO_CRBI_ON) == BO_CRBI_ON; +} + +int is_bo_crbi_hint(unsigned int instr) +{ + return (instr & BO_CRBI_HINT) == BO_CRBI_HINT; +} + int instr_is_relative_branch(unsigned int instr) { if (instr & BRANCH_ABSOLUTE) -- 1.7.11.7
[V3 02/10] powerpc, perf: Enable conditional branch filter for POWER8
Enables conditional branch filter support for POWER8 utilizing MMCRA register based filter and also invalidates a BHRB branch filter combination involving conditional branches. Signed-off-by: Anshuman Khandual --- arch/powerpc/perf/power8-pmu.c | 10 ++ 1 file changed, 10 insertions(+) diff --git a/arch/powerpc/perf/power8-pmu.c b/arch/powerpc/perf/power8-pmu.c index 2ee4a70..6e28587 100644 --- a/arch/powerpc/perf/power8-pmu.c +++ b/arch/powerpc/perf/power8-pmu.c @@ -580,11 +580,21 @@ static u64 power8_bhrb_filter_map(u64 branch_sample_type) if (branch_sample_type & PERF_SAMPLE_BRANCH_IND_CALL) return -1; + /* Invalid branch filter combination - HW does not support */ + if ((branch_sample_type & PERF_SAMPLE_BRANCH_ANY_CALL) && + (branch_sample_type & PERF_SAMPLE_BRANCH_COND)) + return -1; + if (branch_sample_type & PERF_SAMPLE_BRANCH_ANY_CALL) { pmu_bhrb_filter |= POWER8_MMCRA_IFM1; return pmu_bhrb_filter; } + if (branch_sample_type & PERF_SAMPLE_BRANCH_COND) { + pmu_bhrb_filter |= POWER8_MMCRA_IFM3; + return pmu_bhrb_filter; + } + /* Every thing else is unsupported */ return -1; } -- 1.7.11.7 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[V3 01/10] perf: New conditional branch filter criteria in branch stack sampling
POWER8 PMU based BHRB supports filtering for conditional branches. This patch introduces new branch filter PERF_SAMPLE_BRANCH_COND which will extend the existing perf ABI. Other architectures can provide this functionality with either HW filtering support (if present) or with SW filtering of instructions. Signed-off-by: Anshuman Khandual Reviewed-by: Stephane Eranian --- include/uapi/linux/perf_event.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h index 0b1df41..5da52b6 100644 --- a/include/uapi/linux/perf_event.h +++ b/include/uapi/linux/perf_event.h @@ -160,8 +160,9 @@ enum perf_branch_sample_type { PERF_SAMPLE_BRANCH_ABORT_TX = 1U << 7, /* transaction aborts */ PERF_SAMPLE_BRANCH_IN_TX= 1U << 8, /* in transaction */ PERF_SAMPLE_BRANCH_NO_TX= 1U << 9, /* not in transaction */ + PERF_SAMPLE_BRANCH_COND = 1U << 10, /* conditional branches */ - PERF_SAMPLE_BRANCH_MAX = 1U << 10, /* non-ABI */ + PERF_SAMPLE_BRANCH_MAX = 1U << 11, /* non-ABI */ }; #define PERF_SAMPLE_BRANCH_PLM_ALL \ -- 1.7.11.7
[V3 10/10] powerpc, perf: Cleanup SW branch filter list look up
This patch adds enumeration for all available SW branch filters in powerpc book3s code and also streamlines the look-up of the SW branch filter entries when figuring out which of the requested branch filters can be supported in SW. Signed-off-by: Anshuman Khandual --- arch/powerpc/perf/core-book3s.c | 38 +- 1 file changed, 13 insertions(+), 25 deletions(-) diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c index f983334..ec2dd61 100644 --- a/arch/powerpc/perf/core-book3s.c +++ b/arch/powerpc/perf/core-book3s.c @@ -566,6 +566,12 @@ static int match_filters(u64 branch_sample_type, u64 filter_mask) return true; } +/* SW implemented branch filters */ +static unsigned int power_sw_filter[] = { PERF_SAMPLE_BRANCH_ANY_CALL, + PERF_SAMPLE_BRANCH_COND, + PERF_SAMPLE_BRANCH_ANY_RETURN, + PERF_SAMPLE_BRANCH_IND_CALL }; + /* * Required SW based branch filters * @@ -578,6 +584,7 @@ static u64 branch_filter_map(u64 branch_sample_type, u64 pmu_bhrb_filter, u64 *filter_mask) { u64 branch_sw_filter = 0; + unsigned int i; /* No branch filter requested */ if (branch_sample_type & PERF_SAMPLE_BRANCH_ANY) { @@ -593,34 +600,15 @@ static u64 branch_filter_map(u64 branch_sample_type, u64 pmu_bhrb_filter, * SW implemented filters. But right now, there is now way to * initimate the user about this decision. 
*/ - if (branch_sample_type & PERF_SAMPLE_BRANCH_ANY_CALL) { - if (!(pmu_bhrb_filter & PERF_SAMPLE_BRANCH_ANY_CALL)) { - branch_sw_filter |= PERF_SAMPLE_BRANCH_ANY_CALL; - *filter_mask |= PERF_SAMPLE_BRANCH_ANY_CALL; - } - } - - if (branch_sample_type & PERF_SAMPLE_BRANCH_COND) { - if (!(pmu_bhrb_filter & PERF_SAMPLE_BRANCH_COND)) { - branch_sw_filter |= PERF_SAMPLE_BRANCH_COND; - *filter_mask |= PERF_SAMPLE_BRANCH_COND; - } - } - if (branch_sample_type & PERF_SAMPLE_BRANCH_ANY_RETURN) { - if (!(pmu_bhrb_filter & PERF_SAMPLE_BRANCH_ANY_RETURN)) { - branch_sw_filter |= PERF_SAMPLE_BRANCH_ANY_RETURN; - *filter_mask |= PERF_SAMPLE_BRANCH_ANY_RETURN; - } - } - - if (branch_sample_type & PERF_SAMPLE_BRANCH_IND_CALL) { - if (!(pmu_bhrb_filter & PERF_SAMPLE_BRANCH_IND_CALL)) { - branch_sw_filter |= PERF_SAMPLE_BRANCH_IND_CALL; - *filter_mask |= PERF_SAMPLE_BRANCH_IND_CALL; + for (i = 0; i < ARRAY_SIZE(power_sw_filter); i++) { + if (branch_sample_type & power_sw_filter[i]) { + if (!(pmu_bhrb_filter & power_sw_filter[i])) { + branch_sw_filter |= power_sw_filter[i]; + *filter_mask |= power_sw_filter[i]; + } } } - return branch_sw_filter; } -- 1.7.11.7
[V3 05/10] perf, documentation: Description for conditional branch filter
Adding documentation support for conditional branch filter. Signed-off-by: Anshuman Khandual Reviewed-by: Stephane Eranian --- tools/perf/Documentation/perf-record.txt | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt index e297b74..59ca8d0 100644 --- a/tools/perf/Documentation/perf-record.txt +++ b/tools/perf/Documentation/perf-record.txt @@ -163,12 +163,13 @@ following filters are defined: - any_call: any function call or system call - any_ret: any function return or system call return - ind_call: any indirect branch +- cond: conditional branches - u: only when the branch target is at the user level - k: only when the branch target is in the kernel - hv: only when the target is at the hypervisor level + -The option requires at least one branch type among any, any_call, any_ret, ind_call. +The option requires at least one branch type among any, any_call, any_ret, ind_call, cond. The privilege levels may be omitted, in which case, the privilege levels of the associated event are applied to the branch filter. Both kernel (k) and hypervisor (hv) privilege levels are subject to permissions. When sampling on multiple events, branch stack sampling -- 1.7.11.7
[V3 00/10] perf: New conditional branch filter
This patchset is the re-spin of the original branch stack sampling patchset which introduced the new PERF_SAMPLE_BRANCH_COND branch filter. This patchset also enables SW based branch filtering support for book3s powerpc platforms which have PMU HW backed branch stack sampling support.

Summary of code changes in this patchset:

(1) Introduces a new PERF_SAMPLE_BRANCH_COND branch filter
(2) Add the "cond" branch filter option in the "perf record" tool
(3) Enable PERF_SAMPLE_BRANCH_COND on X86 platforms
(4) Enable PERF_SAMPLE_BRANCH_COND on the POWER8 platform
(5) Update the documentation regarding the "perf record" tool
(6) Add some new powerpc instruction analysis functions in the code-patching library
(7) Enable SW based branch filter support for powerpc book3s
(8) Changed BHRB configuration in POWER8 to accommodate SW branch filters

With this new SW enablement, the branch filter support for book3s platforms has been extended to include all the combinations discussed below, with a sample test application program (included here).

Changes in V2:
(1) Enabled PPC64 SW branch filtering support
(2) Incorporated changes required for all previous comments

Changes in V3:
(1) Split the SW branch filter enablement into multiple patches
(2) Added PMU neutral SW branch filtering code, PMU specific HW branch filtering code
(3) Added new instruction analysis functionality into powerpc code-patching library
(4) Changed name for some of the functions
(5) Fixed couple of spelling mistakes
(6) Changed code documentation in multiple places

PMU HW branch filters:

(1) perf record -j any_call -e branch-misses:u ./cprog

# Overhead  Command  Source Shared Object   Source Symbol          Target Shared Object   Target Symbol
# ........  .......  .....................  .....................  .....................  ........................
     7.00%  cprog    cprog                  [.] sw_3_1             cprog                  [.] sw_3_1_2
     6.99%  cprog    cprog                  [.] hw_1_1             cprog                  [.] symbol1
     6.52%  cprog    cprog                  [.] sw_3_1             cprog                  [.] success_3_1_2
     5.41%  cprog    cprog                  [.] sw_3_1             cprog                  [.] sw_3_1_3
     5.40%  cprog    cprog                  [.] hw_1_2             cprog                  [.] symbol2
     5.40%  cprog    cprog                  [.] callme             cprog                  [.] hw_1_2
     5.40%  cprog    cprog                  [.] sw_3_1             cprog                  [.] success_3_1_1
     5.40%  cprog    cprog                  [.] callme             cprog                  [.] hw_1_1
     5.39%  cprog    cprog                  [.] sw_3_1             cprog                  [.] sw_3_1_1
     5.39%  cprog    cprog                  [.] sw_4_2             cprog                  [.] lr_addr
     5.39%  cprog    cprog                  [.] callme             cprog                  [.] sw_4_2
     5.37%  cprog    [unknown]              [.]                    cprog                  [.] ctr_addr
     4.30%  cprog    cprog                  [.] callme             cprog                  [.] hw_2_1
     4.28%  cprog    cprog                  [.] callme             cprog                  [.] sw_3_1
     3.82%  cprog    cprog                  [.] sw_3_1             cprog                  [.] success_3_1_3
     3.81%  cprog    cprog                  [.] callme             cprog                  [.] hw_2_2
     3.81%  cprog    cprog                  [.] callme             cprog                  [.] sw_3_2
     2.71%  cprog    [unknown]              [.]                    cprog                  [.] lr_addr
     2.70%  cprog    cprog                  [.] main               cprog                  [.] callme
     2.70%  cprog    cprog                  [.] sw_4_1             cprog                  [.] ctr_addr
     2.70%  cprog    cprog                  [.] callme             cprog                  [.] sw_4_1
     0.08%  cprog    [unknown]              [.] 0xf78676c4         [unknown]              [.] 0xf78522c0
     0.02%  cprog    [unknown]              [k]                    cprog                  [k] ctr_addr
     0.01%  cprog    [kernel.kallsyms]      [.] .power_pmu_enable  [kernel.kallsyms]      [.] .power8_compute_mmcr
     0.00%  cprog    ld-2.11.2.so           [.] malloc             [unknown]              [.] 0xf786b380
     0.00%  cprog    ld-2.11.2.so           [.] calloc             [unknown]              [.] 0xf786b390
     0.00%  cprog    cprog                  [.] main               [unknown]              [.] 0x1950
[V3 09/10] power8, perf: Change BHRB branch filter configuration
The powerpc kernel now supports SW based branch filters for book3s systems, with some specific requirements on how HW supported branch filters are handled in order to achieve the overall OR semantics prevailing in the perf branch stack sampling framework. This patch adapts the BHRB branch filter configuration to meet those protocols. The POWER8 PMU supports 3 branch filters (of which two are used in perf branch stack) which are mutually exclusive and cannot be ORed with each other. This implies that the PMU can only handle one HW based branch filter request at any point of time; for all other combinations the PMU will pass it on to the SW. Also, the combination of PERF_SAMPLE_BRANCH_ANY_CALL and PERF_SAMPLE_BRANCH_COND can now be handled in SW, hence we don't error it out anymore. Signed-off-by: Anshuman Khandual --- arch/powerpc/perf/power8-pmu.c | 73 +++--- 1 file changed, 54 insertions(+), 19 deletions(-) diff --git a/arch/powerpc/perf/power8-pmu.c b/arch/powerpc/perf/power8-pmu.c index 94460bc..7b82725 100644 --- a/arch/powerpc/perf/power8-pmu.c +++ b/arch/powerpc/perf/power8-pmu.c @@ -560,7 +560,56 @@ static int power8_generic_events[] = { static u64 power8_bhrb_filter_map(u64 branch_sample_type, u64 *filter_mask) { - u64 pmu_bhrb_filter = 0; + u64 x, tmp, pmu_bhrb_filter = 0; + *filter_mask = 0; + + /* No branch filter requested */ + if (branch_sample_type & PERF_SAMPLE_BRANCH_ANY) { + *filter_mask = PERF_SAMPLE_BRANCH_ANY; + return pmu_bhrb_filter; + } + + /* +* P8 does not support oring of PMU HW branch filters. Hence +* if multiple branch filters are requested which includes filters +* supported in PMU, still go ahead and clear the PMU based HW branch +* filter component as in this case all the filters will be processed +* in SW. 
+*/ + tmp = branch_sample_type; + + /* Remove privilege filters before comparison */ + tmp &= ~PERF_SAMPLE_BRANCH_USER; + tmp &= ~PERF_SAMPLE_BRANCH_KERNEL; + tmp &= ~PERF_SAMPLE_BRANCH_HV; + + for_each_branch_sample_type(x) { + /* Ignore privilege requests */ + if ((x == PERF_SAMPLE_BRANCH_USER) || (x == PERF_SAMPLE_BRANCH_KERNEL) || (x == PERF_SAMPLE_BRANCH_HV)) + continue; + + if (!(tmp & x)) + continue; + + /* Supported HW PMU filters */ + if (tmp & PERF_SAMPLE_BRANCH_ANY_CALL) { + tmp &= ~PERF_SAMPLE_BRANCH_ANY_CALL; + if (tmp) { + pmu_bhrb_filter = 0; + *filter_mask = 0; + return pmu_bhrb_filter; + } + } + + if (tmp & PERF_SAMPLE_BRANCH_COND) { + tmp &= ~PERF_SAMPLE_BRANCH_COND; + if (tmp) { + pmu_bhrb_filter = 0; + *filter_mask = 0; + return pmu_bhrb_filter; + } + } + } /* BHRB and regular PMU events share the same privilege state * filter configuration. BHRB is always recorded along with a @@ -569,34 +618,20 @@ static u64 power8_bhrb_filter_map(u64 branch_sample_type, u64 *filter_mask) * PMU event, we ignore any separate BHRB specific request. 
*/ - /* No branch filter requested */ - if (branch_sample_type & PERF_SAMPLE_BRANCH_ANY) - return pmu_bhrb_filter; - - /* Invalid branch filter options - HW does not support */ - if (branch_sample_type & PERF_SAMPLE_BRANCH_ANY_RETURN) - return -1; - - if (branch_sample_type & PERF_SAMPLE_BRANCH_IND_CALL) - return -1; - - /* Invalid branch filter combination - HW does not support */ - if ((branch_sample_type & PERF_SAMPLE_BRANCH_ANY_CALL) && - (branch_sample_type & PERF_SAMPLE_BRANCH_COND)) - return -1; - + /* Supported individual branch filters */ if (branch_sample_type & PERF_SAMPLE_BRANCH_ANY_CALL) { pmu_bhrb_filter |= POWER8_MMCRA_IFM1; + *filter_mask|= PERF_SAMPLE_BRANCH_ANY_CALL; return pmu_bhrb_filter; } if (branch_sample_type & PERF_SAMPLE_BRANCH_COND) { pmu_bhrb_filter |= POWER8_MMCRA_IFM3; + *filter_mask|= PERF_SAMPLE_BRANCH_COND; return pmu_bhrb_filter; } - /* Every thing else is unsupported */ - return -1; + return pmu_bhrb_filter; } static void power8_config_bhrb(u64 pmu_bhrb_filter) -- 1.7.11.7
[V3 06/10] powerpc, perf: Change the name of HW PMU branch filter tracking variable
This patch simply changes the name of the variable from "bhrb_filter" to "bhrb_hw_filter" in order to add one more variable which will track SW filters in generic powerpc book3s code which will be implemented in the subsequent patch. Signed-off-by: Anshuman Khandual --- arch/powerpc/perf/core-book3s.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c index eeae308..bc4dac7 100644 --- a/arch/powerpc/perf/core-book3s.c +++ b/arch/powerpc/perf/core-book3s.c @@ -47,7 +47,7 @@ struct cpu_hw_events { int n_txn_start; /* BHRB bits */ - u64 bhrb_filter;/* BHRB HW branch filter */ + u64 bhrb_hw_filter; /* BHRB HW branch filter */ int bhrb_users; void*bhrb_context; struct perf_branch_stack bhrb_stack; @@ -1159,7 +1159,7 @@ static void power_pmu_enable(struct pmu *pmu) out: if (cpuhw->bhrb_users) - ppmu->config_bhrb(cpuhw->bhrb_filter); + ppmu->config_bhrb(cpuhw->bhrb_hw_filter); local_irq_restore(flags); } @@ -1254,7 +1254,7 @@ nocheck: out: if (has_branch_stack(event)) { power_pmu_bhrb_enable(event); - cpuhw->bhrb_filter = ppmu->bhrb_filter_map( + cpuhw->bhrb_hw_filter = ppmu->bhrb_filter_map( event->attr.branch_sample_type); } @@ -1637,10 +1637,10 @@ static int power_pmu_event_init(struct perf_event *event) err = power_check_constraints(cpuhw, events, cflags, n + 1); if (has_branch_stack(event)) { - cpuhw->bhrb_filter = ppmu->bhrb_filter_map( + cpuhw->bhrb_hw_filter = ppmu->bhrb_filter_map( event->attr.branch_sample_type); - if(cpuhw->bhrb_filter == -1) + if(cpuhw->bhrb_hw_filter == -1) return -EOPNOTSUPP; } -- 1.7.11.7
[V3 03/10] perf, tool: Conditional branch filter 'cond' added to perf record
Adding perf record support for new branch stack filter criteria PERF_SAMPLE_BRANCH_COND. Signed-off-by: Anshuman Khandual Reviewed-by: Stephane Eranian --- tools/perf/builtin-record.c | 1 + 1 file changed, 1 insertion(+) diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c index ecca62e..802d11d 100644 --- a/tools/perf/builtin-record.c +++ b/tools/perf/builtin-record.c @@ -625,6 +625,7 @@ static const struct branch_mode branch_modes[] = { BRANCH_OPT("any_call", PERF_SAMPLE_BRANCH_ANY_CALL), BRANCH_OPT("any_ret", PERF_SAMPLE_BRANCH_ANY_RETURN), BRANCH_OPT("ind_call", PERF_SAMPLE_BRANCH_IND_CALL), + BRANCH_OPT("cond", PERF_SAMPLE_BRANCH_COND), BRANCH_END }; -- 1.7.11.7
[V3 08/10] powerpc, perf: Enable SW filtering in branch stack sampling framework
This patch enables SW based post processing of BHRB captured branches to meet more user defined branch filtration criteria in the perf branch stack sampling framework. These changes increase the number of branch filters and their valid combinations on any powerpc64 server platform with BHRB support. A summary of the code changes:

(1) struct cpu_hw_events

Introduced two new variables to track various filter values and masks:
(a) bhrb_sw_filter - tracks SW implemented branch filter flags
(b) filter_mask - tracks both (SW and HW) branch filter flags

(2) Event creation

The kernel will figure out supported BHRB branch filters through a PMU callback 'bhrb_filter_map'. This function will find out how many of the requested branch filters can be supported in the PMU HW. It will not try to invalidate any branch filter combinations, and event creation will not error out because of lack of HW based branch filters. Meanwhile it will track the overall supported branch filters in the "filter_mask" variable. Once the PMU callback returns, the kernel will process the user branch filter request against the available SW filters while looking at "filter_mask". During this phase, all the branch filters still pending from the user requested list will have to be supported in SW, failing which the event creation will error out.

(3) SW branch filter

During the BHRB data capture inside the PMU interrupt context, each captured 'perf_branch_entry.from' will be checked for compliance with the applicable SW branch filters. If the entry does not conform to the filter requirements, it will be discarded from the final perf branch stack buffer.

(4) Supported SW based branch filters
(a) PERF_SAMPLE_BRANCH_ANY_RETURN
(b) PERF_SAMPLE_BRANCH_IND_CALL
(c) PERF_SAMPLE_BRANCH_ANY_CALL
(d) PERF_SAMPLE_BRANCH_COND

Please refer to the patch to understand the classification of instructions into these branch filter categories. 
(5) Multiple branch filter semantics

The Book3S server implementation follows the same OR semantics (as implemented on x86) while dealing with multiple branch filters at any point of time. SW branch filter analysis is carried out on the data set captured by the PMU HW, so the resulting set of data (after applying the SW filters) will inherently be an AND with the HW captured set. Hence any combination of HW and SW branch filters would be invalid. HW based branch filters are more efficient and faster compared to SW implemented branch filters. So at first the PMU should decide whether it can support all the requested branch filters itself or not. In case it can support all the branch filters in an OR manner, we don't apply any SW branch filter on top of the HW captured set (which is the final set). This preserves the OR semantics of multiple branch filters as required. But in the case where the PMU cannot support all the requested branch filters in an OR manner, it should not apply any of its filters and should leave it up to the SW to handle them all. It's the PMU code's responsibility to uphold this protocol in order to conform to the overall OR semantics of the perf branch stack sampling framework. Signed-off-by: Anshuman Khandual --- arch/powerpc/include/asm/perf_event_server.h | 6 +- arch/powerpc/perf/core-book3s.c | 266 ++- arch/powerpc/perf/power8-pmu.c | 2 +- 3 files changed, 262 insertions(+), 12 deletions(-) diff --git a/arch/powerpc/include/asm/perf_event_server.h b/arch/powerpc/include/asm/perf_event_server.h index 8b24926..7314085 100644 --- a/arch/powerpc/include/asm/perf_event_server.h +++ b/arch/powerpc/include/asm/perf_event_server.h @@ -18,6 +18,10 @@ #define MAX_EVENT_ALTERNATIVES 8 #define MAX_LIMITED_HWCOUNTERS 2 +#define for_each_branch_sample_type(x) \ +for ((x) = PERF_SAMPLE_BRANCH_USER; \ + (x) < PERF_SAMPLE_BRANCH_MAX; (x) <<= 1) + /* * This struct provides the constants and functions needed to * describe the PMU on a particular POWER-family CPU. 
@@ -34,7 +38,7 @@ struct power_pmu { unsigned long *valp); int (*get_alternatives)(u64 event_id, unsigned int flags, u64 alt[]); - u64 (*bhrb_filter_map)(u64 branch_sample_type); + u64 (*bhrb_filter_map)(u64 branch_sample_type, u64 *filter_mask); void(*config_bhrb)(u64 pmu_bhrb_filter); void(*disable_pmc)(unsigned int pmc, unsigned long mmcr[]); int (*limited_pmc_event)(u64 event_id); diff --git a/arch/powerpc
Re: [PATCH 02/10][v6] powerpc/Power7: detect load/store instructions
On 10/16/2013 01:55 PM, David Laight wrote: >> Implement instr_is_load_store_2_06() to detect whether a given instruction >> is one of the fixed-point or floating-point load/store instructions in the >> POWER Instruction Set Architecture v2.06. > ... The op code encoding is dependent on the ISA version ? Do the basic load and store instructions change with newer ISA versions? BTW, we have got a newer version of the ISA, "PowerISA_V2.07_PUBLIC.pdf", here at power.org: https://www.power.org/documentation/power-isa-version-2-07/ It does not sound like a good idea to analyse the instructions with function names which specify the ISA version number. Besides, this function does not belong to a specific processor or platform. It has to be a bit generic. >> +int instr_is_load_store_2_06(const unsigned int *instr) >> +{ >> +unsigned int op, upper, lower; >> + >> +op = instr_opcode(*instr); >> + >> +if ((op >= 32 && op <= 58) || (op == 61 || op == 62)) >> +return true; >> + >> +if (op != 31) >> +return false; >> + >> +upper = op >> 5; >> +lower = op & 0x1f; >> + >> +/* Short circuit as many misses as we can */ >> +if (lower < 3 || lower > 23) >> +return false; >> + >> +if (lower == 3) { >> +if (upper >= 16) >> +return true; >> + >> +return false; >> +} >> + >> +if (lower == 7 || lower == 12) >> +return true; >> + >> +if (lower >= 20) /* && lower <= 23 (implicit) */ >> +return true; >> + >> +return false; >> +} > > I can't help feeling the code could do with some comments about > which actual instructions are selected where. Yeah, I agree. At least note which category of load/store instructions is getting selected in each case.
Re: [V3 01/10] perf: New conditional branch filter criteria in branch stack sampling
On 11/26/2013 03:45 PM, Anshuman Khandual wrote: > On 11/26/2013 11:36 AM, m...@ellerman.id.au wrote: >> Ideally your commit subject would contain a verb, preferably in the present >> tense. >> >> I think simply "perf: Add PERF_SAMPLE_BRANCH_COND" would be clearer. > > > Sure, will change it. > >> >> On Wed, 2013-16-10 at 06:56:48 UTC, Anshuman Khandual wrote: >>> POWER8 PMU based BHRB supports filtering for conditional branches. >>> This patch introduces new branch filter PERF_SAMPLE_BRANCH_COND which >>> will extend the existing perf ABI. Other architectures can provide >>> this functionality with either HW filtering support (if present) or >>> with SW filtering of instructions. >>> >>> Signed-off-by: Anshuman Khandual >>> Reviewed-by: Stephane Eranian >>> --- >>> include/uapi/linux/perf_event.h | 3 ++- >>> 1 file changed, 2 insertions(+), 1 deletion(-) >>> >>> diff --git a/include/uapi/linux/perf_event.h >>> b/include/uapi/linux/perf_event.h >>> index 0b1df41..5da52b6 100644 >>> --- a/include/uapi/linux/perf_event.h >>> +++ b/include/uapi/linux/perf_event.h >>> @@ -160,8 +160,9 @@ enum perf_branch_sample_type { >>> PERF_SAMPLE_BRANCH_ABORT_TX = 1U << 7, /* transaction aborts */ >>> PERF_SAMPLE_BRANCH_IN_TX= 1U << 8, /* in transaction */ >>> PERF_SAMPLE_BRANCH_NO_TX= 1U << 9, /* not in transaction */ >>> + PERF_SAMPLE_BRANCH_COND = 1U << 10, /* conditional branches */ >>> >>> - PERF_SAMPLE_BRANCH_MAX = 1U << 10, /* non-ABI */ >>> + PERF_SAMPLE_BRANCH_MAX = 1U << 11, /* non-ABI */ >>> }; >> >> This no longer applies against Linus' tree, you'll need to rebase it. > > Okay Hey Michael, Looks like the patch still applies on top of Linus's tree. The modified patch with a new commit subject line can be found here. -- >From d368096fc51a8da65f2d80ed5090d43cbc269f62 Mon Sep 17 00:00:00 2001 From: Anshuman Khandual Date: Mon, 22 Jul 2013 12:22:27 +0530 Subject: [PATCH] perf: Add PERF_SAMPLE_BRANCH_COND POWER8 PMU based BHRB supports filtering for conditional branches. 
This patch introduces new branch filter PERF_SAMPLE_BRANCH_COND which will extend the existing perf ABI. Other architectures can provide this functionality with either HW filtering support (if present) or with SW filtering of instructions. Signed-off-by: Anshuman Khandual Reviewed-by: Stephane Eranian --- include/uapi/linux/perf_event.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h index e1802d6..e2d8b8b 100644 --- a/include/uapi/linux/perf_event.h +++ b/include/uapi/linux/perf_event.h @@ -163,8 +163,9 @@ enum perf_branch_sample_type { PERF_SAMPLE_BRANCH_ABORT_TX = 1U << 7, /* transaction aborts */ PERF_SAMPLE_BRANCH_IN_TX= 1U << 8, /* in transaction */ PERF_SAMPLE_BRANCH_NO_TX= 1U << 9, /* not in transaction */ + PERF_SAMPLE_BRANCH_COND = 1U << 10, /* conditional branches */ - PERF_SAMPLE_BRANCH_MAX = 1U << 10, /* non-ABI */ + PERF_SAMPLE_BRANCH_MAX = 1U << 11, /* non-ABI */ }; #define PERF_SAMPLE_BRANCH_PLM_ALL \ -- 1.7.11.7
[PATCH V4 05/10] perf, documentation: Description for conditional branch filter
Adding documentation support for conditional branch filter. Signed-off-by: Anshuman Khandual Reviewed-by: Stephane Eranian --- tools/perf/Documentation/perf-record.txt | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt index 43b42c4..5ecc405 100644 --- a/tools/perf/Documentation/perf-record.txt +++ b/tools/perf/Documentation/perf-record.txt @@ -183,9 +183,10 @@ following filters are defined: - in_tx: only when the target is in a hardware transaction - no_tx: only when the target is not in a hardware transaction - abort_tx: only when the target is a hardware transaction abort + - cond: conditional branches + -The option requires at least one branch type among any, any_call, any_ret, ind_call. +The option requires at least one branch type among any, any_call, any_ret, ind_call, cond. The privilege levels may be omitted, in which case, the privilege levels of the associated event are applied to the branch filter. Both kernel (k) and hypervisor (hv) privilege levels are subject to permissions. When sampling on multiple events, branch stack sampling -- 1.7.11.7
[PATCH V4 01/10] perf: Add PERF_SAMPLE_BRANCH_COND
POWER8 PMU based BHRB supports filtering for conditional branches. This patch introduces a new branch filter PERF_SAMPLE_BRANCH_COND which will extend the existing perf ABI. Other architectures can provide this functionality with either HW filtering support (if present) or with SW filtering of instructions. Signed-off-by: Anshuman Khandual Reviewed-by: Stephane Eranian --- include/uapi/linux/perf_event.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h index e1802d6..e2d8b8b 100644 --- a/include/uapi/linux/perf_event.h +++ b/include/uapi/linux/perf_event.h @@ -163,8 +163,9 @@ enum perf_branch_sample_type { PERF_SAMPLE_BRANCH_ABORT_TX = 1U << 7, /* transaction aborts */ PERF_SAMPLE_BRANCH_IN_TX= 1U << 8, /* in transaction */ PERF_SAMPLE_BRANCH_NO_TX= 1U << 9, /* not in transaction */ + PERF_SAMPLE_BRANCH_COND = 1U << 10, /* conditional branches */ - PERF_SAMPLE_BRANCH_MAX = 1U << 10, /* non-ABI */ + PERF_SAMPLE_BRANCH_MAX = 1U << 11, /* non-ABI */ }; #define PERF_SAMPLE_BRANCH_PLM_ALL \ -- 1.7.11.7
[PATCH V4 10/10] powerpc, perf: Cleanup SW branch filter list look up
This patch adds an enumeration of all available SW branch filters in powerpc book3s code and streamlines the lookup of SW branch filter entries when working out which branch filters can be supported in SW. Signed-off-by: Anshuman Khandual --- arch/powerpc/perf/core-book3s.c | 38 +- 1 file changed, 13 insertions(+), 25 deletions(-) diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c index 54d39a5..42c6428 100644 --- a/arch/powerpc/perf/core-book3s.c +++ b/arch/powerpc/perf/core-book3s.c @@ -566,6 +566,12 @@ static int match_filters(u64 branch_sample_type, u64 filter_mask) return true; } +/* SW implemented branch filters */ +static unsigned int power_sw_filter[] = { PERF_SAMPLE_BRANCH_ANY_CALL, + PERF_SAMPLE_BRANCH_COND, + PERF_SAMPLE_BRANCH_ANY_RETURN, + PERF_SAMPLE_BRANCH_IND_CALL }; + /* * Required SW based branch filters * @@ -578,6 +584,7 @@ static u64 branch_filter_map(u64 branch_sample_type, u64 pmu_bhrb_filter, u64 *filter_mask) { u64 branch_sw_filter = 0; + unsigned int i; /* No branch filter requested */ if (branch_sample_type & PERF_SAMPLE_BRANCH_ANY) { @@ -593,34 +600,15 @@ static u64 branch_filter_map(u64 branch_sample_type, u64 pmu_bhrb_filter, * SW implemented filters. But right now, there is no way to * inform the user about this decision.
*/ - if (branch_sample_type & PERF_SAMPLE_BRANCH_ANY_CALL) { - if (!(pmu_bhrb_filter & PERF_SAMPLE_BRANCH_ANY_CALL)) { - branch_sw_filter |= PERF_SAMPLE_BRANCH_ANY_CALL; - *filter_mask |= PERF_SAMPLE_BRANCH_ANY_CALL; - } - } - - if (branch_sample_type & PERF_SAMPLE_BRANCH_COND) { - if (!(pmu_bhrb_filter & PERF_SAMPLE_BRANCH_COND)) { - branch_sw_filter |= PERF_SAMPLE_BRANCH_COND; - *filter_mask |= PERF_SAMPLE_BRANCH_COND; - } - } - if (branch_sample_type & PERF_SAMPLE_BRANCH_ANY_RETURN) { - if (!(pmu_bhrb_filter & PERF_SAMPLE_BRANCH_ANY_RETURN)) { - branch_sw_filter |= PERF_SAMPLE_BRANCH_ANY_RETURN; - *filter_mask |= PERF_SAMPLE_BRANCH_ANY_RETURN; - } - } - - if (branch_sample_type & PERF_SAMPLE_BRANCH_IND_CALL) { - if (!(pmu_bhrb_filter & PERF_SAMPLE_BRANCH_IND_CALL)) { - branch_sw_filter |= PERF_SAMPLE_BRANCH_IND_CALL; - *filter_mask |= PERF_SAMPLE_BRANCH_IND_CALL; + for (i = 0; i < ARRAY_SIZE(power_sw_filter); i++) { + if (branch_sample_type & power_sw_filter[i]) { + if (!(pmu_bhrb_filter & power_sw_filter[i])) { + branch_sw_filter |= power_sw_filter[i]; + *filter_mask |= power_sw_filter[i]; + } } } - return branch_sw_filter; }
[PATCH V4 00/10] perf: New conditional branch filter
This patchset is the re-spin of the original branch stack sampling patchset which introduced the new PERF_SAMPLE_BRANCH_COND branch filter. This patchset also enables SW based branch filtering support for book3s powerpc platforms which have PMU HW backed branch stack sampling support. Summary of code changes in this patchset: (1) Introduces a new PERF_SAMPLE_BRANCH_COND branch filter (2) Add the "cond" branch filter option in the "perf record" tool (3) Enable PERF_SAMPLE_BRANCH_COND in X86 platforms (4) Enable PERF_SAMPLE_BRANCH_COND in POWER8 platform (5) Update the documentation regarding "perf record" tool (6) Add some new powerpc instruction analysis functions in code-patching library (7) Enable SW based branch filter support for powerpc book3s (8) Changed BHRB configuration in POWER8 to accommodate SW branch filters With this new SW enablement, the branch filter support for book3s platforms has been extended to include all these combinations discussed below with a sample test application program (included here). Changes in V2 = (1) Enabled PPC64 SW branch filtering support (2) Incorporated changes required for all previous comments Changes in V3 = (1) Split the SW branch filter enablement into multiple patches (2) Added PMU neutral SW branch filtering code, PMU specific HW branch filtering code (3) Added new instruction analysis functionality into powerpc code-patching library (4) Changed name for some of the functions (5) Fixed a couple of spelling mistakes (6) Changed code documentation in multiple places Changes in V4 = (1) Changed the commit message for patch (01/10) (2) Changed the patch (02/10) to accommodate review comments from Michael Ellerman (3) Rebased the patchset against latest Linus's tree PMU HW branch filters = (1) perf record -j any_call -e branch-misses:u ./cprog # Overhead Command Source Shared Object Source Symbol Target Shared Object Target Symbol # ... . # 7.00%cprog cprog [.] sw_3_1 cprog [.] sw_3_1_2 6.99%cprog cprog [.] hw_1_1 cprog [.]
symbol1 6.52%cprog cprog [.] sw_3_1 cprog [.] success_3_1_2 5.41%cprog cprog [.] sw_3_1 cprog [.] sw_3_1_3 5.40%cprog cprog [.] hw_1_2 cprog [.] symbol2 5.40%cprog cprog [.] callme cprog [.] hw_1_2 5.40%cprog cprog [.] sw_3_1 cprog [.] success_3_1_1 5.40%cprog cprog [.] callme cprog [.] hw_1_1 5.39%cprog cprog [.] sw_3_1 cprog [.] sw_3_1_1 5.39%cprog cprog [.] sw_4_2 cprog [.] lr_addr 5.39%cprog cprog [.] callme cprog [.] sw_4_2 5.37%cprog [unknown] [.] cprog [.] ctr_addr 4.30%cprog cprog [.] callme cprog [.] hw_2_1 4.28%cprog cprog [.] callme cprog [.] sw_3_1 3.82%cprog cprog [.] sw_3_1 cprog [.] success_3_1_3 3.81%cprog cprog [.] callme cprog [.] hw_2_2 3.81%cprog cprog [.] callme cprog [.] sw_3_2 2.71%cprog [unknown] [.] cprog [.] lr_addr 2.70%cprog cprog [.] main cprog [.] callme 2.70%cprog cprog [.] sw_4_1 cprog [.] ctr_addr 2.70%cprog cprog [.] callme cprog [.] sw_4_1 0.08%cprog [unknown] [.] 0xf78676c4 [unknown] [.] 0xf78522c0 0.02%cprog [unknown] [k] cprog [k] ctr_addr 0.01%cprog [kernel.kallsyms] [.] .power_pmu_enable [kernel.kallsyms] [.] .power8_compute_mmcr 0.00%cprog ld-2.11.2.so [.] malloc [unknown] [.] 0xf786b380 0.00%cp
[PATCH V4 08/10] powerpc, perf: Enable SW filtering in branch stack sampling framework
This patch enables SW based post processing of BHRB captured branches in order to meet more user defined branch filtration criteria in the perf branch stack sampling framework. These changes increase the number of branch filters and their valid combinations on any powerpc64 server platform with BHRB support. A summary of the code changes follows. (1) struct cpu_hw_events Introduced two new variables to track the various filter values and masks (a) bhrb_sw_filter Tracks SW implemented branch filter flags (b) filter_mask Tracks both (SW and HW) branch filter flags (2) Event creation The kernel will figure out supported BHRB branch filters through a PMU call back 'bhrb_filter_map'. This function will find out how many of the requested branch filters can be supported in the PMU HW. It will not try to invalidate any branch filter combinations. Event creation will not error out because of lack of HW based branch filters. Meanwhile it will track the overall supported branch filters in the "filter_mask" variable. Once the PMU call back returns, the kernel will process the user branch filter request against available SW filters while looking at the "filter_mask". During this phase all the branch filters which are still pending from the user requested list will have to be supported in SW, failing which the event creation will error out. (3) SW branch filter During the BHRB data capture inside the PMU interrupt context, each of the captured 'perf_branch_entry.from' will be checked for compliance with applicable SW branch filters. If the entry does not conform to the filter requirements, it will be discarded from the final perf branch stack buffer. (4) Supported SW based branch filters (a) PERF_SAMPLE_BRANCH_ANY_RETURN (b) PERF_SAMPLE_BRANCH_IND_CALL (c) PERF_SAMPLE_BRANCH_ANY_CALL (d) PERF_SAMPLE_BRANCH_COND Please refer to the patch to understand the classification of instructions into these branch filter categories.
(5) Multiple branch filter semantics The Book3s server implementation follows the same OR semantics (as implemented in x86) while dealing with multiple branch filters at any point of time. SW branch filter analysis is carried out on the data set captured in the PMU HW, so the resulting set of data (after applying the SW filters) is inherently an AND with the HW captured set. Hence any mix of HW and SW branch filters would break the OR semantics and is invalid. HW based branch filters are more efficient and faster compared to SW implemented branch filters. So at first the PMU should decide whether it can support all the requested branch filters itself or not. In case it can support all the branch filters in an OR manner, we don't apply any SW branch filter on top of the HW captured set (which is the final set). This preserves the OR semantics of multiple branch filters as required. But in cases where the PMU cannot support all the requested branch filters in an OR manner, it should not apply any of its filters and should leave them all to the SW. It is the PMU code's responsibility to uphold this protocol in order to conform to the overall OR semantics of the perf branch stack sampling framework. Signed-off-by: Anshuman Khandual --- arch/powerpc/include/asm/perf_event_server.h | 6 +- arch/powerpc/perf/core-book3s.c | 266 ++- arch/powerpc/perf/power8-pmu.c | 2 +- 3 files changed, 262 insertions(+), 12 deletions(-) diff --git a/arch/powerpc/include/asm/perf_event_server.h b/arch/powerpc/include/asm/perf_event_server.h index 3fd2f1b..846d710 100644 --- a/arch/powerpc/include/asm/perf_event_server.h +++ b/arch/powerpc/include/asm/perf_event_server.h @@ -18,6 +18,10 @@ #define MAX_EVENT_ALTERNATIVES 8 #define MAX_LIMITED_HWCOUNTERS 2 +#define for_each_branch_sample_type(x) \ +for ((x) = PERF_SAMPLE_BRANCH_USER; \ + (x) < PERF_SAMPLE_BRANCH_MAX; (x) <<= 1) + /* * This struct provides the constants and functions needed to * describe the PMU on a particular POWER-family CPU.
@@ -34,7 +38,7 @@ struct power_pmu { unsigned long *valp); int (*get_alternatives)(u64 event_id, unsigned int flags, u64 alt[]); - u64 (*bhrb_filter_map)(u64 branch_sample_type); + u64 (*bhrb_filter_map)(u64 branch_sample_type, u64 *filter_mask); void(*config_bhrb)(u64 pmu_bhrb_filter); void(*disable_pmc)(unsigned int pmc, unsigned long mmcr[]); int (*limited_pmc_event)(u64 event_id); diff --git a/arch/powerpc
[PATCH V4 09/10] power8, perf: Change BHRB branch filter configuration
The powerpc kernel now supports SW based branch filters for book3s systems, with some specific requirements while dealing with HW supported branch filters in order to achieve the overall OR semantics of the perf branch stack sampling framework. This patch adapts the BHRB branch filter configuration to meet those protocols. The POWER8 PMU supports 3 branch filters (two of which are used in perf branch stack sampling) which are mutually exclusive and cannot be ORed with each other. This implies that the PMU can only handle one HW based branch filter request at any point of time. For all other combinations the PMU will pass them on to the SW. Also the combination of PERF_SAMPLE_BRANCH_ANY_CALL and PERF_SAMPLE_BRANCH_COND can now be handled in SW, hence we don't error them out anymore. Signed-off-by: Anshuman Khandual --- arch/powerpc/perf/power8-pmu.c | 73 +++--- 1 file changed, 54 insertions(+), 19 deletions(-) diff --git a/arch/powerpc/perf/power8-pmu.c b/arch/powerpc/perf/power8-pmu.c index 03c5b8d..6021349 100644 --- a/arch/powerpc/perf/power8-pmu.c +++ b/arch/powerpc/perf/power8-pmu.c @@ -561,7 +561,56 @@ static int power8_generic_events[] = { static u64 power8_bhrb_filter_map(u64 branch_sample_type, u64 *filter_mask) { - u64 pmu_bhrb_filter = 0; + u64 x, tmp, pmu_bhrb_filter = 0; + *filter_mask = 0; + + /* No branch filter requested */ + if (branch_sample_type & PERF_SAMPLE_BRANCH_ANY) { + *filter_mask = PERF_SAMPLE_BRANCH_ANY; + return pmu_bhrb_filter; + } + + /* +* P8 does not support oring of PMU HW branch filters. Hence +* if multiple branch filters are requested which includes filters +* supported in PMU, still go ahead and clear the PMU based HW branch +* filter component as in this case all the filters will be processed +* in SW.
+*/ + tmp = branch_sample_type; + + /* Remove privilege filters before comparison */ + tmp &= ~PERF_SAMPLE_BRANCH_USER; + tmp &= ~PERF_SAMPLE_BRANCH_KERNEL; + tmp &= ~PERF_SAMPLE_BRANCH_HV; + + for_each_branch_sample_type(x) { + /* Ignore privilege requests */ + if ((x == PERF_SAMPLE_BRANCH_USER) || (x == PERF_SAMPLE_BRANCH_KERNEL) || (x == PERF_SAMPLE_BRANCH_HV)) + continue; + + if (!(tmp & x)) + continue; + + /* Supported HW PMU filters */ + if (tmp & PERF_SAMPLE_BRANCH_ANY_CALL) { + tmp &= ~PERF_SAMPLE_BRANCH_ANY_CALL; + if (tmp) { + pmu_bhrb_filter = 0; + *filter_mask = 0; + return pmu_bhrb_filter; + } + } + + if (tmp & PERF_SAMPLE_BRANCH_COND) { + tmp &= ~PERF_SAMPLE_BRANCH_COND; + if (tmp) { + pmu_bhrb_filter = 0; + *filter_mask = 0; + return pmu_bhrb_filter; + } + } + } /* BHRB and regular PMU events share the same privilege state * filter configuration. BHRB is always recorded along with a @@ -570,34 +619,20 @@ static u64 power8_bhrb_filter_map(u64 branch_sample_type, u64 *filter_mask) * PMU event, we ignore any separate BHRB specific request. 
*/ - /* No branch filter requested */ - if (branch_sample_type & PERF_SAMPLE_BRANCH_ANY) - return pmu_bhrb_filter; - - /* Invalid branch filter options - HW does not support */ - if (branch_sample_type & PERF_SAMPLE_BRANCH_ANY_RETURN) - return -1; - - if (branch_sample_type & PERF_SAMPLE_BRANCH_IND_CALL) - return -1; - + /* Supported individual branch filters */ if (branch_sample_type & PERF_SAMPLE_BRANCH_ANY_CALL) { pmu_bhrb_filter |= POWER8_MMCRA_IFM1; + *filter_mask|= PERF_SAMPLE_BRANCH_ANY_CALL; return pmu_bhrb_filter; } if (branch_sample_type & PERF_SAMPLE_BRANCH_COND) { pmu_bhrb_filter |= POWER8_MMCRA_IFM3; + *filter_mask|= PERF_SAMPLE_BRANCH_COND; return pmu_bhrb_filter; } - /* PMU does not support ANY combination of HW BHRB filters */ - if ((branch_sample_type & PERF_SAMPLE_BRANCH_ANY_CALL) && - (branch_sample_type & PERF_SAMPLE_BRANCH_COND)) - return -1; - - /* Every thing else is unsupported */ - return -1; + return pmu_bhrb_filter; } static void power8_config_bhrb(u64 pmu_bhrb_filter) -- 1.7.11.7
[PATCH V4 06/10] powerpc, perf: Change the name of HW PMU branch filter tracking variable
This patch simply renames the variable "bhrb_filter" to "bhrb_hw_filter", making room for one more variable that will track SW filters in generic powerpc book3s code, to be implemented in a subsequent patch. Signed-off-by: Anshuman Khandual --- arch/powerpc/perf/core-book3s.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c index 29b89e8..2de7d48 100644 --- a/arch/powerpc/perf/core-book3s.c +++ b/arch/powerpc/perf/core-book3s.c @@ -47,7 +47,7 @@ struct cpu_hw_events { int n_txn_start; /* BHRB bits */ - u64 bhrb_filter;/* BHRB HW branch filter */ + u64 bhrb_hw_filter; /* BHRB HW branch filter */ int bhrb_users; void*bhrb_context; struct perf_branch_stack bhrb_stack; @@ -1159,7 +1159,7 @@ static void power_pmu_enable(struct pmu *pmu) out: if (cpuhw->bhrb_users) - ppmu->config_bhrb(cpuhw->bhrb_filter); + ppmu->config_bhrb(cpuhw->bhrb_hw_filter); local_irq_restore(flags); } @@ -1254,7 +1254,7 @@ nocheck: out: if (has_branch_stack(event)) { power_pmu_bhrb_enable(event); - cpuhw->bhrb_filter = ppmu->bhrb_filter_map( + cpuhw->bhrb_hw_filter = ppmu->bhrb_filter_map( event->attr.branch_sample_type); } @@ -1637,10 +1637,10 @@ static int power_pmu_event_init(struct perf_event *event) err = power_check_constraints(cpuhw, events, cflags, n + 1); if (has_branch_stack(event)) { - cpuhw->bhrb_filter = ppmu->bhrb_filter_map( + cpuhw->bhrb_hw_filter = ppmu->bhrb_filter_map( event->attr.branch_sample_type); - if(cpuhw->bhrb_filter == -1) + if(cpuhw->bhrb_hw_filter == -1) return -EOPNOTSUPP; } -- 1.7.11.7
[PATCH V4 07/10] powerpc, lib: Add new branch instruction analysis support functions
Generic powerpc branch instruction analysis support is added to the code patching library, which will help the subsequent patch on SW based filtering of branch records in perf. This patch also converts and exports some of the existing local static functions through the header file to be used elsewhere. Signed-off-by: Anshuman Khandual --- arch/powerpc/include/asm/code-patching.h | 30 ++ arch/powerpc/lib/code-patching.c | 54 ++-- 2 files changed, 82 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/include/asm/code-patching.h b/arch/powerpc/include/asm/code-patching.h index a6f8c7a..8bab417 100644 --- a/arch/powerpc/include/asm/code-patching.h +++ b/arch/powerpc/include/asm/code-patching.h @@ -22,6 +22,36 @@ #define BRANCH_SET_LINK 0x1 #define BRANCH_ABSOLUTE 0x2 +#define XL_FORM_LR 0x4C000020 +#define XL_FORM_CTR 0x4C000420 +#define XL_FORM_TAR 0x4C000460 + +#define BO_ALWAYS 0x02800000 +#define BO_CTR 0x02000000 +#define BO_CRBI_OFF 0x00800000 +#define BO_CRBI_ON 0x01800000 +#define BO_CRBI_HINT 0x00400000 + +/* Forms of branch instruction */ +int instr_is_branch_iform(unsigned int instr); +int instr_is_branch_bform(unsigned int instr); +int instr_is_branch_xlform(unsigned int instr); + +/* Classification of XL-form instruction */ +int is_xlform_lr(unsigned int instr); +int is_xlform_ctr(unsigned int instr); +int is_xlform_tar(unsigned int instr); + +/* Branch instruction is a call */ +int is_branch_link_set(unsigned int instr); + +/* BO field analysis (B-form or XL-form) */ +int is_bo_always(unsigned int instr); +int is_bo_ctr(unsigned int instr); +int is_bo_crbi_off(unsigned int instr); +int is_bo_crbi_on(unsigned int instr); +int is_bo_crbi_hint(unsigned int instr); + unsigned int create_branch(const unsigned int *addr, unsigned long target, int flags); unsigned int create_cond_branch(const unsigned int *addr, diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c index 17e5b23..cb62bd8 100644 --- a/arch/powerpc/lib/code-patching.c +++
b/arch/powerpc/lib/code-patching.c @@ -77,16 +77,66 @@ static unsigned int branch_opcode(unsigned int instr) return (instr >> 26) & 0x3F; } -static int instr_is_branch_iform(unsigned int instr) +int instr_is_branch_iform(unsigned int instr) { return branch_opcode(instr) == 18; } -static int instr_is_branch_bform(unsigned int instr) +int instr_is_branch_bform(unsigned int instr) { return branch_opcode(instr) == 16; } +int instr_is_branch_xlform(unsigned int instr) +{ + return branch_opcode(instr) == 19; +} + +int is_xlform_lr(unsigned int instr) +{ + return (instr & XL_FORM_LR) == XL_FORM_LR; +} + +int is_xlform_ctr(unsigned int instr) +{ + return (instr & XL_FORM_CTR) == XL_FORM_CTR; +} + +int is_xlform_tar(unsigned int instr) +{ + return (instr & XL_FORM_TAR) == XL_FORM_TAR; +} + +int is_branch_link_set(unsigned int instr) +{ + return (instr & BRANCH_SET_LINK) == BRANCH_SET_LINK; +} + +int is_bo_always(unsigned int instr) +{ + return (instr & BO_ALWAYS) == BO_ALWAYS; +} + +int is_bo_ctr(unsigned int instr) +{ + return (instr & BO_CTR) == BO_CTR; +} + +int is_bo_crbi_off(unsigned int instr) +{ + return (instr & BO_CRBI_OFF) == BO_CRBI_OFF; +} + +int is_bo_crbi_on(unsigned int instr) +{ + return (instr & BO_CRBI_ON) == BO_CRBI_ON; +} + +int is_bo_crbi_hint(unsigned int instr) +{ + return (instr & BO_CRBI_HINT) == BO_CRBI_HINT; +} + int instr_is_relative_branch(unsigned int instr) { if (instr & BRANCH_ABSOLUTE) -- 1.7.11.7
[PATCH V4 02/10] powerpc, perf: Enable conditional branch filter for POWER8
Enables conditional branch filter support for POWER8 utilizing the MMCRA register based filter, and also invalidates any combination of BHRB branch filters. Signed-off-by: Anshuman Khandual --- arch/powerpc/perf/power8-pmu.c | 10 ++ 1 file changed, 10 insertions(+) diff --git a/arch/powerpc/perf/power8-pmu.c b/arch/powerpc/perf/power8-pmu.c index a3f7abd..e88b9cb 100644 --- a/arch/powerpc/perf/power8-pmu.c +++ b/arch/powerpc/perf/power8-pmu.c @@ -586,6 +586,16 @@ static u64 power8_bhrb_filter_map(u64 branch_sample_type) return pmu_bhrb_filter; } + if (branch_sample_type & PERF_SAMPLE_BRANCH_COND) { + pmu_bhrb_filter |= POWER8_MMCRA_IFM3; + return pmu_bhrb_filter; + } + + /* PMU does not support ANY combination of HW BHRB filters */ + if ((branch_sample_type & PERF_SAMPLE_BRANCH_ANY_CALL) && + (branch_sample_type & PERF_SAMPLE_BRANCH_COND)) + return -1; + /* Every thing else is unsupported */ return -1; } -- 1.7.11.7
[PATCH V4 03/10] perf, tool: Conditional branch filter 'cond' added to perf record
Adding perf record support for new branch stack filter criteria PERF_SAMPLE_BRANCH_COND. Signed-off-by: Anshuman Khandual Reviewed-by: Stephane Eranian --- tools/perf/builtin-record.c | 1 + 1 file changed, 1 insertion(+) diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c index 7c8020a..34040f7 100644 --- a/tools/perf/builtin-record.c +++ b/tools/perf/builtin-record.c @@ -583,6 +583,7 @@ static const struct branch_mode branch_modes[] = { BRANCH_OPT("abort_tx", PERF_SAMPLE_BRANCH_ABORT_TX), BRANCH_OPT("in_tx", PERF_SAMPLE_BRANCH_IN_TX), BRANCH_OPT("no_tx", PERF_SAMPLE_BRANCH_NO_TX), + BRANCH_OPT("cond", PERF_SAMPLE_BRANCH_COND), BRANCH_END }; -- 1.7.11.7
[PATCH V4 04/10] x86, perf: Add conditional branch filtering support
This patch adds conditional branch filtering support, enabling it for PERF_SAMPLE_BRANCH_COND in perf branch stack sampling framework by utilizing an available software filter X86_BR_JCC. Signed-off-by: Anshuman Khandual Reviewed-by: Stephane Eranian --- arch/x86/kernel/cpu/perf_event_intel_lbr.c | 5 + 1 file changed, 5 insertions(+) diff --git a/arch/x86/kernel/cpu/perf_event_intel_lbr.c b/arch/x86/kernel/cpu/perf_event_intel_lbr.c index d82d155..9dd2459 100644 --- a/arch/x86/kernel/cpu/perf_event_intel_lbr.c +++ b/arch/x86/kernel/cpu/perf_event_intel_lbr.c @@ -384,6 +384,9 @@ static void intel_pmu_setup_sw_lbr_filter(struct perf_event *event) if (br_type & PERF_SAMPLE_BRANCH_NO_TX) mask |= X86_BR_NO_TX; + if (br_type & PERF_SAMPLE_BRANCH_COND) + mask |= X86_BR_JCC; + /* * stash actual user request into reg, it may * be used by fixup code for some CPU @@ -678,6 +681,7 @@ static const int nhm_lbr_sel_map[PERF_SAMPLE_BRANCH_MAX] = { * NHM/WSM erratum: must include IND_JMP to capture IND_CALL */ [PERF_SAMPLE_BRANCH_IND_CALL] = LBR_IND_CALL | LBR_IND_JMP, + [PERF_SAMPLE_BRANCH_COND] = LBR_JCC, }; static const int snb_lbr_sel_map[PERF_SAMPLE_BRANCH_MAX] = { @@ -689,6 +693,7 @@ static const int snb_lbr_sel_map[PERF_SAMPLE_BRANCH_MAX] = { [PERF_SAMPLE_BRANCH_ANY_CALL] = LBR_REL_CALL | LBR_IND_CALL | LBR_FAR, [PERF_SAMPLE_BRANCH_IND_CALL] = LBR_IND_CALL, + [PERF_SAMPLE_BRANCH_COND] = LBR_JCC, }; /* core */ -- 1.7.11.7
Re: [PATCH V4 08/10] powerpc, perf: Enable SW filtering in branch stack sampling framework
On 12/10/2013 11:27 AM, Anshuman Khandual wrote: > On 12/09/2013 11:51 AM, Michael Ellerman wrote: >> This code was already in need of some unindentation, and now it's just >> ridiculous. >> >> To start with at the beginning of this routine we have: >> >> while (..) { >> if (!val) >> break; >> else { >> // Bulk of the logic >> ... >> } >> } >> >> That should almost always become: >> >> while (..) { >> if (!val) >> break; >> >> // Bulk of the logic >> ... >> } >> >> >> But in this case that's not enough. Please send a precursor patch which moves >> this logic out into a helper function. > > Hey Michael, > > I believe this patch should be able to take care of this. > > commit d66d729715cabe0cfd8e34861a6afa8ad639ddf3 > Author: Anshuman Khandual > Date: Tue Dec 10 11:10:06 2013 +0530 > > power, perf: Clean up BHRB processing > > This patch cleans up some indentation problem and re-organizes the > BHRB processing code with an additional helper function. > > Signed-off-by: Anshuman Khandual > > diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c > index 29b89e8..9ae96c5 100644 > --- a/arch/powerpc/perf/core-book3s.c > +++ b/arch/powerpc/perf/core-book3s.c > @@ -400,11 +400,20 @@ static __u64 power_pmu_bhrb_to(u64 addr) > return target - (unsigned long)&instr + addr; > } > > +void update_branch_entry(struct cpu_hw_events *cpuhw, int u_index, u64 from, > u64 to, int pred) > +{ > + cpuhw->bhrb_entries[u_index].from = from; > + cpuhw->bhrb_entries[u_index].to = to; > + cpuhw->bhrb_entries[u_index].mispred = pred; > + cpuhw->bhrb_entries[u_index].predicted = ~pred; > + return; > +} > + > /* Processing BHRB entries */ > void power_pmu_bhrb_read(struct cpu_hw_events *cpuhw) > { > u64 val; > - u64 addr; > + u64 addr, tmp; > int r_index, u_index, pred; > > r_index = 0; > @@ -415,62 +424,54 @@ void power_pmu_bhrb_read(struct cpu_hw_events *cpuhw) > if (!val) > /* Terminal marker: End of valid BHRB entries */ > break; > - else { > - addr = val & BHRB_EA; 
> - pred = val & BHRB_PREDICTION; > > - if (!addr) > - /* invalid entry */ > - continue; > + addr = val & BHRB_EA; > + pred = val & BHRB_PREDICTION; > > - /* Branches are read most recent first (ie. mfbhrb 0 is > - * the most recent branch). > - * There are two types of valid entries: > - * 1) a target entry which is the to address of a > - *computed goto like a blr,bctr,btar. The next > - *entry read from the bhrb will be branch > - *corresponding to this target (ie. the actual > - *blr/bctr/btar instruction). > - * 2) a from address which is an actual branch. If a > - *target entry proceeds this, then this is the > - *matching branch for that target. If this is not > - *following a target entry, then this is a branch > - *where the target is given as an immediate field > - *in the instruction (ie. an i or b form branch). > - *In this case we need to read the instruction from > - *memory to determine the target/to address. > + if (!addr) > + /* invalid entry */ > + continue; > + > + /* Branches are read most recent first (ie. mfbhrb 0 is > + * the most recent branch). > + * There are two types of valid entries: > + * 1) a target entry which is the to address of a > + *computed goto like a blr,bctr,btar. The next > + *entry read from the bhrb will be branch > + *corresponding to this target (ie. the actual > + *blr/bctr/btar instruction). > + * 2) a from address which is an actual branch. If a > + *target entry proceeds this, then this is the > + *matching branch for that target. If this is
Re: [PATCH V4 09/10] power8, perf: Change BHRB branch filter configuration
On 12/09/2013 11:51 AM, Michael Ellerman wrote: > > As I said in my comments on version 3 which you ignored: > > I think it would be clearer if we actually checked for the possibilities > we > allow and let everything else fall through, eg: > > /* Ignore user/kernel/hv bits */ > branch_sample_type &= ~PERF_SAMPLE_BRANCH_PLM_ALL; > > if (branch_sample_type == PERF_SAMPLE_BRANCH_ANY) > return 0; > > if (branch_sample_type == PERF_SAMPLE_BRANCH_ANY_CALL) > return POWER8_MMCRA_IFM1; > > if (branch_sample_type == PERF_SAMPLE_BRANCH_COND) > return POWER8_MMCRA_IFM3; > > return -1; > Hey Michael, This patch only adds support for the PERF_SAMPLE_BRANCH_COND filter. If the overall code flow does not clearly suggest that all combinations of these HW filters are invalid, then we can go with one more patch to clean that up before or after this patch, but not here in this patch. Finally the code section here will look something like this. Does it sound good? static u64 power8_bhrb_filter_map(u64 branch_sample_type) { u64 pmu_bhrb_filter = 0; /* BHRB and regular PMU events share the same privilege state * filter configuration. BHRB is always recorded along with a * regular PMU event. As the privilege state filter is handled * in the basic PMC configuration of the accompanying regular * PMU event, we ignore any separate BHRB specific request.
*/ /* Ignore user, kernel, hv bits */ branch_sample_type &= ~PERF_SAMPLE_BRANCH_PLM_ALL; if (branch_sample_type == PERF_SAMPLE_BRANCH_ANY) return pmu_bhrb_filter; if (branch_sample_type == PERF_SAMPLE_BRANCH_ANY_CALL) { pmu_bhrb_filter |= POWER8_MMCRA_IFM1; return pmu_bhrb_filter; } if (branch_sample_type == PERF_SAMPLE_BRANCH_COND) { pmu_bhrb_filter |= POWER8_MMCRA_IFM3; return pmu_bhrb_filter; } /* Every thing else is unsupported */ return -1; }
Re: [PATCH V3 0/3] Add new PowerPC specific ELF core notes
On 07/18/2014 04:53 AM, Sam Bobroff wrote: > On 17/07/14 21:14, Michael Neuling wrote: >> >> On Jul 17, 2014 9:11 PM, "Benjamin Herrenschmidt" >> <b...@kernel.crashing.org> wrote: >>> > >> Outstanding Issues >> == >> (1) Running DSCR register value inside a transaction does not >> seem to be saved >> at thread.dscr when the process stops for ptrace examination. > > Hey Ben, > > Any updates on this patch series ? Ben, Any updates on this patch series ? >>> >>> I haven't had a chance to review yet, I was hoping somebody else would.. >>> >>> Have you made any progress vs. the DSCR outstanding issue mentioned >>> above ? >> >> The DSCR issue should be resolved with Sam Bobroff's recent DSCR >> fixes. I've not tested them though. >> >> Actually... Sam did you review this series? >> >> Mikey >> > > I did, and applying "powerpc: Correct DSCR during TM context switch" > corrected the DSCR value in the test program (the one in the patch notes > for this series). > > (In fact, IIRC, the reason for my patch set was the bug exposed by this > one ;-) Yeah the test program worked correctly with the fix from Sam. The first patch is a generic code change which Pedro had reviewed before. The second and third patches are powerpc specific.
Re: [PATCH V3 0/3] Add new PowerPC specific ELF core notes
On 05/23/2014 08:45 PM, Anshuman Khandual wrote: > This patch series adds five new ELF core note sections which can be > used with existing ptrace request PTRACE_GETREGSET/SETREGSET for accessing > various transactional memory and miscellaneous register sets on PowerPC > platform. Please find a test program exploiting these new ELF core note > types on a POWER8 system. > > RFC: https://lkml.org/lkml/2014/4/1/292 > V1: https://lkml.org/lkml/2014/4/2/43 > V2: https://lkml.org/lkml/2014/5/5/88 > > Changes in V3 > = > (1) Added two new error paths in every TM related get/set functions when > regset > support is not present on the system (ENODEV) or when the process does not > have any transaction active (ENODATA) in the context > > (2) Installed the active hooks for all the newly added regset core note types > > Changes in V2 > = > (1) Removed all the power specific ptrace requests corresponding to new > NT_PPC_* > elf core note types. Now all the register sets can be accessed from ptrace > through PTRACE_GETREGSET/PTRACE_SETREGSET using the individual NT_PPC* > core > note type instead > (2) Fixed couple of attribute values for REGSET_TM_CGPR register set > (3) Renamed flush_tmreg_to_thread as flush_tmregs_to_thread > (4) Fixed 32 bit checkpointed GPR support > (5) Changed commit messages accordingly > > Outstanding Issues > == > (1) Running DSCR register value inside a transaction does not seem to be saved > at thread.dscr when the process stops for ptrace examination. Hey Sam and Suka, Thanks for reviewing this patch series. I was busy with some other work for last couple of months. Went through your comments, will get back to this patch series in some time and work on the comments. Thanks again. Regards Anshuman
Re: [PATCH RFC/RFT v3 6/9] powerpc: move cacheinfo sysfs to generic cacheinfo infrastructure
On 02/19/2014 09:36 PM, Sudeep Holla wrote: > From: Sudeep Holla > > This patch removes the redundant sysfs cacheinfo code by making use of > the newly introduced generic cacheinfo infrastructure. > > Signed-off-by: Sudeep Holla > Cc: Benjamin Herrenschmidt > Cc: Paul Mackerras > Cc: linuxppc-...@lists.ozlabs.org > --- > arch/powerpc/kernel/cacheinfo.c | 831 > ++-- > arch/powerpc/kernel/cacheinfo.h | 8 - > arch/powerpc/kernel/sysfs.c | 4 - > 3 files changed, 109 insertions(+), 734 deletions(-) > delete mode 100644 arch/powerpc/kernel/cacheinfo.h > > diff --git a/arch/powerpc/kernel/cacheinfo.c b/arch/powerpc/kernel/cacheinfo.c > index 2912b87..05b7580 100644 > --- a/arch/powerpc/kernel/cacheinfo.c > +++ b/arch/powerpc/kernel/cacheinfo.c > @@ -10,38 +10,10 @@ > * 2 as published by the Free Software Foundation. > */ > > +#include > #include > -#include > #include > -#include > -#include > -#include > #include > -#include > -#include > -#include > - > -#include "cacheinfo.h" > - > -/* per-cpu object for tracking: > - * - a "cache" kobject for the top-level directory > - * - a list of "index" objects representing the cpu's local cache hierarchy > - */ > -struct cache_dir { > - struct kobject *kobj; /* bare (not embedded) kobject for cache > -* directory */ > - struct cache_index_dir *index; /* list of index objects */ > -}; > - > -/* "index" object: each cpu's cache directory has an index > - * subdirectory corresponding to a cache object associated with the > - * cpu. This object's lifetime is managed via the embedded kobject. > - */ > -struct cache_index_dir { > - struct kobject kobj; > - struct cache_index_dir *next; /* next index in parent directory */ > - struct cache *cache; > -}; > > /* Template for determining which OF properties to query for a given > * cache type */ > @@ -60,11 +32,6 @@ struct cache_type_info { > const char *nr_sets_prop; > }; > > -/* These are used to index the cache_type_info array. 
*/ > -#define CACHE_TYPE_UNIFIED 0 > -#define CACHE_TYPE_INSTRUCTION 1 > -#define CACHE_TYPE_DATA2 > - > static const struct cache_type_info cache_type_info[] = { > { > /* PowerPC Processor binding says the [di]-cache-* > @@ -77,246 +44,115 @@ static const struct cache_type_info cache_type_info[] = > { > .nr_sets_prop= "d-cache-sets", > }, > { > - .name= "Instruction", > - .size_prop = "i-cache-size", > - .line_size_props = { "i-cache-line-size", > - "i-cache-block-size", }, > - .nr_sets_prop= "i-cache-sets", > - }, > - { > .name= "Data", > .size_prop = "d-cache-size", > .line_size_props = { "d-cache-line-size", >"d-cache-block-size", }, > .nr_sets_prop= "d-cache-sets", > }, > + { > + .name= "Instruction", > + .size_prop = "i-cache-size", > + .line_size_props = { "i-cache-line-size", > + "i-cache-block-size", }, > + .nr_sets_prop= "i-cache-sets", > + }, > };

Hey Sudeep,

After applying this patch, the cache_type_info array looks like this.

static const struct cache_type_info cache_type_info[] = {
	{
		/*
		 * PowerPC Processor binding says the [di]-cache-*
		 * must be equal on unified caches, so just use
		 * d-cache properties.
		 */
		.name= "Unified",
		.size_prop = "d-cache-size",
		.line_size_props = { "d-cache-line-size",
				     "d-cache-block-size", },
		.nr_sets_prop= "d-cache-sets",
	},
	{
		.name= "Data",
		.size_prop = "d-cache-size",
		.line_size_props = { "d-cache-line-size",
				     "d-cache-block-size", },
		.nr_sets_prop= "d-cache-sets",
	},
	{
		.name= "Instruction",
		.size_prop = "i-cache-size",
		.line_size_props = { "i-cache-line-size",
				     "i-cache-block-size", },
		.nr_sets_prop= "i-cache-sets",
	},
};

and this function computes the array index for any given cache type defined for PowerPC.

static inline int get_cacheinfo_idx(enum cache_type type)
{
	if (type == CACHE_TYPE_UNIFIED)
		return 0;
	else
		return type;
}

These types are defined in include/linux/cacheinfo.h as

enum cache_type {
	CACHE_TYPE_NOCACHE = 0,
	CACHE_TYPE_INST = BIT(0),
Re: [PATCH RFC/RFT v3 6/9] powerpc: move cacheinfo sysfs to generic cacheinfo infrastructure
On 03/07/2014 09:36 AM, Anshuman Khandual wrote: > On 02/19/2014 09:36 PM, Sudeep Holla wrote: >> From: Sudeep Holla >> >> This patch removes the redundant sysfs cacheinfo code by making use of >> the newly introduced generic cacheinfo infrastructure. >> >> Signed-off-by: Sudeep Holla >> Cc: Benjamin Herrenschmidt >> Cc: Paul Mackerras >> Cc: linuxppc-...@lists.ozlabs.org >> --- >> arch/powerpc/kernel/cacheinfo.c | 831 >> ++-- >> arch/powerpc/kernel/cacheinfo.h | 8 - >> arch/powerpc/kernel/sysfs.c | 4 - >> 3 files changed, 109 insertions(+), 734 deletions(-) >> delete mode 100644 arch/powerpc/kernel/cacheinfo.h >> >> diff --git a/arch/powerpc/kernel/cacheinfo.c >> b/arch/powerpc/kernel/cacheinfo.c >> index 2912b87..05b7580 100644 >> --- a/arch/powerpc/kernel/cacheinfo.c >> +++ b/arch/powerpc/kernel/cacheinfo.c >> @@ -10,38 +10,10 @@ >> * 2 as published by the Free Software Foundation. >> */ >> >> +#include >> #include >> -#include >> #include >> -#include >> -#include >> -#include >> #include >> -#include >> -#include >> -#include >> - >> -#include "cacheinfo.h" >> - >> -/* per-cpu object for tracking: >> - * - a "cache" kobject for the top-level directory >> - * - a list of "index" objects representing the cpu's local cache hierarchy >> - */ >> -struct cache_dir { >> -struct kobject *kobj; /* bare (not embedded) kobject for cache >> - * directory */ >> -struct cache_index_dir *index; /* list of index objects */ >> -}; >> - >> -/* "index" object: each cpu's cache directory has an index >> - * subdirectory corresponding to a cache object associated with the >> - * cpu. This object's lifetime is managed via the embedded kobject. 
>> - */ >> -struct cache_index_dir { >> -struct kobject kobj; >> -struct cache_index_dir *next; /* next index in parent directory */ >> -struct cache *cache; >> -}; >> >> /* Template for determining which OF properties to query for a given >> * cache type */ >> @@ -60,11 +32,6 @@ struct cache_type_info { >> const char *nr_sets_prop; >> }; >> >> -/* These are used to index the cache_type_info array. */ >> -#define CACHE_TYPE_UNIFIED 0 >> -#define CACHE_TYPE_INSTRUCTION 1 >> -#define CACHE_TYPE_DATA2 >> - >> static const struct cache_type_info cache_type_info[] = { >> { >> /* PowerPC Processor binding says the [di]-cache-* >> @@ -77,246 +44,115 @@ static const struct cache_type_info cache_type_info[] >> = { >> .nr_sets_prop= "d-cache-sets", >> }, >> { >> -.name= "Instruction", >> -.size_prop = "i-cache-size", >> -.line_size_props = { "i-cache-line-size", >> - "i-cache-block-size", }, >> -.nr_sets_prop= "i-cache-sets", >> -}, >> -{ >> .name= "Data", >> .size_prop = "d-cache-size", >> .line_size_props = { "d-cache-line-size", >> "d-cache-block-size", }, >> .nr_sets_prop= "d-cache-sets", >> }, >> +{ >> +.name= "Instruction", >> +.size_prop = "i-cache-size", >> +.line_size_props = { "i-cache-line-size", >> + "i-cache-block-size", }, >> +.nr_sets_prop= "i-cache-sets", >> +}, >> }; > > > Hey Sudeep, > > After applying this patch, the cache_type_info array looks like this. > > static const struct cache_type_info cache_type_info[] = { > { > /* > * PowerPC Processor binding says the [di]-cache-* > * must be equal on unified caches, so just use > * d-cache properties. > */ > .name= "Unified", > .size_prop = "d-cache-size", > .line_size_props = { "d-cache-line-size", > "d-cache-block-size", }, &
[V5 2/4] perf, tool: Conditional branch filter 'cond' added to perf record
Adding perf record support for new branch stack filter criteria PERF_SAMPLE_BRANCH_COND.

Signed-off-by: Anshuman Khandual
Reviewed-by: Stephane Eranian
---
 tools/perf/builtin-record.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 3c394bf..eb74bcd 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -589,6 +589,7 @@ static const struct branch_mode branch_modes[] = {
 	BRANCH_OPT("abort_tx", PERF_SAMPLE_BRANCH_ABORT_TX),
 	BRANCH_OPT("in_tx", PERF_SAMPLE_BRANCH_IN_TX),
 	BRANCH_OPT("no_tx", PERF_SAMPLE_BRANCH_NO_TX),
+	BRANCH_OPT("cond", PERF_SAMPLE_BRANCH_COND),
 	BRANCH_END
 };
-- 
1.7.11.7
[V5 1/4] perf: Add PERF_SAMPLE_BRANCH_COND
This patch introduces new branch filter PERF_SAMPLE_BRANCH_COND which will extend the existing perf ABI. Various architectures can provide this functionality either with HW filtering support (if present) or with SW filtering of captured branch instructions.

Signed-off-by: Anshuman Khandual
Reviewed-by: Stephane Eranian
---
 include/uapi/linux/perf_event.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 853bc1c..696f69b4 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -163,8 +163,9 @@ enum perf_branch_sample_type {
 	PERF_SAMPLE_BRANCH_ABORT_TX	= 1U << 7, /* transaction aborts */
 	PERF_SAMPLE_BRANCH_IN_TX	= 1U << 8, /* in transaction */
 	PERF_SAMPLE_BRANCH_NO_TX	= 1U << 9, /* not in transaction */
+	PERF_SAMPLE_BRANCH_COND		= 1U << 10, /* conditional branches */

-	PERF_SAMPLE_BRANCH_MAX		= 1U << 10, /* non-ABI */
+	PERF_SAMPLE_BRANCH_MAX		= 1U << 11, /* non-ABI */
 };

 #define PERF_SAMPLE_BRANCH_PLM_ALL \
-- 
1.7.11.7
[V5 0/4] perf: New conditional branch filter
Hello Arnaldo,

I had posted the V5 version of the PowerPC SW branch filter enablement patchset last month. Please find the patchset here at https://lkml.org/lkml/2014/2/5/79

The following patches (2,4,5,6 from the original V5 patchset) are the ones which change code in the generic kernel, perf tool and X86 perf. Basically this patchset adds one more branch filter for "conditional" branches. In X86 code, this new filter has been implemented with the help of the available SW filter X86_BR_JCC and LBR_JCC. We had some discussions in this regard before. Please review these changes and if it's okay, please merge them. Other patches in the series are powerpc specific and are being reviewed by Benjamin Herrenschmidt and Michael Ellerman. Let me know if you need more information.

[1] https://lkml.org/lkml/2013/5/22/51
[2] https://lkml.org/lkml/2013/8/30/10
[3] https://lkml.org/lkml/2013/10/16/75
[4] https://lkml.org/lkml/2013/12/4/168
[5] https://lkml.org/lkml/2014/2/5/79

Cc: Arnaldo Carvalho de Melo
Cc: Stephane Eranian
Cc: Andi Kleen
Cc: Ingo Molnar
Cc: Benjamin Herrenschmidt
Cc: Michael Ellerman
Cc: Peter Zijlstra

Anshuman Khandual (4):
  perf: Add PERF_SAMPLE_BRANCH_COND
  perf, tool: Conditional branch filter 'cond' added to perf record
  x86, perf: Add conditional branch filtering support
  perf, documentation: Description for conditional branch filter

 arch/x86/kernel/cpu/perf_event_intel_lbr.c | 5 +
 include/uapi/linux/perf_event.h            | 3 ++-
 tools/perf/Documentation/perf-record.txt   | 3 ++-
 tools/perf/builtin-record.c                | 1 +
 4 files changed, 10 insertions(+), 2 deletions(-)

-- 
1.7.11.7
[V5 4/4] perf, documentation: Description for conditional branch filter
Adding documentation support for conditional branch filter.

Signed-off-by: Anshuman Khandual
Reviewed-by: Stephane Eranian
---
 tools/perf/Documentation/perf-record.txt | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
index c71b0f3..d460049 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -184,9 +184,10 @@ following filters are defined:
 	- in_tx: only when the target is in a hardware transaction
 	- no_tx: only when the target is not in a hardware transaction
 	- abort_tx: only when the target is a hardware transaction abort
+	- cond: conditional branches
+
-The option requires at least one branch type among any, any_call, any_ret, ind_call.
+The option requires at least one branch type among any, any_call, any_ret, ind_call, cond.
 The privilege levels may be omitted, in which case, the privilege levels of the
 associated event are applied to the branch filter. Both kernel (k) and hypervisor
 (hv) privilege levels are subject to permissions. When sampling on multiple events, branch stack sampling
-- 
1.7.11.7
[V5 3/4] x86, perf: Add conditional branch filtering support
This patch adds conditional branch filtering support, enabling it for PERF_SAMPLE_BRANCH_COND in perf branch stack sampling framework by utilizing an available software filter X86_BR_JCC.

Signed-off-by: Anshuman Khandual
Reviewed-by: Stephane Eranian
---
 arch/x86/kernel/cpu/perf_event_intel_lbr.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/arch/x86/kernel/cpu/perf_event_intel_lbr.c b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
index d82d155..9dd2459 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_lbr.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
@@ -384,6 +384,9 @@ static void intel_pmu_setup_sw_lbr_filter(struct perf_event *event)
 	if (br_type & PERF_SAMPLE_BRANCH_NO_TX)
 		mask |= X86_BR_NO_TX;

+	if (br_type & PERF_SAMPLE_BRANCH_COND)
+		mask |= X86_BR_JCC;
+
 	/*
 	 * stash actual user request into reg, it may
 	 * be used by fixup code for some CPU
@@ -678,6 +681,7 @@ static const int nhm_lbr_sel_map[PERF_SAMPLE_BRANCH_MAX] = {
 	 * NHM/WSM erratum: must include IND_JMP to capture IND_CALL
 	 */
 	[PERF_SAMPLE_BRANCH_IND_CALL] = LBR_IND_CALL | LBR_IND_JMP,
+	[PERF_SAMPLE_BRANCH_COND] = LBR_JCC,
 };

 static const int snb_lbr_sel_map[PERF_SAMPLE_BRANCH_MAX] = {
@@ -689,6 +693,7 @@ static const int snb_lbr_sel_map[PERF_SAMPLE_BRANCH_MAX] = {
 	[PERF_SAMPLE_BRANCH_ANY_CALL] = LBR_REL_CALL | LBR_IND_CALL | LBR_FAR,
 	[PERF_SAMPLE_BRANCH_IND_CALL] = LBR_IND_CALL,
+	[PERF_SAMPLE_BRANCH_COND] = LBR_JCC,
 };

 /* core */
-- 
1.7.11.7
Re: [PATCH 2/3] powerpc, ptrace: Add new ptrace request macros for transactional memory
On 04/26/2014 05:12 AM, Pedro Alves wrote: > On 04/02/2014 08:02 AM, Anshuman Khandual wrote: >> This patch adds following new sets of ptrace request macros for transactional >> memory expanding the existing ptrace ABI on PowerPC. >> >> /* TM special purpose registers */ >> PTRACE_GETTM_SPRREGS >> PTRACE_SETTM_SPRREGS >> >> /* TM checkpointed GPR registers */ >> PTRACE_GETTM_CGPRREGS >> PTRACE_SETTM_CGPRREGS >> >> /* TM checkpointed FPR registers */ >> PTRACE_GETTM_CFPRREGS >> PTRACE_SETTM_CFPRREGS >> >> /* TM checkpointed VMX registers */ >> PTRACE_GETTM_CVMXREGS >> PTRACE_SETTM_CVMXREGS > > Urgh, we're _still_ adding specialized register specific calls? > Why aren't these exported as new register sets, accessible through > PTRACE_GETREGSET / PTRACE_SETREGSET? That's supposed to be the > Modern Way to do things. All these new register sets can be accessed through PTRACE_GETREGSET /SETREGSET requests with the new NT_PPC_* core note types added in the previous patch. PowerPC already has some register specific ptrace requests, so thought of adding some new requests for transactional memory purpose. But yes these are redundant and can be dropped.
[PATCH] ptrace: Fix PTRACE_GETREGSET/PTRACE_SETREGSET in code documentation
The current documentation is a bit misleading and does not explicitly specify that iov.len needs to be initialized, failing which the kernel may just ignore the ptrace request and never read from/write into the user specified buffer. This patch fixes the documentation.

Signed-off-by: Anshuman Khandual
---
 include/uapi/linux/ptrace.h | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/ptrace.h b/include/uapi/linux/ptrace.h
index cf1019e..e9d6b37 100644
--- a/include/uapi/linux/ptrace.h
+++ b/include/uapi/linux/ptrace.h
@@ -43,8 +43,12 @@
  *
  *	ret = ptrace(PTRACE_GETREGSET/PTRACE_SETREGSET, pid, NT_XXX_TYPE, &iov);
  *
- * On the successful completion, iov.len will be updated by the kernel,
- * specifying how much the kernel has written/read to/from the user's iov.buf.
+ * A non-zero value up to the max size of data expected to be written/read by the
+ * kernel in response to any NT_XXX_TYPE request type must be assigned to iov.len
+ * before initiating the ptrace call. If iov.len is 0, then the kernel will neither
+ * read from nor write into the user buffer specified. On successful completion,
+ * iov.len will be updated by the kernel, specifying how much the kernel has
+ * written/read to/from the user's iov.buf.
  */
 #define PTRACE_GETREGSET	0x4204
 #define PTRACE_SETREGSET	0x4205
-- 
1.7.11.7
Re: [PATCH 0/3] Add new ptrace request macros on PowerPC
On 04/02/2014 03:02 PM, Anshuman Khandual wrote: > On 04/02/2014 12:32 PM, Anshuman Khandual wrote: >> This patch series adds new ELF note sections which are used to >> create new ptrace request macros for various transactional memory and >> miscellaneous registers on PowerPC. Please find the test case exploiting >> the new ptrace request macros and it's results on a POWER8 system. >> >> RFC: https://lkml.org/lkml/2014/4/1/292 >> >> == Results == >> ---TM specific SPR-- >> TM TFHAR: 19dc >> TM TEXASR: de01ac01 >> TM TFIAR: c003f386 >> TM CH ORIG_MSR: 9005f032 >> TM CH TAR: 6 >> TM CH PPR: c >> TM CH DSCR: 1 >> ---TM checkpointed GPR- >> TM CH GPR[0]: 197c >> TM CH GPR[1]: 5 >> TM CH GPR[2]: 6 >> TM CH GPR[7]: 1 >> TM CH NIP: 19dc >> TM CH LINK: 197c >> TM CH CCR: 22000422 >> ---TM running GPR- >> TM RN GPR[0]: 197c >> TM RN GPR[1]: 7 >> TM RN GPR[2]: 8 >> TM RN GPR[7]: 5 >> TM RN NIP: 19fc >> TM RN LINK: 197c >> TM RN CCR: 2000422 >> ---TM running FPR- >> TM RN FPR[0]: 1002d3a3780 >> TM RN FPR[1]: 7 >> TM RN FPR[2]: 8 >> TM RN FPSCR: 0 >> ---TM checkpointed FPR- >> TM CH FPR[0]: 1002d3a3780 >> TM CH FPR[1]: 5 >> TM CH FPR[2]: 6 >> TM CH FPSCR: 0 >> ---Running miscellaneous registers--- > TM RN DSCR: 0 > > There is a problem in here which I forgot to mention. The running DSCR value > comes from thread->dscr component of the target process. While we are inside > the > transaction (which is the case here as we are stuck at "b ." instruction and > have not reached TEND) thread->dscr should have the running value of the DSCR > register at that point of time. Here we expect the DSCR value to be 5 instead > of 0 as shown in the output above. During the tests when I moved the "b ." > after > TEND, the thread->dscr gets the value of 5 while all check pointed reg values > are > thrown away. I believe there is some problem in the way thread->dscr context > is saved away inside the TM section. Will look into this problem further and > keep informed. 
The reason behind this inconsistent DSCR register value is the following commit, where the kernel reverts the DSCR register to a default value to avoid running with the user-set value for a long time, thus preventing any potential performance degradation. The same reason applies to the PPR register as well. So it's not a problem but expected behaviour.

commit e9bdc3d6143d1c4b8d8ce5231fc958268331f983
Author: Michael Neuling
Date:   Thu Sep 26 13:29:09 2013 +1000

    powerpc/tm: Switch out userspace PPR and DSCR sooner

    When we do a treclaim or trecheckpoint we end up running with userspace PPR and DSCR values. Currently we don't do anything special to avoid running with user values which could cause a severe performance degradation. This patch moves the PPR and DSCR save and restore around treclaim and trecheckpoint so that we run with user values for a much shorter period. More care is taken with the PPR as it's impact is greater than the DSCR. This is similar to user exceptions, where we run HTM_MEDIUM early to ensure that we don't run with a userspace PPR values in the kernel.
Re: [PATCH 0/3] Add new ptrace request macros on PowerPC
On 04/29/2014 12:36 PM, Michael Neuling wrote: > How is it causing the problem? As mentioned before, what I thought to be a problem is expected behaviour. So it's not a problem any more. The DSCR value inside the transaction will fall back to the default as the kernel won't let a user-specified value remain applied for a long time.
Re: [PATCH 0/3] Add new ptrace request macros on PowerPC
On 04/29/2014 01:52 PM, Michael Neuling wrote: > That's not what that patch does. It shouldn't make any user visible changes > to DSCR or PPR. It may not when it runs uninterrupted but after the tracee process has stopped, thread.dscr reflects the default DSCR value as mentioned before. This can be proved by changing the "dscr_default" value in arch/powerpc/sysfs.c file. > > Over syscall PPR and DSCR may change. Depending on your test case, that may > be your problem. I would guess when the tracee process stops for ptrace analysis, tm_reclaim or tm_recheckpoint path might be crossed which is causing this dscr_default value to go into thread_struct.
Re: [PATCH V2 2/3] powerpc, ptrace: Enable support for transactional memory register sets
On 05/15/2014 05:38 PM, Pedro Alves wrote: > On 05/15/2014 09:25 AM, Anshuman Khandual wrote: >> On 05/14/2014 04:45 PM, Pedro Alves wrote: >>> On 05/14/14 06:46, Anshuman Khandual wrote: >>>> On 05/13/2014 10:43 PM, Pedro Alves wrote: >>>>> On 05/05/14 08:54, Anshuman Khandual wrote: >>>>>> This patch enables get and set of transactional memory related register >>>>>> sets through PTRACE_GETREGSET/PTRACE_SETREGSET interface by implementing >>>>>> four new powerpc specific register sets i.e REGSET_TM_SPR, >>>>>> REGSET_TM_CGPR, >>>>>> REGSET_TM_CFPR, REGSET_CVMX support corresponding to these following new >>>>>> ELF core note types added previously in this regard. >>>>>> >>>>>> (1) NT_PPC_TM_SPR >>>>>> (2) NT_PPC_TM_CGPR >>>>>> (3) NT_PPC_TM_CFPR >>>>>> (4) NT_PPC_TM_CVMX >>>>> >>>>> Sorry that I couldn't tell this from the code, but, what does the >>>>> kernel return when the ptracer requests these registers and the >>>>> program is not in a transaction? Specifically I'm wondering whether >>>>> this follows the same semantics as the s390 port. >>>>> >>>> >>>> Right now, it still returns the saved state of the registers from thread >>>> struct. I had assumed that the user must know the state of the transaction >>>> before initiating the ptrace request. I guess its better to check for >>>> the transaction status before processing the request. In case if TM is not >>>> active on that thread, we should return -EINVAL. >>> >>> I think s390 returns ENODATA in that case. >>> >>> https://sourceware.org/ml/gdb-patches/2013-06/msg00273.html >>> >>> We'll want some way to tell whether the system actually >>> supports this. That could be ENODATA vs something-else (EINVAL >>> or perhaps better EIO for "request is invalid"). >> >> As Mickey has pointed out, the transaction memory support in the system can >> be >> checked from the HWCAP2 flags. So when the transaction is not active, we will >> return ENODATA instead for TM related ptrace regset requests. 
> > Returning ENODATA when the transaction is not active, like > s390 is great. Thank you. > > But I think it's worth it to consider what should the kernel > return when the machine doesn't have these registers at all. > > Sure, for this case we happen to have the hwcap flag. But in > general, I don't know whether we will always have a hwcap bit > for each register set that is added. Maybe we will, so that > the info ends up in core dumps. > > Still, I think it's worth to consider this case in the > general sense, irrespective of hwcap. > > That is, what should PTRACE_GETREGSET/PTRACE_SETREGSET return > when the machine doesn't have the registers at all. We shouldn't > need to consult something elsewhere (like hwcap) to determine > what ENODATA means. The kernel knows it right there. I think > s390 goofed here. > > Taking a look at x86, for example, we see: > > [REGSET_XSTATE] = { > .core_note_type = NT_X86_XSTATE, > .size = sizeof(u64), .align = sizeof(u64), > .active = xstateregs_active, .get = xstateregs_get, > .set = xstateregs_set > }, > > Note that it installs the ".active" hook. > > 24 /** > 25 * user_regset_active_fn - type of @active function in &struct user_regset > 26 * @target: thread being examined > 27 * @regset: regset being examined > 28 * > 29 * Return -%ENODEV if not available on the hardware found. > 30 * Return %0 if no interesting state in this thread. > 31 * Return >%0 number of @size units of interesting state. > 32 * Any get call fetching state beyond that number will > 33 * see the default initialization state for this data, > 34 * so a caller that knows what the default state is need > 35 * not copy it all out. > 36 * This call is optional; the pointer is %NULL if there > 37 * is no inexpensive check to yield a value < @n. > 38 */ > 39 typedef int user_regset_active_fn(struct task_struct *target, > 40 const struct user_regset *regset); > 41 > > Note the mention of ENODEV. 
> > I couldn't actually find any arch that currently returns -ENODEV in > the "active" hook. I see that binfmt_elf.c doesn't handle > regset->active
Re: [PATCH V2 2/3] powerpc, ptrace: Enable support for transactional memory register sets
On 05/19/2014 08:13 PM, Pedro Alves wrote:
> On 05/19/2014 12:46 PM, Anshuman Khandual wrote:
>
>>>> I couldn't actually find any arch that currently returns -ENODEV in
>>>> the "active" hook. I see that binfmt_elf.c doesn't handle
>>>> regset->active() returning < 0. Guess that may be why. Looks like
>>>> something that could be cleaned up, to me.
>>>>
>> Also it does not consider the return value of regset->active(t->task, regset)
>> (whose objective is to figure out whether we need to request regset->n number
>> of elements or less than that) in the subsequent call to the regset->get
>> function.
>
> Indeed.
>
> TBC, do you plan on fixing this? Otherwise ...

Sure, I am thinking of something like what is mentioned below. But I am still not sure how to use the -ENODEV return value from regset->active(). Right now, if a regset has the active hook and it returns anything but a positive value, the regset is ignored and control moves to the next regset in view. This prevents that thread core note type from being written to the core dump.

diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index aa3cb62..80672fb 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -1553,7 +1553,15 @@ static int fill_thread_core_info(struct elf_thread_core_info *t,
 		if (regset->core_note_type && regset->get &&
 		    (!regset->active || regset->active(t->task, regset))) {
 			int ret;
-			size_t size = regset->n * regset->size;
+			size_t size;
+
+			/* Request only the active elements in the regset */
+			if (!regset->active)
+				size = regset->n * regset->size;
+			else
+				size = regset->active(t->task, regset)
+					* regset->size;
+
 			void *data = kmalloc(size, GFP_KERNEL);
 			if (unlikely(!data))
 				return 0;

>
>> Now coming to the installation of the .active hooks for all the new regsets, it
>> should be pretty straightforward as well. Though it is optional and used for
>> elf_core_dump purposes only, it is worth adding them here. An example of an
>> active function should be something like this.
The function is inexpensive, as required.

>> +static int tm_spr_active(struct task_struct *target,
>> +			 const struct user_regset *regset)
>> +{
>> +	if (!cpu_has_feature(CPU_FTR_TM))
>> +		return -ENODEV;
>
> ... unfortunately this will do the wrong thing.

I am not sure whether I understand this correctly. Are you saying that it's wrong to return -ENODEV in this case, as above?

-- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2] ptrace: Clarify PTRACE_GETREGSET/PTRACE_SETREGSET, documentation in uapi header
On 05/14/2014 04:24 PM, Pedro Alves wrote: > On 05/14/14 08:10, Anshuman Khandual wrote: >> On 05/13/2014 11:39 PM, Pedro Alves wrote: >>> On 05/05/14 05:10, Anshuman Khandual wrote: >>>> On 05/01/2014 07:43 PM, Pedro Alves wrote: >>> OK, then this is what I suggest instead: > ... >>>> Shall I resend the patch with your proposed changes and your >>>> "Signed-off-by", >>>> moving myself to "Reported-by" ? >>> >>> No idea of the actual policy to follow. Feel free to do that if that's the >>> standard procedure. >> >> Even I am not sure about this, so to preserve the correct authorship, would >> you >> mind sending this patch ? > > Here you go. This is against current Linus'. Please take it from > here if necessary. Thanks, Pedro, for the patch. I would assume that the ptrace maintainer (Roland or Oleg, as mentioned in the MAINTAINERS file) will pick it up from here and merge it into mainline. Please do let me know if the process is different. Thank you.
Re: [PATCH 0/3] Add new ptrace request macros on PowerPC
On 04/30/2014 05:59 AM, Michael Neuling wrote: > Anshuman Khandual wrote: > >> On 04/29/2014 01:52 PM, Michael Neuling wrote: >>> That's not what that patch does. It shouldn't make any user visible changes >>> to DSCR or PPR. >> >> It may not when it runs uninterrupted but after the tracee process has >> stopped, thread.dscr reflects the default DSCR value as mentioned >> before. This can be proved by changing the "dscr_default" value in >> arch/powerpc/sysfs.c file. > > The intention with DSCR is that if the user changes the DSCR, the kernel > should always save/restore it. If you are seeing something else, then > that is a bug. Anton has a test case for this here: > > http://ozlabs.org/~anton/junkcode/dscr_explicit_test.c > > If that is failing, then there is a bug that we need to fix. Anton's above DSCR test passed. > The PPR is the same, except that the kernel can change it over a > syscall. > >>> Over syscall PPR and DSCR may change. > > Sorry, this should be only PPR. DSCR shouldn't change over a syscall, > at least that's the intention. > >>> Depending on your test case, that may >>> be your problem. >> >> I would guess when the tracee process stops for ptrace analysis, tm_reclaim or >> tm_recheckpoint path might be crossed which is causing this dscr_default value >> to go into thread_struct. > > That shouldn't happen. If that's happening, it's a bug. I do believe this is happening. Also, after reverting commit e9bdc3d6143d1c4b8d8ce5231, thread.dscr reflects the same value as thread.tm_dscr, which is the checkpointed DSCR register value from just before the transaction started. So even though the NIP has moved past the point where the user changes the DSCR inside the transaction, thread.dscr fails to capture that latest value. But thread.dscr must contain the latest user-changed value of the DSCR, which is definitely not happening here. So there is a problem we need to fix.
Re: [PATCH V2 2/3] powerpc, ptrace: Enable support for transactional memory register sets
On 05/13/2014 10:43 PM, Pedro Alves wrote: > On 05/05/14 08:54, Anshuman Khandual wrote: >> This patch enables get and set of transactional memory related register >> sets through PTRACE_GETREGSET/PTRACE_SETREGSET interface by implementing >> four new powerpc specific register sets i.e REGSET_TM_SPR, REGSET_TM_CGPR, >> REGSET_TM_CFPR, REGSET_CVMX support corresponding to these following new >> ELF core note types added previously in this regard. >> >> (1) NT_PPC_TM_SPR >> (2) NT_PPC_TM_CGPR >> (3) NT_PPC_TM_CFPR >> (4) NT_PPC_TM_CVMX > > Sorry that I couldn't tell this from the code, but, what does the > kernel return when the ptracer requests these registers and the > program is not in a transaction? Specifically I'm wondering whether > this follows the same semantics as the s390 port. > Right now, it still returns the saved state of the registers from the thread struct. I had assumed that the user must know the state of the transaction before initiating the ptrace request. I guess it's better to check the transaction status before processing the request. If TM is not active on that thread, we should return -EINVAL. I am not familiar with the s390 side of the code, but if we look at the s390_tdb_get function, it checks (regs->int_code & 0x200) before processing the request. Not sure what 0x200 signifies, though.
Re: [PATCH V2 2/3] powerpc, ptrace: Enable support for transactional memory register sets
On 05/13/2014 10:51 PM, Pedro Alves wrote: > I wonder whether people are getting Roland's address from? > > It's frequent that ptrace related patches end up CCed to > rol...@redhat.com, but, he's not been at Red Hat for a few years > now. Roland, do you still want to be CCed on ptrace-related > issues? If so, there's probably a script somewhere in the > kernel that needs updating. If not, well, it'd be good > if it were updated anyway. :-) > > It's a little annoying, as Red Hat's servers outright reject > email sent from a @redhat.com address if one tries to send > an email that includes a CC/FROM to a user that no longer > exists in the @redhat.com domain. Got the email address from some of the previous ptrace related commits.
Re: [PATCH] ptrace: Fix PTRACE_GETREGSET/PTRACE_SETREGSET in code documentation
On 05/13/2014 11:39 PM, Pedro Alves wrote: > On 05/05/14 05:10, Anshuman Khandual wrote: >> On 05/01/2014 07:43 PM, Pedro Alves wrote: >>> On 04/28/2014 12:00 PM, Anshuman Khandual wrote: >>>> The current documentation is bit misleading and does not explicitly >>>> specify that iov.len need to be initialized failing which kernel >>>> may just ignore the ptrace request and never read from/write into >>>> the user specified buffer. This patch fixes the documentation. >>> >>> Well, it kind of does, here: >>> >>> * struct iovec iov = { buf, len}; >> >> :) Thats not explicit enough. >> >>> >>>> @@ -43,8 +43,12 @@ >>>> * >>>> *ret = ptrace(PTRACE_GETREGSET/PTRACE_SETREGSET, pid, >>>> NT_XXX_TYPE, &iov); >>>> * >>>> - * On the successful completion, iov.len will be updated by the kernel, >>>> - * specifying how much the kernel has written/read to/from the user's >>>> iov.buf. >>>> + * A non-zero value upto the max size of data expected to be written/read >>>> by the >>>> + * kernel in response to any NT_XXX_TYPE request type must be assigned to >>>> iov.len >>>> + * before initiating the ptrace call. If iov.len is 0, then kernel will >>>> neither >>>> + * read from or write into the user buffer specified. On successful >>>> completion, >>>> + * iov.len will be updated by the kernel, specifying how much the kernel >>>> has >>>> + * written/read to/from the user's iov.buf. >>> >>> I really appreciate that you're trying to make this clearer, but I >>> find the new sentence very hard to read/reason. :-/ >>> >>> I suggest: >>> >>> * This interface usage is as follows: >>> - * struct iovec iov = { buf, len}; >>> + * struct iovec iov = { buf, len }; >>> * >>> * ret = ptrace(PTRACE_GETREGSET/PTRACE_SETREGSET, pid, NT_XXX_TYPE, >>> &iov); >>> * >>> - * On the successful completion, iov.len will be updated by the kernel, >>> - * specifying how much the kernel has written/read to/from the user's >>> iov.buf. >>> + * On entry, iov describes the buffer's address and length. 
The buffer's >>> + * length must be equal to or shorter than the size of the NT_XXX_TYPE >>> regset. >>> + * On successful completion, iov.len is updated by the kernel, specifying >>> how >>> + * much the kernel has written/read to/from the user's iov.buf. >>> >> >> Yeah, sounds better. I may add "If the length is zero, the kernel will >> neither read >> from or write into the buffer" > > Well, I think that much should be obvious. What's not obvious is > whether that is considered success or error (what is the return code?) > I suspect and expect success return if the regset type is known, and > error otherwise. So that could be used as a way to probe for support > for a given regset without using stack or heap space, if it ever matters. > The kernel never reads/writes beyond iov.len, so better say that, and > then it automatically gets the 0 case handled too, right? > >>> I'm not sure I understood what you're saying correctly, though. >>> Specifically, >>> I don't know whether the buffer's length must really be shorter than the >>> size of the NT_XXX_TYPE regset. >> >> No, it does not have to. From the code snippet below (ptrace_regset function) >> the buffer length has to be multiple of regset->size for the given >> NT_XXX_TYPE >> upto the max regset size for the user to see any valid data. > > Ah, I guess one could call it a bug. If the passed in > len is bigger than the whole register set size, then there seems > to be no point in validating whether the length is multiple of > a single register's size. That unnecessarily prevents coming up > with a register set in the future that has registers of > different sizes... > > But given that that's how things are today, I suppose we should > document it... > > The problem what I >> faced was when you use any iovec structure with the length parameter >> uninitialized, >> the kernel simply ignores and does not return anything. > > Ah. Well, saying "does not return anything" is quite confusing. 
It does > return something -- -EINVAL. > >> >> if (!regset || (kiov->iov_len % regset->si
Re: [PATCH V2 2/3] powerpc, ptrace: Enable support for transactional memory register sets
On 05/14/2014 04:45 PM, Pedro Alves wrote: > On 05/14/14 06:46, Anshuman Khandual wrote: >> On 05/13/2014 10:43 PM, Pedro Alves wrote: >>> On 05/05/14 08:54, Anshuman Khandual wrote: >>>> This patch enables get and set of transactional memory related register >>>> sets through PTRACE_GETREGSET/PTRACE_SETREGSET interface by implementing >>>> four new powerpc specific register sets i.e REGSET_TM_SPR, REGSET_TM_CGPR, >>>> REGSET_TM_CFPR, REGSET_CVMX support corresponding to these following new >>>> ELF core note types added previously in this regard. >>>> >>>>(1) NT_PPC_TM_SPR >>>>(2) NT_PPC_TM_CGPR >>>>(3) NT_PPC_TM_CFPR >>>>(4) NT_PPC_TM_CVMX >>> >>> Sorry that I couldn't tell this from the code, but, what does the >>> kernel return when the ptracer requests these registers and the >>> program is not in a transaction? Specifically I'm wondering whether >>> this follows the same semantics as the s390 port. >>> >> >> Right now, it still returns the saved state of the registers from thread >> struct. I had assumed that the user must know the state of the transaction >> before initiating the ptrace request. I guess its better to check for >> the transaction status before processing the request. In case if TM is not >> active on that thread, we should return -EINVAL. > > I think s390 returns ENODATA in that case. > > https://sourceware.org/ml/gdb-patches/2013-06/msg00273.html > > We'll want some way to tell whether the system actually > supports this. That could be ENODATA vs something-else (EINVAL > or perhaps better EIO for "request is invalid"). As Mickey has pointed out, the transaction memory support in the system can be checked from the HWCAP2 flags. So when the transaction is not active, we will return ENODATA instead for TM related ptrace regset requests. 
Re: [V5 0/4] perf: New conditional branch filter
On 03/07/2014 02:36 PM, Anshuman Khandual wrote: > Hello Arnaldo, > > I had posted the V5 version of the PowerPC SW branch filter enablement > patchset last month. Please find the patchset here at > > https://lkml.org/lkml/2014/2/5/79 > > The following patches (2,4,5,6 from the original V5 patchset) > are the ones which change code in the generic kernel, perf tool and X86 perf. > Basically this patchset adds one more branch filter for "conditional" branches. > In X86 code, this new filter has been implemented with the help of the available SW > filters X86_BR_JCC and LBR_JCC. We had some discussions in this regard before. > Please review these changes and if it's okay, please merge them. Other patches > in the series are powerpc specific and are being reviewed by Benjamin Herrenschmidt > and Michael Ellerman. Let me know if you need more information. > > [1] https://lkml.org/lkml/2013/5/22/51 [2] https://lkml.org/lkml/2013/8/30/10 [3] https://lkml.org/lkml/2013/10/16/75 [4] https://lkml.org/lkml/2013/12/4/168 [5] https://lkml.org/lkml/2014/2/5/79 Hey Arnaldo, do you have any comments or suggestions on this? I have not received any response on this proposed patch series yet. Thank you.
Re: [PATCH V3 0/3] Add new PowerPC specific ELF core notes
On 06/12/2014 02:39 PM, Anshuman Khandual wrote: > On 05/23/2014 08:45 PM, Anshuman Khandual wrote: >> This patch series adds five new ELF core note sections which can be >> used with existing ptrace request PTRACE_GETREGSET/SETREGSET for accessing >> various transactional memory and miscellaneous register sets on PowerPC >> platform. Please find a test program exploiting these new ELF core note >> types on a POWER8 system. >> >> RFC: https://lkml.org/lkml/2014/4/1/292 >> V1: https://lkml.org/lkml/2014/4/2/43 >> V2: https://lkml.org/lkml/2014/5/5/88 >> >> Changes in V3 >> = >> (1) Added two new error paths in every TM related get/set functions when >> regset >> support is not present on the system (ENODEV) or when the process does >> not >> have any transaction active (ENODATA) in the context >> >> (2) Installed the active hooks for all the newly added regset core note types >> >> Changes in V2 >> = >> (1) Removed all the power specific ptrace requests corresponding to new >> NT_PPC_* >> elf core note types. Now all the register sets can be accessed from >> ptrace >> through PTRACE_GETREGSET/PTRACE_SETREGSET using the individual NT_PPC* >> core >> note type instead >> (2) Fixed couple of attribute values for REGSET_TM_CGPR register set >> (3) Renamed flush_tmreg_to_thread as flush_tmregs_to_thread >> (4) Fixed 32 bit checkpointed GPR support >> (5) Changed commit messages accordingly >> >> Outstanding Issues >> == >> (1) Running DSCR register value inside a transaction does not seem to be >> saved >> at thread.dscr when the process stops for ptrace examination. > > Hey Ben, > > Any updates on this patch series ? Ben, Any updates on this patch series ?
Re: [PATCH 1/2] powerpc/powernv: include asm/smp.h to handle UP config
On 06/05/2014 08:51 PM, Shreyas B. Prabhu wrote: > Build throws following errors when CONFIG_SMP=n > arch/powerpc/platforms/powernv/setup.c: In function > ‘pnv_kexec_wait_secondaries_down’: > arch/powerpc/platforms/powernv/setup.c:179:4: error: implicit declaration of > function ‘get_hard_smp_processor_id’ > rc = opal_query_cpu_status(get_hard_smp_processor_id(i), > > The usage of get_hard_smp_processor_id() needs the declaration from > . The file setup.c includes , which in-turn > includes . However, includes > only on SMP configs and hence UP builds fail. > > Fix this by directly including in setup.c unconditionally. Can you please clean up the description in the commit message? Also, the first line of the commit message should mention that the patch fixes a UP-specific build failure.
Re: [PATCH 2/2] powerpc/powernv : Disable subcore for UP configs
On 06/05/2014 08:54 PM, Shreyas B. Prabhu wrote: > Build throws following errors when CONFIG_SMP=n > arch/powerpc/platforms/powernv/subcore.c: In function ‘cpu_update_split_mode’: > arch/powerpc/platforms/powernv/subcore.c:274:15: error: ‘setup_max_cpus’ > undeclared (first use in this function) > arch/powerpc/platforms/powernv/subcore.c:285:5: error: lvalue required as > left operand of assignment > > 'setup_max_cpus' variable is relevant only on SMP, so there is no point > working around it for UP. Furthermore, subcore.c itself is relevant only > on SMP and hence the better solution is to exclude subcore.c for UP builds. > > Signed-off-by: Shreyas B. Prabhu > --- > This patch applies on top of ben/powerpc.git/next branch > > arch/powerpc/platforms/powernv/Makefile | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/arch/powerpc/platforms/powernv/Makefile > b/arch/powerpc/platforms/powernv/Makefile > index 4ad0d34..636d206 100644 > --- a/arch/powerpc/platforms/powernv/Makefile > +++ b/arch/powerpc/platforms/powernv/Makefile > @@ -1,9 +1,9 @@ > obj-y+= setup.o opal-takeover.o opal-wrappers.o > opal.o opal-async.o > obj-y+= opal-rtc.o opal-nvram.o opal-lpc.o > opal-flash.o > obj-y+= rng.o opal-elog.o opal-dump.o > opal-sysparam.o opal-sensor.o > -obj-y+= opal-msglog.o subcore.o subcore-asm.o > +obj-y+= opal-msglog.o subcore-asm.o Can subcore-asm.o also be moved down here as well? > -obj-$(CONFIG_SMP)+= smp.o > +obj-$(CONFIG_SMP)+= smp.o subcore.o
Re: [PATCH v2] ptrace: Clarify PTRACE_GETREGSET/PTRACE_SETREGSET, documentation in uapi header
On 05/14/2014 04:24 PM, Pedro Alves wrote: > On 05/14/14 08:10, Anshuman Khandual wrote: >> On 05/13/2014 11:39 PM, Pedro Alves wrote: >>> On 05/05/14 05:10, Anshuman Khandual wrote: >>>> On 05/01/2014 07:43 PM, Pedro Alves wrote: >>> OK, then this is what I suggest instead: > ... >>>> Shall I resend the patch with the your proposed changes and your >>>> "Signed-off-by" and >>>> moving myself as "Reported-by" ? >>> >>> No idea of the actual policy to follow. Feel free to do that if that's the >>> standard procedure. >> >> Even I am not sure about this, so to preserve the correct authorship, would >> you >> mind sending this patch ? > > Here you go. This is against current Linus'. Please take it from > here if necessary. > > 8<-- > From 1237f5ac5896f3910f66df83a5093bb548006188 Mon Sep 17 00:00:00 2001 > From: Pedro Alves > Date: Wed, 14 May 2014 11:05:07 +0100 > Subject: [PATCH] ptrace: Clarify PTRACE_GETREGSET/PTRACE_SETREGSET > documentation in uapi header > > The current comments don't explicitly state in plain words that > iov.len must be set to the buffer's length prior to the ptrace call. > A user might get confused and leave that uninitialized. > > In the ptrace_regset function (snippet below) we see that the buffer > length has to be a multiple of the slot/register size for the given > NT_XXX_TYPE: > > if (!regset || (kiov->iov_len % regset->size) != 0) > return -EINVAL; > > Note regset->size is the size of each slot/register in the set, not > the size of the whole set. > > And then, we see here: > > kiov->iov_len = min(kiov->iov_len, > (__kernel_size_t) (regset->n * regset->size)); > > that the kernel takes care of capping the requested length to the size > of the whole regset. 
> > Signed-off-by: Pedro Alves > Reported-by: Anshuman Khandual > --- > include/uapi/linux/ptrace.h | 11 --- > 1 file changed, 8 insertions(+), 3 deletions(-) > > diff --git a/include/uapi/linux/ptrace.h b/include/uapi/linux/ptrace.h > index cf1019e..30836b9 100644 > --- a/include/uapi/linux/ptrace.h > +++ b/include/uapi/linux/ptrace.h > @@ -39,12 +39,17 @@ > * payload are exactly the same layout. > * > * This interface usage is as follows: > - * struct iovec iov = { buf, len}; > + * struct iovec iov = { buf, len }; > * > * ret = ptrace(PTRACE_GETREGSET/PTRACE_SETREGSET, pid, NT_XXX_TYPE, &iov); > * > - * On the successful completion, iov.len will be updated by the kernel, > - * specifying how much the kernel has written/read to/from the user's > iov.buf. > + * On entry, iov describes the buffer's address and length. The buffer's > length > + * must be a multiple of the size of a single register in the register set. > The > + * kernel never reads or writes more than iov.len, and caps the buffer > length to > + * the register set's size. In other words, the kernel reads or writes > + * min(iov.len, regset size). On successful completion, iov.len is updated > by > + * the kernel, specifying how much the kernel has read from / written to the > + * user's iov.buf. > */ > #define PTRACE_GETREGSET 0x4204 > #define PTRACE_SETREGSET 0x4205 Hey Peter/Oleg, the above patch is a documentation fix which we discussed some time back. Could you please review it and consider merging it? Thank you.
Re: [PATCH V3 0/3] Add new PowerPC specific ELF core notes
On 05/23/2014 08:45 PM, Anshuman Khandual wrote: > This patch series adds five new ELF core note sections which can be > used with existing ptrace request PTRACE_GETREGSET/SETREGSET for accessing > various transactional memory and miscellaneous register sets on PowerPC > platform. Please find a test program exploiting these new ELF core note > types on a POWER8 system. > > RFC: https://lkml.org/lkml/2014/4/1/292 > V1: https://lkml.org/lkml/2014/4/2/43 > V2: https://lkml.org/lkml/2014/5/5/88 > > Changes in V3 > = > (1) Added two new error paths in every TM related get/set functions when > regset > support is not present on the system (ENODEV) or when the process does not > have any transaction active (ENODATA) in the context > > (2) Installed the active hooks for all the newly added regset core note types > > Changes in V2 > = > (1) Removed all the power specific ptrace requests corresponding to new > NT_PPC_* > elf core note types. Now all the register sets can be accessed from ptrace > through PTRACE_GETREGSET/PTRACE_SETREGSET using the individual NT_PPC* > core > note type instead > (2) Fixed couple of attribute values for REGSET_TM_CGPR register set > (3) Renamed flush_tmreg_to_thread as flush_tmregs_to_thread > (4) Fixed 32 bit checkpointed GPR support > (5) Changed commit messages accordingly > > Outstanding Issues > == > (1) Running DSCR register value inside a transaction does not seem to be saved > at thread.dscr when the process stops for ptrace examination. Hey Ben, Any updates on this patch series ?
[RFC] powerpc, ptrace: Add few more ptrace request macros
This patch adds few more ptrace request macros expanding the existing capability. These ptrace requests macros can be classified into two categories. (1) Transactional memory /* TM special purpose registers */ PTRACE_GETTM_SPRREGS PTRACE_SETTM_SPRREGS /* Checkpointed GPR registers */ PTRACE_GETTM_CGPRREGS PTRACE_SETTM_CGPRREGS /* Checkpointed FPR registers */ PTRACE_GETTM_CFPRREGS PTRACE_SETTM_CFPRREGS /* Checkpointed VMX registers */ PTRACE_GETTM_CVMXREGS PTRACE_SETTM_CVMXREGS (2) Miscellaneous /* TAR, PPR, DSCR registers */ PTRACE_GETMSCREGS PTRACE_SETMSCREGS This patch also adds mutliple new generic ELF core note sections in this regard which can be listed as follows. NT_PPC_TM_SPR /* Transactional memory specific registers */ NT_PPC_TM_CGPR /* Transactional memory checkpointed GPR */ NT_PPC_TM_CFPR /* Transactional memory checkpointed FPR */ NT_PPC_TM_CVMX /* Transactional memory checkpointed VMX */ NT_PPC_MISC /* Miscellaneous registers */ Signed-off-by: Anshuman Khandual --- arch/powerpc/include/asm/switch_to.h | 8 + arch/powerpc/include/uapi/asm/ptrace.h | 61 +++ arch/powerpc/kernel/process.c | 24 ++ arch/powerpc/kernel/ptrace.c | 658 +++-- include/uapi/linux/elf.h | 5 + 5 files changed, 729 insertions(+), 27 deletions(-) diff --git a/arch/powerpc/include/asm/switch_to.h b/arch/powerpc/include/asm/switch_to.h index 0e83e7d..73e2601 100644 --- a/arch/powerpc/include/asm/switch_to.h +++ b/arch/powerpc/include/asm/switch_to.h @@ -80,6 +80,14 @@ static inline void flush_spe_to_thread(struct task_struct *t) } #endif +#ifdef CONFIG_PPC_TRANSACTIONAL_MEM +extern void flush_tmreg_to_thread(struct task_struct *); +#else +static inline void flush_tmreg_to_thread(struct task_struct *t) +{ +} +#endif + static inline void clear_task_ebb(struct task_struct *t) { #ifdef CONFIG_PPC_BOOK3S_64 diff --git a/arch/powerpc/include/uapi/asm/ptrace.h b/arch/powerpc/include/uapi/asm/ptrace.h index 77d2ed3..fd962d6 100644 --- a/arch/powerpc/include/uapi/asm/ptrace.h +++ 
b/arch/powerpc/include/uapi/asm/ptrace.h @@ -190,6 +190,67 @@ struct pt_regs { #define PPC_PTRACE_SETHWDEBUG 0x88 #define PPC_PTRACE_DELHWDEBUG 0x87 +/* Transactional memory registers */ + +/* + * SPR + * + * struct data { + * u64 tm_tfhar; + * u64 tm_texasr; + * u64 tm_tfiar; + * unsigned long tm_orig_msr; + * u64 tm_tar; + * u64 tm_ppr; + * u64 tm_dscr; + * }; + */ +#define PTRACE_GETTM_SPRREGS 0x70 +#define PTRACE_SETTM_SPRREGS 0x71 + +/* + * Checkpointed GPR + * + * struct data { + * struct pt_regs ckpt_regs; + * }; + */ +#define PTRACE_GETTM_CGPRREGS 0x72 +#define PTRACE_SETTM_CGPRREGS 0x73 + +/* + * Checkpointed FPR + * + * struct data { + * u64 fpr[32]; + * u64 fpscr; + * }; + */ +#define PTRACE_GETTM_CFPRREGS 0x74 +#define PTRACE_SETTM_CFPRREGS 0x75 + +/* + * Checkpointed VMX + * + * struct data { + * vector128 vr[32]; + * vector128 vscr; + * unsigned long vrsave; + *}; + */ +#define PTRACE_GETTM_CVMXREGS 0x76 +#define PTRACE_SETTM_CVMXREGS 0x77 + +/* Miscellaneous registers */ +#define PTRACE_GETMSCREGS 0x78 +#define PTRACE_SETMSCREGS 0x79 + +/* + * XXX: A note to application developers. The existing data layout + * of the above four ptrace requests can change when new registers + * are available for each category in forthcoming processors. + */ + #ifndef __ASSEMBLY__ struct ppc_debug_info { diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c index af064d2..e5dfd8e 100644 --- a/arch/powerpc/kernel/process.c +++ b/arch/powerpc/kernel/process.c @@ -673,6 +673,30 @@ static inline void __switch_to_tm(struct task_struct *prev) } } +void flush_tmreg_to_thread(struct task_struct *tsk) +{ + /* +* If task is not current, it should have been flushed +* already to it's thread_struct during __switch_to(). 
+*/ + if (tsk != current) + return; + + preempt_disable(); + if (tsk->thread.regs) { + /* +* If we are still current, the TM state need to +* be flushed to thread_struct as it will be still +* present in the current cpu +*/ + if (MSR_TM_ACTIVE(tsk->thread.regs->msr)) { + __switch_to_tm(tsk); + tm_recheckpoint_new_task(tsk); + } + } + preempt_enable(); +} + /* * This is called if we are on the way out to userspace and the * TIF_RESTORE_TM flag is set. It checks if we need to reload diff --git a/arch/powerpc/kernel/ptrace.c b/arch/powerpc/kernel/
Fwd: [V6 00/11] perf: New conditional branch filter
Hello Peter/Ingo, Would you please consider reviewing the first four patches in this patch series which changes the generic perf kernel and perf tools code. Andi Kleen and Stephane Eranian have already reviewed these changes. The rest of the patch series is related to powerpc and being reviewed by Michael Ellerman/Ben. Regards Anshuman Original Message Subject: [V6 00/11] perf: New conditional branch filter Date: Mon, 5 May 2014 14:39:02 +0530 From: Anshuman Khandual To: linuxppc-...@ozlabs.org, linux-kernel@vger.kernel.org CC: mi...@neuling.org, a...@linux.intel.com, eran...@google.com, mich...@ellerman.id.au, a...@ghostprotocols.net, suka...@linux.vnet.ibm.com, mi...@kernel.org This patchset is the re-spin of the original branch stack sampling patchset which introduced new PERF_SAMPLE_BRANCH_COND branch filter. This patchset also enables SW based branch filtering support for book3s powerpc platforms which have PMU HW backed branch stack sampling support. Summary of code changes in this patchset: (1) Introduces a new PERF_SAMPLE_BRANCH_COND branch filter (2) Add the "cond" branch filter options in the "perf record" tool (3) Enable PERF_SAMPLE_BRANCH_COND in X86 platforms (4) Enable PERF_SAMPLE_BRANCH_COND in POWER8 platform (5) Update the documentation regarding "perf record" tool (6) Add some new powerpc instruction analysis functions in code-patching library (7) Enable SW based branch filter support for powerpc book3s (8) Changed BHRB configuration in POWER8 to accommodate SW branch filters With this new SW enablement, the branch filter support for book3s platforms have been extended to include all these combinations discussed below with a sample test application program (included here). 
Changes in V2
=============
(1) Enabled PPC64 SW branch filtering support
(2) Incorporated changes required for all previous comments

Changes in V3
=============
(1) Split the SW branch filter enablement into multiple patches
(2) Added PMU neutral SW branch filtering code, PMU specific HW branch filtering code
(3) Added new instruction analysis functionality into the powerpc code-patching library
(4) Changed names for some of the functions
(5) Fixed a couple of spelling mistakes
(6) Changed code documentation in multiple places

Changes in V4
=============
(1) Changed the commit message for patch (01/10)
(2) Changed patch (02/10) to accommodate review comments from Michael Ellerman
(3) Rebased the patchset against the latest Linus tree

Changes in V5
=============
(1) Added a precursor patch to clean up the indentation problem in power_pmu_bhrb_read
(2) Added a precursor patch to re-arrange the P8 PMU BHRB filter config for clarity
(3) Merged the previous 10th patch into the 8th patch
(4) Moved SW based branch analysis code from core perf into the code-patching library as suggested by Michael
(5) Simplified the logic in the branch analysis library
(6) Fixed some ambiguities in documentation at various places
(7) Added some more in-code documentation blocks at various places
(8) Renamed some local variables and function names
(9) Fixed some indentation and white space errors in the code
(10) Implemented almost all the review comments and suggestions made by Michael Ellerman on the V4 patchset
(11) Enabled privilege mode SW branch filter
(12) Simplified and generalized the SW implemented conditional branch filter
(13) PERF_SAMPLE_BRANCH_COND filter is now supported only through the SW implementation
(14) Adjusted other patches to deal with the above changes

Changes in V6
=============
(1) Rebased the patchset against the master
(2) Added "Reviewed-by: Andi Kleen" to the first four patches in the series, which change the generic or X86 perf code.
[https://lkml.org/lkml/2014/4/7/130]

HW implemented branch filters
=============================

(1) perf record -j any_call -e branch-misses:u ./cprog

# Overhead  Command  Source Shared Object  Source Symbol       Target Shared Object  Target Symbol
# ........  .......  ....................  ..............      ....................  .............
#
    7.85%   cprog    cprog  [.] sw_3_1     cprog  [.] success_3_1_2
    5.66%   cprog    cprog  [.] sw_3_1     cprog  [.] sw_3_1_2
    5.65%   cprog    cprog  [.] hw_1_1     cprog  [.] symbol1
    5.42%   cprog    cprog  [.] sw_3_1     cprog  [.] sw_3_1_3
    5.40%   cprog    cprog  [.] callme     cprog  [.] hw_1_1
    5.40%   cprog    cprog  [.] sw_3_1     cprog  [.] success_3_1_1
    5.40%   cprog    cprog  [.] sw_3_1     cprog  [.] sw_3_1_1
    5.39%   cprog    cprog
[V6 01/11] perf: Add PERF_SAMPLE_BRANCH_COND
This patch introduces the new branch filter PERF_SAMPLE_BRANCH_COND which will extend the existing perf ABI. Various architectures can provide this functionality either with HW filtering support (if present) or with SW filtering of captured branch instructions.

Signed-off-by: Anshuman Khandual
Reviewed-by: Stephane Eranian
Reviewed-by: Andi Kleen
---
 include/uapi/linux/perf_event.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 853bc1c..696f69b4 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -163,8 +163,9 @@ enum perf_branch_sample_type {
 	PERF_SAMPLE_BRANCH_ABORT_TX	= 1U << 7, /* transaction aborts */
 	PERF_SAMPLE_BRANCH_IN_TX	= 1U << 8, /* in transaction */
 	PERF_SAMPLE_BRANCH_NO_TX	= 1U << 9, /* not in transaction */
+	PERF_SAMPLE_BRANCH_COND		= 1U << 10, /* conditional branches */
 
-	PERF_SAMPLE_BRANCH_MAX		= 1U << 10, /* non-ABI */
+	PERF_SAMPLE_BRANCH_MAX		= 1U << 11, /* non-ABI */
 };
 
 #define PERF_SAMPLE_BRANCH_PLM_ALL \
-- 
1.7.11.7
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
[V6 04/11] perf, documentation: Description for conditional branch filter
Adding documentation support for conditional branch filter.

Signed-off-by: Anshuman Khandual
Reviewed-by: Stephane Eranian
Reviewed-by: Andi Kleen
---
 tools/perf/Documentation/perf-record.txt | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
index c71b0f3..d460049 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -184,9 +184,10 @@ following filters are defined:
 	- in_tx: only when the target is in a hardware transaction
 	- no_tx: only when the target is not in a hardware transaction
 	- abort_tx: only when the target is a hardware transaction abort
+	- cond: conditional branches
 
-The option requires at least one branch type among any, any_call, any_ret, ind_call.
+The option requires at least one branch type among any, any_call, any_ret, ind_call, cond.
 The privilege levels may be omitted, in which case, the privilege
 levels of the associated event are applied to the branch filter. Both
 kernel (k) and hypervisor (hv) privilege levels are subject to
 permissions. When sampling on multiple events, branch stack sampling
-- 
1.7.11.7
[V6 06/11] powerpc, perf: Re-arrange PMU based branch filter processing in POWER8
This patch does some code re-arrangements to make it clear that the function ignores any separate privilege level branch filter request and does not support any combination of HW PMU branch filters.

Signed-off-by: Anshuman Khandual
---
 arch/powerpc/perf/power8-pmu.c | 21 +++--
 1 file changed, 7 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/perf/power8-pmu.c b/arch/powerpc/perf/power8-pmu.c
index fe2763b..13f47f5 100644
--- a/arch/powerpc/perf/power8-pmu.c
+++ b/arch/powerpc/perf/power8-pmu.c
@@ -635,8 +635,6 @@ static int power8_generic_events[] = {
 
 static u64 power8_bhrb_filter_map(u64 branch_sample_type)
 {
-	u64 pmu_bhrb_filter = 0;
-
 	/* BHRB and regular PMU events share the same privilege state
 	 * filter configuration. BHRB is always recorded along with a
 	 * regular PMU event. As the privilege state filter is handled
@@ -644,20 +642,15 @@ static u64 power8_bhrb_filter_map(u64 branch_sample_type)
 	 * PMU event, we ignore any separate BHRB specific request.
 	 */
 
-	/* No branch filter requested */
-	if (branch_sample_type & PERF_SAMPLE_BRANCH_ANY)
-		return pmu_bhrb_filter;
-
-	/* Invalid branch filter options - HW does not support */
-	if (branch_sample_type & PERF_SAMPLE_BRANCH_ANY_RETURN)
-		return -1;
+	/* Ignore user, kernel, hv bits */
+	branch_sample_type &= ~PERF_SAMPLE_BRANCH_PLM_ALL;
 
-	if (branch_sample_type & PERF_SAMPLE_BRANCH_IND_CALL)
-		return -1;
+	/* No branch filter requested */
+	if (branch_sample_type == PERF_SAMPLE_BRANCH_ANY)
+		return 0;
 
-	if (branch_sample_type & PERF_SAMPLE_BRANCH_ANY_CALL) {
-		pmu_bhrb_filter |= POWER8_MMCRA_IFM1;
-		return pmu_bhrb_filter;
+	if (branch_sample_type == PERF_SAMPLE_BRANCH_ANY_CALL) {
+		return POWER8_MMCRA_IFM1;
 	}
 
 	/* Every thing else is unsupported */
-- 
1.7.11.7
[V6 00/11] perf: New conditional branch filter
ymbol1
    2.47%   cprog    cprog  [.] sw_3_1_1   cprog  [.] sw_3_1
    2.47%   cprog    cprog  [.] sw_3_1     cprog  [.] sw_3_1_1
    2.47%   cprog    cprog  [.] callme     cprog  [.] hw_1_1
    2.47%   cprog    cprog  [.] callme     cprog  [.] sw_3_1
    2.47%   cprog    cprog  [.] hw_1_2     cprog  [.] symbol2
    2.47%   cprog    cprog  [.] hw_2_1     cprog  [.] address1
    2.47%   cprog    cprog  [.] back1      cprog  [.] callme
    2.47%   cprog    cprog  [.] sw_3_1_3   cprog  [.] sw_3_1
    2.47%   cprog    cprog  [.] sw_3_1     cprog  [.] sw_3_1_3
    2.47%   cprog    cprog  [.] sw_3_1     cprog  [.] callme
    2.47%   cprog    cprog  [.] callme     cprog  [.] hw_1_2
    2.47%   cprog    cprog  [.] callme     cprog  [.] sw_4_2
    2.46%   cprog    cprog  [.] sw_3_1_2   cprog  [.] sw_3_1
    2.46%   cprog    cprog  [.] sw_3_1     cprog  [.] sw_3_1_2
    1.57%   cprog    cprog  [.] success_3_1_2  cprog  [.] sw_3_1
    1.57%   cprog    cprog  [.] sw_3_1     cprog  [.] success_3_1_2
    1.57%   cprog    cprog  [.] hw_1_1     cprog  [.] callme
    1.56%   cprog    cprog  [.] hw_2_2     cprog  [.] address2
    1.56%   cprog    cprog  [.] back2      cprog  [.] callme
    1.56%   cprog    cprog  [.] sw_3_2     cprog  [.] callme
    1.56%   cprog    cprog  [.] callme     cprog  [.] sw_3_2
    1.41%   cprog    cprog  [.] success_3_1_1  cprog  [.] sw_3_1
    1.41%   cprog    cprog  [.] sw_3_1     cprog  [.] success_3_1_1
    1.40%   cprog    cprog  [.] sw_4_1     cprog  [.] callme
    1.39%   cprog    cprog  [.] hw_1_2     cprog  [.] callme
    1.39%   cprog    cprog  [.] sw_3_1     cprog  [.] success_3_1_3
    1.39%   cprog    cprog  [.] callme     cprog  [.] main
    0.14%   cprog    [unknown]       [.] 0xf7d72328      [unknown]      [.] 0xf7d72320
    0.03%   cprog    [unknown]       [k]                 cprog          [k] callme
    0.01%   cprog    libc-2.11.2.so  [.] _IO_doallocbuf  libc-2.11.2.so [.] _IO_doallocbuf
    0.01%   cprog    libc-2.11.2.so  [.] printf          cprog          [.] main
    0.01%   cprog    libc-2.11.2.so  [.] _IO_doallocbuf  libc-2.11.2.so [.] _IO_file_doallocate
    0.01%   cprog    ld-2.11.2.so    [.] malloc          [unknown]      [.] 0xf7d8b380
    0.01%   cprog    cprog           [.] main            [unknown]      [.] 0x0fe7f63c
    0.01%   cprog    [unknown]       [.] 0xf7d8b388      ld-2.11.2.so   [.] __libc_memalign
    0.01%   cprog    [unknown]       [.]                 ld-2.11.2.so   [.] malloc

Please refer to the V4 version of the patchset to learn about the sample test case and its makefile.
Anshuman Khandual (11):
  perf: Add PERF_SAMPLE_BRANCH_COND
  perf, tool: Conditional branch filter 'cond' added to perf record
  x86, perf: Add conditional branch filtering support
  perf, documentation: Description for conditional branch filter
  powerpc, perf: Re-arrange BHRB processing
  powerpc, perf: Re-arrange PMU based branch filter processing in POWER8
  powerpc, perf: Change the name of HW PMU branch filter tracking variable
  powerpc, lib: Add new branch analysis support functions
  powerpc, perf: Enable SW filtering in branch stack sampling framework
  power8, perf: Adapt BHRB PMU configuration to work with SW filters
  powerpc, perf: Enable privilege mode SW branch filters

 arch/powerpc/include/asm/code-patching.h     | 16 ++
 arch/powerpc/include/asm/perf_event_server.h |  6 +-
 arch/powerpc/lib/code-patching.c             | 80 +++
 arch/powerpc/perf/core-book3s.c
[V6 11/11] powerpc, perf: Enable privilege mode SW branch filters
This patch enables privilege mode SW branch filters. It also modifies the POWER8 PMU branch filter configuration so that the privilege mode branch filter implemented as part of the base PMU event configuration is reflected in the bhrb filter mask. As a result, the SW will skip and not try to process the privilege mode branch filters itself.

Signed-off-by: Anshuman Khandual
---
 arch/powerpc/perf/core-book3s.c | 53 +++--
 arch/powerpc/perf/power8-pmu.c  | 13 --
 2 files changed, 52 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index a94cc43..297cddb 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -26,6 +26,9 @@
 #define BHRB_PREDICTION	0x0000000000000001
 #define BHRB_EA		0xFFFFFFFFFFFFFFFCUL
 
+#define POWER_ADDR_USER		0
+#define POWER_ADDR_KERNEL	1
+
 struct cpu_hw_events {
 	int n_events;
 	int n_percpu;
@@ -450,10 +453,10 @@ static bool check_instruction(unsigned int *addr, u64 sw_filter)
  * Access the instruction contained in the address and check
  * whether it complies with the applicable SW branch filters.
  */
-static bool keep_branch(u64 from, u64 sw_filter)
+static bool keep_branch(u64 from, u64 to, u64 sw_filter)
 {
 	unsigned int instr;
-	bool ret;
+	bool to_plm, ret, flag;
 
 	/*
 	 * The "from" branch for every branch record has to go
@@ -463,6 +466,37 @@ static bool keep_branch(u64 from, u64 sw_filter)
 	if (sw_filter == 0)
 		return true;
 
+	to_plm = is_kernel_addr(to) ? POWER_ADDR_KERNEL : POWER_ADDR_USER;
+
+	/*
+	 * Applying privilege mode SW branch filters first on the
+	 * 'to' address makes an AND semantic with the SW generic
+	 * branch filters (OR with each other) being applied on the
+	 * from address thereafter.
+	 */
+
+	/* Ignore PERF_SAMPLE_BRANCH_HV */
+	sw_filter &= ~PERF_SAMPLE_BRANCH_HV;
+
+	/* Privilege mode branch filters for "TO" address */
+	if (sw_filter & PERF_SAMPLE_BRANCH_PLM_ALL) {
+		flag = false;
+
+		if (sw_filter & PERF_SAMPLE_BRANCH_USER) {
+			if(to_plm == POWER_ADDR_USER)
+				flag = true;
+		}
+
+		if (sw_filter & PERF_SAMPLE_BRANCH_KERNEL) {
+			if(to_plm == POWER_ADDR_KERNEL)
+				flag = true;
+		}
+
+		if (!flag)
+			return false;
+	}
+
+	/* Generic branch filters for "FROM" address */
 	if (is_kernel_addr(from)) {
 		return check_instruction((unsigned int *) from, sw_filter);
 	} else {
@@ -501,15 +535,6 @@ static int all_filters_covered(u64 branch_sample_type, u64 bhrb_filter)
 		if (!(branch_sample_type & x))
 			continue;
 
 		/*
-		 * Privilege filter requests have been already
-		 * taken care during the base PMU configuration.
-		 */
-		if ((x == PERF_SAMPLE_BRANCH_USER)
-			|| (x == PERF_SAMPLE_BRANCH_KERNEL)
-			|| (x == PERF_SAMPLE_BRANCH_HV))
-			continue;
-
-		/*
 		 * Requested filter not available either
 		 * in PMU or in SW.
 		 */
@@ -520,7 +545,10 @@ static int all_filters_covered(u64 branch_sample_type, u64 bhrb_filter)
 }
 
 /* SW implemented branch filters */
-static unsigned int power_sw_filter[] = { PERF_SAMPLE_BRANCH_ANY_CALL,
+static unsigned int power_sw_filter[] = { PERF_SAMPLE_BRANCH_USER,
+					PERF_SAMPLE_BRANCH_KERNEL,
+					PERF_SAMPLE_BRANCH_HV,
+					PERF_SAMPLE_BRANCH_ANY_CALL,
 					PERF_SAMPLE_BRANCH_COND,
 					PERF_SAMPLE_BRANCH_ANY_RETURN,
 					PERF_SAMPLE_BRANCH_IND_CALL };
@@ -624,6 +652,7 @@ void power_pmu_bhrb_read(struct cpu_hw_events *cpuhw)
 
 		/* Apply SW branch filters and drop the entry if required */
 		if (!keep_branch(cpuhw->bhrb_entries[u_index].from,
+				 cpuhw->bhrb_entries[u_index].to,
 				 cpuhw->bhrb_sw_filter))
 			u_index--;
 		u_index++;

diff --git a/arch/powerpc/perf/power8-pmu.c b/arch/powerpc/perf/power8-pmu.c
index 4743bde..b6e21da 100644
--- a/arch/powerpc/perf/power8-pmu.c
+++ b/arch/powerpc/perf/power8-pmu.c
@@ -649,9 +649,19 @@ static u64 power8_bhrb_filter_map(u64 branch_sample_type, u64 *bhrb_filter)
[V6 05/11] powerpc, perf: Re-arrange BHRB processing
This patch cleans up some existing indentation problems and re-organizes the BHRB processing code with a helper function named 'update_branch_entry', making it more readable. This patch does not change any functionality.

Signed-off-by: Anshuman Khandual
---
 arch/powerpc/perf/core-book3s.c | 102 
 1 file changed, 52 insertions(+), 50 deletions(-)

diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index 4520c93..66bea54 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -402,11 +402,21 @@ static __u64 power_pmu_bhrb_to(u64 addr)
 	return target - (unsigned long)&instr + addr;
 }
 
+/* Update individual branch entry */
+void update_branch_entry(struct cpu_hw_events *cpuhw, int u_index, u64 from, u64 to, int pred)
+{
+	cpuhw->bhrb_entries[u_index].from = from;
+	cpuhw->bhrb_entries[u_index].to = to;
+	cpuhw->bhrb_entries[u_index].mispred = pred;
+	cpuhw->bhrb_entries[u_index].predicted = ~pred;
+	return;
+}
+
 /* Processing BHRB entries */
 void power_pmu_bhrb_read(struct cpu_hw_events *cpuhw)
 {
 	u64 val;
-	u64 addr;
+	u64 addr, tmp;
 	int r_index, u_index, pred;
 
 	r_index = 0;
@@ -417,62 +427,54 @@ void power_pmu_bhrb_read(struct cpu_hw_events *cpuhw)
 		if (!val)
 			/* Terminal marker: End of valid BHRB entries */
 			break;
-		else {
-			addr = val & BHRB_EA;
-			pred = val & BHRB_PREDICTION;
-
-			if (!addr)
-				/* invalid entry */
-				continue;
+		addr = val & BHRB_EA;
+		pred = val & BHRB_PREDICTION;
 
-			/* Branches are read most recent first (ie. mfbhrb 0 is
-			 * the most recent branch).
-			 * There are two types of valid entries:
-			 * 1) a target entry which is the to address of a
-			 *    computed goto like a blr,bctr,btar. The next
-			 *    entry read from the bhrb will be branch
-			 *    corresponding to this target (ie. the actual
-			 *    blr/bctr/btar instruction).
-			 * 2) a from address which is an actual branch. If a
-			 *    target entry proceeds this, then this is the
-			 *    matching branch for that target.
If this is not
-			 *    following a target entry, then this is a branch
-			 *    where the target is given as an immediate field
-			 *    in the instruction (ie. an i or b form branch).
-			 *    In this case we need to read the instruction from
-			 *    memory to determine the target/to address.
+		if (!addr)
+			/* invalid entry */
+			continue;
+
+		/* Branches are read most recent first (ie. mfbhrb 0 is
+		 * the most recent branch).
+		 * There are two types of valid entries:
+		 * 1) a target entry which is the to address of a
+		 *    computed goto like a blr,bctr,btar. The next
+		 *    entry read from the bhrb will be branch
+		 *    corresponding to this target (ie. the actual
+		 *    blr/bctr/btar instruction).
+		 * 2) a from address which is an actual branch. If a
+		 *    target entry proceeds this, then this is the
+		 *    matching branch for that target. If this is not
+		 *    following a target entry, then this is a branch
+		 *    where the target is given as an immediate field
+		 *    in the instruction (ie. an i or b form branch).
+		 *    In this case we need to read the instruction from
+		 *    memory to determine the target/to address.
+		 */
+		if (val & BHRB_TARGET) {
+			/* Target branches use two entries
+			 * (ie. computed gotos/XL form)
+			 */
+			tmp = addr;
+			/* Get from address in next entry */
+			val = read_bhrb(r_index++);
+			addr = val & BHRB_EA;
 			if (val & BHRB_TARGET) {
-				/* Target branches use two entries
-				 * (ie. computed gotos/XL form)
-				 */
-				cpuhw->bhrb_entries[u_index].to = addr;
-				cpuhw->bhrb_entries[u_index].mispred = pred;
-
[V6 07/11] powerpc, perf: Change the name of HW PMU branch filter tracking variable
This patch simply changes the name of the variable 'bhrb_filter' to 'bhrb_hw_filter', in order to make room for one more variable, added in a subsequent patch, which will track SW filters in the generic powerpc book3s code. This patch does not change any functionality.

Signed-off-by: Anshuman Khandual
---
 arch/powerpc/perf/core-book3s.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index 66bea54..1d7e909 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -47,7 +47,7 @@ struct cpu_hw_events {
 	int n_txn_start;
 
 	/* BHRB bits */
-	u64 bhrb_filter;	/* BHRB HW branch filter */
+	u64 bhrb_hw_filter;	/* BHRB HW branch filter */
 	int bhrb_users;
 	void *bhrb_context;
 	struct perf_branch_stack bhrb_stack;
@@ -1298,7 +1298,7 @@ static void power_pmu_enable(struct pmu *pmu)
 	mb();
 
 	if (cpuhw->bhrb_users)
-		ppmu->config_bhrb(cpuhw->bhrb_filter);
+		ppmu->config_bhrb(cpuhw->bhrb_hw_filter);
 
 	write_mmcr0(cpuhw, mmcr0);
 
@@ -1405,7 +1405,7 @@ nocheck:
 out:
 	if (has_branch_stack(event)) {
 		power_pmu_bhrb_enable(event);
-		cpuhw->bhrb_filter = ppmu->bhrb_filter_map(
+		cpuhw->bhrb_hw_filter = ppmu->bhrb_filter_map(
 					event->attr.branch_sample_type);
 	}
 
@@ -1788,10 +1788,10 @@ static int power_pmu_event_init(struct perf_event *event)
 	err = power_check_constraints(cpuhw, events, cflags, n + 1);
 
 	if (has_branch_stack(event)) {
-		cpuhw->bhrb_filter = ppmu->bhrb_filter_map(
+		cpuhw->bhrb_hw_filter = ppmu->bhrb_filter_map(
 					event->attr.branch_sample_type);
 
-		if(cpuhw->bhrb_filter == -1)
+		if(cpuhw->bhrb_hw_filter == -1)
 			return -EOPNOTSUPP;
 	}
-- 
1.7.11.7
[V6 08/11] powerpc, lib: Add new branch analysis support functions
Generic powerpc branch analysis support is added to the code-patching library; the subsequent patch uses it for SW based filtering of branch records in perf.

Signed-off-by: Anshuman Khandual
---
 arch/powerpc/include/asm/code-patching.h | 16 +++
 arch/powerpc/lib/code-patching.c         | 80 
 2 files changed, 96 insertions(+)

diff --git a/arch/powerpc/include/asm/code-patching.h b/arch/powerpc/include/asm/code-patching.h
index 97e02f9..39919d4 100644
--- a/arch/powerpc/include/asm/code-patching.h
+++ b/arch/powerpc/include/asm/code-patching.h
@@ -22,6 +22,16 @@
 #define BRANCH_SET_LINK	0x1
 #define BRANCH_ABSOLUTE	0x2
 
+#define XL_FORM_LR	0x4C000020
+#define XL_FORM_CTR	0x4C000420
+#define XL_FORM_TAR	0x4C000460
+
+#define BO_ALWAYS	0x02800000
+#define BO_CTR		0x02000000
+#define BO_CRBI_OFF	0x00800000
+#define BO_CRBI_ON	0x01800000
+#define BO_CRBI_HINT	0x00400000
+
 unsigned int create_branch(const unsigned int *addr,
 			   unsigned long target, int flags);
 unsigned int create_cond_branch(const unsigned int *addr,
@@ -56,4 +66,10 @@ static inline unsigned long ppc_function_entry(void *func)
 #endif
 }
 
+/* Perf branch filters */
+bool instr_is_return_branch(unsigned int instr);
+bool instr_is_conditional_branch(unsigned int instr);
+bool instr_is_func_call(unsigned int instr);
+bool instr_is_indirect_func_call(unsigned int instr);
+
 #endif /* _ASM_POWERPC_CODE_PATCHING_H */
diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index d5edbeb..a06f8b3 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -77,6 +77,7 @@ static unsigned int branch_opcode(unsigned int instr)
 	return (instr >> 26) & 0x3F;
 }
 
+/* Forms of branch instruction */
 static int instr_is_branch_iform(unsigned int instr)
 {
 	return branch_opcode(instr) == 18;
@@ -87,6 +88,85 @@ static int instr_is_branch_bform(unsigned int instr)
 	return branch_opcode(instr) == 16;
 }
 
+static int instr_is_branch_xlform(unsigned int instr)
+{
+	return branch_opcode(instr) == 19;
+}
+
+/* Classification of XL-form instruction */
+static int is_xlform_lr(unsigned int instr)
+{
+	return (instr & XL_FORM_LR) == XL_FORM_LR;
+}
+
+/* BO field analysis (B-form or XL-form) */
+static int is_bo_always(unsigned int instr)
+{
+	return (instr & BO_ALWAYS) == BO_ALWAYS;
+}
+
+/* Link bit is set */
+static int is_branch_link_set(unsigned int instr)
+{
+	return (instr & BRANCH_SET_LINK) == BRANCH_SET_LINK;
+}
+
+/*
+ * Generic software implemented branch filters used
+ * by perf branch stack sampling when PMU does not
+ * process them for some reason.
+ */
+
+/* PERF_SAMPLE_BRANCH_ANY_RETURN */
+bool instr_is_return_branch(unsigned int instr)
+{
+	/*
+	 * Conditional and unconditional branch to LR register
+	 * without setting the link register.
+	 */
+	if (is_xlform_lr(instr) && !is_branch_link_set(instr))
+		return true;
+
+	return false;
+}
+
+/* PERF_SAMPLE_BRANCH_COND */
+bool instr_is_conditional_branch(unsigned int instr)
+{
+	/* I-form instruction - excluded */
+	if (instr_is_branch_iform(instr))
+		return false;
+
+	/* B-form or XL-form instruction */
+	if (instr_is_branch_bform(instr) || instr_is_branch_xlform(instr)) {
+
+		/* Not branch always */
+		if (!is_bo_always(instr))
+			return true;
+	}
+	return false;
+}
+
+/* PERF_SAMPLE_BRANCH_ANY_CALL */
+bool instr_is_func_call(unsigned int instr)
+{
+	/* LR should be set */
+	if (is_branch_link_set(instr))
+		return true;
+
+	return false;
+}
+
+/* PERF_SAMPLE_BRANCH_IND_CALL */
+bool instr_is_indirect_func_call(unsigned int instr)
+{
+	/* XL-form instruction with LR set */
+	if (instr_is_branch_xlform(instr) && is_branch_link_set(instr))
+		return true;
+
+	return false;
+}
+
 int instr_is_relative_branch(unsigned int instr)
 {
 	if (instr & BRANCH_ABSOLUTE)
-- 
1.7.11.7
[V6 03/11] x86, perf: Add conditional branch filtering support
This patch adds conditional branch filtering support, enabling it for PERF_SAMPLE_BRANCH_COND in perf branch stack sampling framework by utilizing an available software filter X86_BR_JCC.

Signed-off-by: Anshuman Khandual
Reviewed-by: Stephane Eranian
Reviewed-by: Andi Kleen
---
 arch/x86/kernel/cpu/perf_event_intel_lbr.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/arch/x86/kernel/cpu/perf_event_intel_lbr.c b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
index d82d155..9dd2459 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_lbr.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
@@ -384,6 +384,9 @@ static void intel_pmu_setup_sw_lbr_filter(struct perf_event *event)
 	if (br_type & PERF_SAMPLE_BRANCH_NO_TX)
 		mask |= X86_BR_NO_TX;
 
+	if (br_type & PERF_SAMPLE_BRANCH_COND)
+		mask |= X86_BR_JCC;
+
 	/*
 	 * stash actual user request into reg, it may
 	 * be used by fixup code for some CPU
@@ -678,6 +681,7 @@ static const int nhm_lbr_sel_map[PERF_SAMPLE_BRANCH_MAX] = {
 	 * NHM/WSM erratum: must include IND_JMP to capture IND_CALL
 	 */
 	[PERF_SAMPLE_BRANCH_IND_CALL] = LBR_IND_CALL | LBR_IND_JMP,
+	[PERF_SAMPLE_BRANCH_COND]     = LBR_JCC,
 };
 
 static const int snb_lbr_sel_map[PERF_SAMPLE_BRANCH_MAX] = {
@@ -689,6 +693,7 @@ static const int snb_lbr_sel_map[PERF_SAMPLE_BRANCH_MAX] = {
 	[PERF_SAMPLE_BRANCH_ANY_CALL]	= LBR_REL_CALL | LBR_IND_CALL
 					| LBR_FAR,
 	[PERF_SAMPLE_BRANCH_IND_CALL]	= LBR_IND_CALL,
+	[PERF_SAMPLE_BRANCH_COND]	= LBR_JCC,
 };
 
 /* core */
-- 
1.7.11.7
[V6 09/11] powerpc, perf: Enable SW filtering in branch stack sampling framework
This patch enables SW based post processing of BHRB captured branches in order to meet more user defined branch filtration criteria in the perf branch stack sampling framework. These changes increase the number of branch filters and their valid combinations on any powerpc64 server platform with BHRB support. A summary of the code changes:

(1) struct cpu_hw_events

Introduced two new variables to track the various filter values and masks:

    (a) bhrb_sw_filter: tracks SW implemented branch filter flags
    (b) bhrb_filter:    tracks both (SW and HW) branch filter flags

(2) Event creation

The kernel will figure out the supported BHRB branch filters through a PMU callback 'bhrb_filter_map'. This function will find out how many of the requested branch filters can be supported in the PMU HW. It will not try to invalidate any branch filter combinations, and event creation will not error out because of a lack of HW based branch filters. Meanwhile it will track the overall supported branch filters in the 'bhrb_filter' variable. Once the PMU callback returns, the kernel will process the user branch filter request against the available SW filters (bhrb_sw_filter_map) while looking at 'bhrb_filter'. During this phase, all the branch filters still pending from the user requested list will have to be supported in SW, failing which the event creation will error out.

(3) SW branch filter

During the BHRB data capture inside the PMU interrupt context, each of the captured 'perf_branch_entry.from' addresses will be checked for compliance with the applicable SW branch filters. If the entry does not conform to the filter requirements, it will be discarded from the final perf branch stack buffer.

(4) Supported SW based branch filters

    (a) PERF_SAMPLE_BRANCH_ANY_RETURN
    (b) PERF_SAMPLE_BRANCH_IND_CALL
    (c) PERF_SAMPLE_BRANCH_ANY_CALL
    (d) PERF_SAMPLE_BRANCH_COND

Please refer to the patch to understand the classification of instructions into these branch filter categories.
(5) Multiple branch filter semantics

The Book3S server implementation follows the same OR semantics (as implemented in x86) while dealing with multiple branch filters at any point of time. SW branch filter analysis is carried out on the data set captured in the PMU HW, so the resulting set of data (after applying the SW filters) will inherently be an AND with the HW captured set. Hence any combination of HW and SW branch filters would be invalid. HW based branch filters are more efficient and faster compared to SW implemented branch filters. So at first the PMU should decide whether it can support all the requested branch filters itself or not. In case it can support all the branch filters in an OR manner, we don't apply any SW branch filter on top of the HW captured set (which is the final set). This preserves the OR semantics of multiple branch filters as required. But in case the PMU cannot support all the requested branch filters in an OR manner, it should not apply any of its filters and should leave it up to the SW to handle them all. It is the PMU code's responsibility to uphold this protocol so as to conform to the overall OR semantics of the perf branch stack sampling framework.

Signed-off-by: Anshuman Khandual
---
 arch/powerpc/include/asm/perf_event_server.h |   6 +-
 arch/powerpc/perf/core-book3s.c              | 188 ++-
 arch/powerpc/perf/power8-pmu.c               |   2 +-
 3 files changed, 187 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/include/asm/perf_event_server.h b/arch/powerpc/include/asm/perf_event_server.h
index 9ed73714..93a9a8a 100644
--- a/arch/powerpc/include/asm/perf_event_server.h
+++ b/arch/powerpc/include/asm/perf_event_server.h
@@ -19,6 +19,10 @@
 #define MAX_EVENT_ALTERNATIVES	8
 #define MAX_LIMITED_HWCOUNTERS	2
 
+#define for_each_branch_sample_type(x) \
+		for ((x) = PERF_SAMPLE_BRANCH_USER; \
+		     (x) < PERF_SAMPLE_BRANCH_MAX; (x) <<= 1)
+
 /*
  * This struct provides the constants and functions needed to
  * describe the PMU on a particular POWER-family CPU.
@@ -35,7 +39,7 @@ struct power_pmu {
 				unsigned long *valp);
 	int		(*get_alternatives)(u64 event_id, unsigned int flags,
 				u64 alt[]);
-	u64		(*bhrb_filter_map)(u64 branch_sample_type);
+	u64		(*bhrb_filter_map)(u64 branch_sample_type, u64 *bhrb_filter);
 	void		(*config_bhrb)(u64 pmu_bhrb_filter);
 	void		(*disable_pmc)(unsigned int pmc, unsigned long mmcr[]);
 	int		(*limited_pmc_event)(u64 event_id);
[V6 10/11] power8, perf: Adapt BHRB PMU configuration to work with SW filters
The powerpc kernel now supports SW based branch filters for book3s systems, with some specific requirements while dealing with HW supported branch filters, in order to achieve the overall OR semantics prevailing in the perf branch stack sampling framework. This patch adapts the BHRB branch filter configuration to meet those protocols. The POWER8 PMU can handle only one HW based branch filter request at any point of time; for all other combinations the PMU will pass them on to the SW.

Signed-off-by: Anshuman Khandual
---
 arch/powerpc/perf/power8-pmu.c | 50 --
 1 file changed, 43 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/perf/power8-pmu.c b/arch/powerpc/perf/power8-pmu.c
index 699b1dd..4743bde 100644
--- a/arch/powerpc/perf/power8-pmu.c
+++ b/arch/powerpc/perf/power8-pmu.c
@@ -635,6 +635,16 @@ static int power8_generic_events[] = {
 
 static u64 power8_bhrb_filter_map(u64 branch_sample_type, u64 *bhrb_filter)
 {
+	u64 x, pmu_bhrb_filter;
+
+	pmu_bhrb_filter = 0;
+	*bhrb_filter = 0;
+
+	/* No branch filter requested */
+	if (branch_sample_type & PERF_SAMPLE_BRANCH_ANY) {
+		*bhrb_filter = PERF_SAMPLE_BRANCH_ANY;
+		return pmu_bhrb_filter;
+	}
+
 	/* BHRB and regular PMU events share the same privilege state
 	 * filter configuration. BHRB is always recorded along with a
 	 * regular PMU event. As the privilege state filter is handled
@@ -645,16 +655,42 @@ static u64 power8_bhrb_filter_map(u64 branch_sample_type, u64 *bhrb_filter)
 
 	/* Ignore user, kernel, hv bits */
 	branch_sample_type &= ~PERF_SAMPLE_BRANCH_PLM_ALL;
 
-	/* No branch filter requested */
-	if (branch_sample_type == PERF_SAMPLE_BRANCH_ANY)
-		return 0;
+	/*
+	 * P8 does not support ORing of PMU HW branch filters. Hence
+	 * if multiple branch filters are requested which includes filters
+	 * supported in PMU, still go ahead and clear the PMU based HW branch
+	 * filter component as in this case all the filters will be processed
+	 * in SW.
+	 */
 
-	if (branch_sample_type == PERF_SAMPLE_BRANCH_ANY_CALL) {
-		return POWER8_MMCRA_IFM1;
+	for_each_branch_sample_type(x) {
+		/* Ignore privilege branch filters */
+		if ((x == PERF_SAMPLE_BRANCH_USER)
+			|| (x == PERF_SAMPLE_BRANCH_KERNEL)
+			|| (x == PERF_SAMPLE_BRANCH_HV))
+			continue;
+
+		if (!(branch_sample_type & x))
+			continue;
+
+		/* Supported individual PMU branch filters */
+		if (branch_sample_type & PERF_SAMPLE_BRANCH_ANY_CALL) {
+			branch_sample_type &= ~PERF_SAMPLE_BRANCH_ANY_CALL;
+			if (branch_sample_type) {
+				/* Multiple branch filters will be processed in SW */
+				pmu_bhrb_filter = 0;
+				*bhrb_filter = 0;
+				return pmu_bhrb_filter;
+			} else {
+				/* Individual branch filter will be processed in PMU */
+				pmu_bhrb_filter |= POWER8_MMCRA_IFM1;
+				*bhrb_filter |= PERF_SAMPLE_BRANCH_ANY_CALL;
+				return pmu_bhrb_filter;
+			}
+		}
 	}
 
-	/* Every thing else is unsupported */
-	return -1;
+	return pmu_bhrb_filter;
 }
 
 static void power8_config_bhrb(u64 pmu_bhrb_filter)
-- 
1.7.11.7
[V6 02/11] perf, tool: Conditional branch filter 'cond' added to perf record
Adding perf record support for the new branch stack filter criterion PERF_SAMPLE_BRANCH_COND.

Signed-off-by: Anshuman Khandual
Reviewed-by: Stephane Eranian
Reviewed-by: Andi Kleen
---
 tools/perf/builtin-record.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 8ce62ef..dfe6b9d 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -583,6 +583,7 @@ static const struct branch_mode branch_modes[] = {
 	BRANCH_OPT("abort_tx", PERF_SAMPLE_BRANCH_ABORT_TX),
 	BRANCH_OPT("in_tx", PERF_SAMPLE_BRANCH_IN_TX),
 	BRANCH_OPT("no_tx", PERF_SAMPLE_BRANCH_NO_TX),
+	BRANCH_OPT("cond", PERF_SAMPLE_BRANCH_COND),
 	BRANCH_END
 };
--
1.7.11.7
Re: Fwd: [V6 00/11] perf: New conditional branch filter
On 05/21/2014 02:53 PM, Peter Zijlstra wrote:
> On Wed, May 21, 2014 at 02:41:58PM +0530, Anshuman Khandual wrote:
>> Hello Peter/Ingo,
>>
>> Would you please consider reviewing the first four patches in this
>> patch series which change the generic perf kernel and perf tools
>> code. Andi Kleen and Stephane Eranian have already reviewed these
>> changes. The rest of the patch series is related to powerpc and is
>> being reviewed by Michael Ellerman/Ben.
>>
>
> If they land in my inbox I might have a look.
>

Sent.
Re: [V6 01/11] perf: Add PERF_SAMPLE_BRANCH_COND
On 05/21/2014 05:00 PM, Peter Zijlstra wrote:
> On Wed, May 21, 2014 at 03:29:46PM +0530, Anshuman Khandual wrote:
>> This patch introduces new branch filter PERF_SAMPLE_BRANCH_COND which
>> will extend the existing perf ABI. Various architectures can provide
>> this functionality with either with HW filtering support (if present)
>> or with SW filtering of captured branch instructions.
>
> The Changelog fails to mention what _this_ functionality is.
>

Peter,

Hope this new change log below makes more sense.

---
commit af75191bb7ad36cba7d75c2741c93dfbdaf09da3
Author: Anshuman Khandual
Date:   Mon Jul 22 12:22:27 2013 +0530

    perf: Add new conditional branch filter PERF_SAMPLE_BRANCH_COND

    This patch introduces new branch filter PERF_SAMPLE_BRANCH_COND which
    will extend the existing perf ABI. This will filter branches which are
    conditional. Various architectures can provide this functionality
    either with HW filtering support (if present) or with SW filtering of
    captured branch instructions.

    Signed-off-by: Anshuman Khandual
    Reviewed-by: Stephane Eranian
    Reviewed-by: Andi Kleen

diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 853bc1c..696f69b4 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -163,8 +163,9 @@ enum perf_branch_sample_type {
 	PERF_SAMPLE_BRANCH_ABORT_TX	= 1U << 7, /* transaction aborts */
 	PERF_SAMPLE_BRANCH_IN_TX	= 1U << 8, /* in transaction */
 	PERF_SAMPLE_BRANCH_NO_TX	= 1U << 9, /* not in transaction */
+	PERF_SAMPLE_BRANCH_COND		= 1U << 10, /* conditional branches */

-	PERF_SAMPLE_BRANCH_MAX		= 1U << 10, /* non-ABI */
+	PERF_SAMPLE_BRANCH_MAX		= 1U << 11, /* non-ABI */
 };

 #define PERF_SAMPLE_BRANCH_PLM_ALL \
Re: Fwd: [V6 00/11] perf: New conditional branch filter
On 05/21/2014 05:31 PM, Peter Zijlstra wrote:
> On Wed, May 21, 2014 at 04:09:55PM +0530, Anshuman Khandual wrote:
>> On 05/21/2014 02:53 PM, Peter Zijlstra wrote:
>>> On Wed, May 21, 2014 at 02:41:58PM +0530, Anshuman Khandual wrote:
>>>> Hello Peter/Ingo,
>>>>
>>>> Would you please consider reviewing the first four patches in this
>>>> patch series which change the generic perf kernel and perf tools
>>>> code. Andi Kleen and Stephane Eranian have already reviewed these
>>>> changes. The rest of the patch series is related to powerpc and is
>>>> being reviewed by Michael Ellerman/Ben.
>>>>
>>>
>>> If they land in my inbox I might have a look.
>>>
>>
>> Sent.
>
> Thanks, they look fine to me, although 1/x can use a slightly longer
> changelog, making it explicit it's a filter for conditional branches.
>
> How do people want this routed? Should I take all patches through tip,
> or do I ask Ingo to create a special perf/cond branch which includes the
> first 4 patches which can be merged into whatever ppc branch and the
> rest then go on top in the ppc tree?
>

Peter,

Thanks for considering the patchset.
Re: [PATCH V2 2/3] powerpc, ptrace: Enable support for transactional memory register sets
On 05/20/2014 04:03 PM, Pedro Alves wrote:
> On 05/20/2014 09:14 AM, Anshuman Khandual wrote:
>> On 05/19/2014 08:13 PM, Pedro Alves wrote:
>>> On 05/19/2014 12:46 PM, Anshuman Khandual wrote:
>>>
>>>>>> I couldn't actually find any arch that currently returns -ENODEV in
>>>>>> the "active" hook. I see that binfmt_elf.c doesn't handle
>>>>>> regset->active() returning < 0. Guess that may be why. Looks like
>>>>>> something that could be cleaned up, to me.
>>>>>>
>>>> Also it does not consider the return value of regset->active(t->task,
>>>> regset) (whose objective is to figure out whether we need to request
>>>> regset->n number of elements or less than that) in the subsequent call
>>>> to the regset->get function.
>>>
>>> Indeed.
>>>
>>> TBC, do you plan on fixing this? Otherwise ...
>>
>> Sure, thinking something like this as mentioned below. But still not sure
>> how to use the return type of -ENODEV from the function regset->active().
>> Right now if any regset does have the active hook and it returns anything
>> but a positive value, it will be ignored and the control moves to the
>> next regset in view. This prevents the thread core note type being
>> written to the core dump.
>
> Looks to me that that's exactly what should happen for -ENODEV too. The
> regset should be ignored. If regset->active() returns -ENODEV, then the
> machine doesn't have the registers at all, so what makes sense to me is
> to not write the corresponding core note in the dump. IOW, on such a
> machine, the kernel generates a core exactly like if the support for
> these registers that don't make sense for this machine wasn't compiled
> in at all. And generates a core exactly like an older kernel that didn't
> know about that regset (which is fine for that same machine) yet.
>

All of this happens right now even without specifically checking for the
return type of -ENODEV and just checking for a positive value. I guess
that's the reason they had omitted -ENODEV in the first place.

>>
>> diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
>> index aa3cb62..80672fb 100644
>> --- a/fs/binfmt_elf.c
>> +++ b/fs/binfmt_elf.c
>> @@ -1553,7 +1553,15 @@ static int fill_thread_core_info(struct elf_thread_core_info *t,
>>  	if (regset->core_note_type && regset->get &&
>>  	    (!regset->active || regset->active(t->task, regset))) {
>>  		int ret;
>
> So, here, this ?
>
> 	(!regset->active || regset->active(t->task, regset) > 0)) {
>
>
>> -		size_t size = regset->n * regset->size;
>> +		size_t size;
>> +
>> +		/* Request only the active elements in the regset */
>> +		if (!regset->active)
>> +			size = regset->n * regset->size;
>> +		else
>> +			size = regset->active(t->task, regset) * regset->size;
>> +
>
>
> I wonder if it wouldn't be cleaner to add a function like:
>
> int
> regset_active (tast *task, regseg *regset)
> {
>    if (!regset->active)
>      return regset->n * regset->size;
>    else
>      return regset->active(task, regset);
> }
>
> And then use it like
>
>    if (regset->core_note_type && regset->get) {
>      int size = regset_active (t->task, regset);
>
>      if (size > 0) {
>        ...
>      }
>

Yeah this makes sense.

> Though at this point, we don't actually make use of the distinction
> between -ENODEV vs 0. Guess that's what we should be thinking about.
> Seems like there are some details that need to be sorted out, and some
> verification that consumers aren't broken by outputting smaller notes --
> e.g., ia64 makes me wonder about that.

I agree.

> Maybe we should leave this for another day, and have tm_spr_active
> return 0 instead of -ENODEV when the machine doesn't have the hardware,
> or not install that hook at all. Seems like the effect will be the same,
> as the note isn't output if ->get fails.

Agree.
Re: [V6 01/11] perf: Add PERF_SAMPLE_BRANCH_COND
On 05/22/2014 12:31 PM, Peter Zijlstra wrote:
> On Thu, May 22, 2014 at 09:18:54AM +0530, Anshuman Khandual wrote:
>
>> Hope this new change log below makes more sense.
>
> Yep reads a whole lot better. Thanks.
>

Will resend the first four patches with the new commit messages without
changing the overall version of the patchset. Hope that's okay.
[PATCH V6 2/4] perf, tool: Conditional branch filter 'cond' added to perf record
Adding perf record support for the new branch stack filter criterion PERF_SAMPLE_BRANCH_COND.

Signed-off-by: Anshuman Khandual
Reviewed-by: Stephane Eranian
Reviewed-by: Andi Kleen
---
 tools/perf/builtin-record.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 8ce62ef..dfe6b9d 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -583,6 +583,7 @@ static const struct branch_mode branch_modes[] = {
 	BRANCH_OPT("abort_tx", PERF_SAMPLE_BRANCH_ABORT_TX),
 	BRANCH_OPT("in_tx", PERF_SAMPLE_BRANCH_IN_TX),
 	BRANCH_OPT("no_tx", PERF_SAMPLE_BRANCH_NO_TX),
+	BRANCH_OPT("cond", PERF_SAMPLE_BRANCH_COND),
 	BRANCH_END
 };
--
1.7.11.7