[PATCH v4] mm: remove odd HAVE_PTE_SPECIAL

2018-04-12 Thread Laurent Dufour
Remove the additional define HAVE_PTE_SPECIAL and rely directly on
CONFIG_ARCH_HAS_PTE_SPECIAL.

There is no functional change introduced by this patch.
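For readers unfamiliar with IS_ENABLED(): unlike the removed HAVE_PTE_SPECIAL define, it turns any Kconfig symbol into a constant 0 or 1 that can be used in ordinary C conditions. The sketch below is a simplified user-space re-creation of the preprocessor trick behind it. The real macro in include/linux/kconfig.h also treats tristate `=m` options as enabled, which is omitted here. CONFIG_DEMO_ON, CONFIG_DEMO_OFF and pte_lookup_path() are made-up names for illustration.

```c
#include <assert.h>

/*
 * Simplified re-creation of the IS_ENABLED() trick. If the option is
 * defined to 1, ARG_PLACEHOLDER_1 expands to "0," and shifts a 1 into
 * the second argument slot; otherwise the second argument is 0.
 */
#define ARG_PLACEHOLDER_1 0,
#define take_second_arg(ignored, val, ...) val
#define is_defined_3(arg1_or_junk) take_second_arg(arg1_or_junk 1, 0)
#define is_defined_2(val) is_defined_3(ARG_PLACEHOLDER_##val)
#define is_defined_1(x) is_defined_2(x)
#define IS_ENABLED(option) is_defined_1(option)

/* Kconfig emits "#define CONFIG_X 1" for an enabled bool option. */
#define CONFIG_DEMO_ON 1
/* CONFIG_DEMO_OFF is deliberately left undefined. */

/* The result is an integer constant expression, unlike a bare #ifdef. */
static_assert(IS_ENABLED(CONFIG_DEMO_ON) == 1, "enabled option");
static_assert(IS_ENABLED(CONFIG_DEMO_OFF) == 0, "disabled option");

/*
 * Both branches are parsed and type-checked in every configuration;
 * the compiler then drops the dead one because the condition is constant.
 */
static const char *pte_lookup_path(void)
{
	if (IS_ENABLED(CONFIG_DEMO_ON))
		return "pte_special() path";
	return "VM_PFNMAP/VM_MIXEDMAP path";
}
```

This is why the patch can replace `if (HAVE_PTE_SPECIAL)` with `if (IS_ENABLED(CONFIG_ARCH_HAS_PTE_SPECIAL))` without changing behaviour.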

Signed-off-by: Laurent Dufour 
---
 mm/memory.c | 15 ++-
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 96910c625daa..345e562a138d 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -817,17 +817,12 @@ static void print_bad_pte(struct vm_area_struct *vma, unsigned long addr,
  * PFNMAP mappings in order to support COWable mappings.
  *
  */
-#ifdef CONFIG_ARCH_HAS_PTE_SPECIAL
-# define HAVE_PTE_SPECIAL 1
-#else
-# define HAVE_PTE_SPECIAL 0
-#endif
 struct page *_vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
 pte_t pte, bool with_public_device)
 {
unsigned long pfn = pte_pfn(pte);
 
-   if (HAVE_PTE_SPECIAL) {
+   if (IS_ENABLED(CONFIG_ARCH_HAS_PTE_SPECIAL)) {
if (likely(!pte_special(pte)))
goto check_pfn;
if (vma->vm_ops && vma->vm_ops->find_special_page)
@@ -862,7 +857,7 @@ struct page *_vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
return NULL;
}
 
-   /* !HAVE_PTE_SPECIAL case follows: */
+   /* !CONFIG_ARCH_HAS_PTE_SPECIAL case follows: */
 
if (unlikely(vma->vm_flags & (VM_PFNMAP|VM_MIXEDMAP))) {
if (vma->vm_flags & VM_MIXEDMAP) {
@@ -881,6 +876,7 @@ struct page *_vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
 
if (is_zero_pfn(pfn))
return NULL;
+
 check_pfn:
if (unlikely(pfn > highest_memmap_pfn)) {
print_bad_pte(vma, addr, pte, NULL);
@@ -904,7 +900,7 @@ struct page *vm_normal_page_pmd(struct vm_area_struct *vma, unsigned long addr,
/*
 * There is no pmd_special() but there may be special pmds, e.g.
 * in a direct-access (dax) mapping, so let's just replicate the
-* !HAVE_PTE_SPECIAL case from vm_normal_page() here.
+* !CONFIG_ARCH_HAS_PTE_SPECIAL case from vm_normal_page() here.
 */
if (unlikely(vma->vm_flags & (VM_PFNMAP|VM_MIXEDMAP))) {
if (vma->vm_flags & VM_MIXEDMAP) {
@@ -1933,7 +1929,8 @@ static int __vm_insert_mixed(struct vm_area_struct *vma, unsigned long addr,
 * than insert_pfn).  If a zero_pfn were inserted into a VM_MIXEDMAP
 * without pte special, it would there be refcounted as a normal page.
 */
-   if (!HAVE_PTE_SPECIAL && !pfn_t_devmap(pfn) && pfn_t_valid(pfn)) {
+   if (!IS_ENABLED(CONFIG_ARCH_HAS_PTE_SPECIAL) &&
+   !pfn_t_devmap(pfn) && pfn_t_valid(pfn)) {
struct page *page;
 
/*
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v11 1/4] arm64: KVM: export the capability to set guest SError syndrome

2018-04-12 Thread gengdongjiu
Hi James,
  Thanks for the review.


2018-04-10 22:15 GMT+08:00, James Morse :
> Hi Dongjiu Geng,
>
> On 09/04/18 22:36, Dongjiu Geng wrote:
>> Before user space injects a SError, it needs to know whether it can
>> specify the guest Exception Syndrome, so KVM should tell user space
>> whether it has such capability.
>
> (you could improve the commit message by briefly explaining how/why
> user-space would want to do this. As this is patch 1, you don't have the
> context of the previous patch to say that some systems can provide an ESR
> with virtual-SError)
Exactly, thanks for the good comments.

>
>
>> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
>> index fc3ae95..8a3d708 100644
>> --- a/Documentation/virtual/kvm/api.txt
>> +++ b/Documentation/virtual/kvm/api.txt
>> @@ -4415,3 +4415,14 @@ Parameters: none
>>  This capability indicates if the flic device will be able to get/set the
>>  AIS states for migration via the KVM_DEV_FLIC_AISM_ALL attribute and allows
>>  to discover this without having to create a flic device.
>> +
>> +8.14 KVM_CAP_ARM_SET_SERROR_ESR
>> +
>> +Architectures: arm, arm64
>> +
>> +This capability indicates that userspace can specify syndrome value reported to
>
> (Nit: 'the syndrome value')
will fix it.

>
>> +guest OS when guest takes a virtual SError interrupt exception.
>
> (Nit: 'the guest')
will fix it.

>
>> +If KVM has this capability, userspace can only specify the ISS field for the ESR
>> +syndrome, can not specify the EC field which is not under control by KVM.
>
> (Nit: 'it can not specify...')
will fix it.

>
>> +If this virtual SError is taken to EL1 using AArch64, this value will be reported
>> +into ISS filed of ESR_EL1.
>
> (Nit: 'in the ISS field')
will fix it.
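To make the ISS/EC split in the documentation text concrete, here is a minimal sketch of the ESR_ELx field layout and of a userspace-side check that a requested SError syndrome only touches ISS bits. The helper check_serror_syndrome() is hypothetical and not part of the KVM API; the mask and shift follow the architectural ESR_ELx layout (EC in bits [31:26], IL in bit 25, ISS in bits [24:0]). Whether KVM rejects or masks stray bits is not settled by this patch; the sketch assumes rejection.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * ESR_ELx layout (AArch64). The names mirror arch/arm64/include/asm/esr.h
 * but are defined locally for this sketch.
 */
#define ESR_ELx_EC_SHIFT	26
#define ESR_ELx_ISS_MASK	((UINT32_C(1) << 25) - 1)
#define ESR_ELx_EC_SERROR	0x2F

static_assert(ESR_ELx_ISS_MASK == 0x01FFFFFF, "ISS is 25 bits wide");

/* Exception class (EC) field of a full ESR value. */
static unsigned int esr_ec(uint32_t esr)
{
	return esr >> ESR_ELx_EC_SHIFT;
}

/*
 * Hypothetical VMM-side check: a syndrome for a virtual SError may only
 * carry ISS bits, because the EC is not under KVM's control.
 */
static int check_serror_syndrome(uint64_t syndrome)
{
	if (syndrome & ~(uint64_t)ESR_ELx_ISS_MASK)
		return -EINVAL;
	return 0;
}
```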

>
>
>> diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
>> index 3256b92..38c8a64 100644
>> --- a/arch/arm64/kvm/reset.c
>> +++ b/arch/arm64/kvm/reset.c
>> @@ -77,6 +77,9 @@ int kvm_arch_dev_ioctl_check_extension(struct kvm *kvm, long ext)
>>  case KVM_CAP_ARM_PMU_V3:
>>  r = kvm_arm_support_pmu_v3();
>>  break;
>> +case KVM_CAP_ARM_INJECT_SERROR_ESR:
>> +r = cpus_have_const_cap(ARM64_HAS_RAS_EXTN);
>> +break;
>>  case KVM_CAP_SET_GUEST_DEBUG:
>>  case KVM_CAP_VCPU_ATTRIBUTES:
>>  r = 1;
>
> 'dev_ioctl' feels a bit weird, but we already have cpu_has_32bit_el1() in here.

Yes, although the name is "dev_ioctl", it has no relationship
with the device.
Here it mainly checks vCPU capabilities, such as PMU, 32-bit EL1, etc.

>
>
>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>> index 8fb90a0..3587b33 100644
>> --- a/include/uapi/linux/kvm.h
>> +++ b/include/uapi/linux/kvm.h
>> @@ -934,6 +934,7 @@ struct kvm_ppc_resize_hpt {
>>  #define KVM_CAP_S390_AIS_MIGRATION 150
>>  #define KVM_CAP_PPC_GET_CPU_CHAR 151
>>  #define KVM_CAP_S390_BPB 152
>> +#define KVM_CAP_ARM_INJECT_SERROR_ESR 153
>>
>>  #ifdef KVM_CAP_IRQ_ROUTING
>
> (patch 1&2 should probably be swapped around, as on its own this does
> nothing).
ok, I will do it.

>
> Reviewed-by: James Morse 
Thanks for the Reviewed-by.

>
>
> Thanks,
>
> James
> ___
> kvmarm mailing list
> kvm...@lists.cs.columbia.edu
> https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
>


[PATCH 2/6] Disable instrumentation for some code

2018-04-12 Thread Abbott Liu
From: Andrey Ryabinin 

Disable instrumentation for arch/arm/boot/compressed/* and
arch/arm/vdso/*, because that code is not linked into the
kernel image.

Disable instrumentation for arch/arm/kvm/hyp/*. See commit a6cdf1c08cbf
("kvm: arm64: Disable compiler instrumentation for hypervisor code")
for more details.

Disable instrumentation for arch/arm/mm/physaddr.c. See
commit ec6d06efb0ba ("arm64: Add support for CONFIG_DEBUG_VIRTUAL")
for more details.

Disable the KASan check in the function unwind_pop_register,
because a failed KASan check does not matter when
unwind_pop_register reads a task's stack memory.
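The change splits `*(*vsp)++` into a load followed by a separate increment, so that the load can go through READ_ONCE_NOCHECK(). Below is a user-space sketch of the same pop operation, assuming a trimmed-down control block. read_nocheck(), struct ctrl_block and the POP_* codes are hypothetical stand-ins, and the KASan-suppression part of READ_ONCE_NOCHECK() has no user-space equivalent, so only the single volatile load is modelled.

```c
#include <assert.h>

/*
 * Stand-in for READ_ONCE_NOCHECK(): one volatile load. The kernel macro
 * additionally keeps KASan from checking the access.
 */
#define read_nocheck(x) (*(const volatile unsigned long *)&(x))

/* Hypothetical return codes; the kernel's URC_* values may differ. */
enum { POP_OK = 0, POP_FAILURE = 1 };

/* Hypothetical, trimmed-down unwind control block. */
struct ctrl_block {
	unsigned long vrs[16];		/* virtual register set */
	unsigned long *sp_high;		/* top of the stack being unwound */
};

static int pop_register(struct ctrl_block *ctrl, unsigned long **vsp,
			unsigned int reg)
{
	if (*vsp >= ctrl->sp_high)
		return -POP_FAILURE;

	/*
	 * Same effect as "ctrl->vrs[reg] = *(*vsp)++;": load the word that
	 * *vsp points to, then advance the virtual stack pointer by one word.
	 */
	ctrl->vrs[reg] = read_nocheck(**vsp);
	(*vsp)++;
	return POP_OK;
}
```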

Reported-by: Florian Fainelli 
Reported-by: Marc Zyngier 
Tested-by: Joel Stanley 
Tested-by: Florian Fainelli 
Tested-by: Abbott Liu 
Signed-off-by: Abbott Liu 
---
 arch/arm/boot/compressed/Makefile | 1 +
 arch/arm/kernel/unwind.c  | 3 ++-
 arch/arm/kvm/hyp/Makefile | 4 
 arch/arm/mm/Makefile  | 1 +
 arch/arm/vdso/Makefile| 2 ++
 5 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/arch/arm/boot/compressed/Makefile b/arch/arm/boot/compressed/Makefile
index 45a6b9b..966103e 100644
--- a/arch/arm/boot/compressed/Makefile
+++ b/arch/arm/boot/compressed/Makefile
@@ -24,6 +24,7 @@ OBJS  += hyp-stub.o
 endif
 
 GCOV_PROFILE   := n
+KASAN_SANITIZE := n
 
 #
 # Architecture dependencies
diff --git a/arch/arm/kernel/unwind.c b/arch/arm/kernel/unwind.c
index 0bee233..2e55c7d 100644
--- a/arch/arm/kernel/unwind.c
+++ b/arch/arm/kernel/unwind.c
@@ -249,7 +249,8 @@ static int unwind_pop_register(struct unwind_ctrl_block *ctrl,
if (*vsp >= (unsigned long *)ctrl->sp_high)
return -URC_FAILURE;
 
-   ctrl->vrs[reg] = *(*vsp)++;
+   ctrl->vrs[reg] = READ_ONCE_NOCHECK(*(*vsp));
+   (*vsp)++;
return URC_OK;
 }
 
diff --git a/arch/arm/kvm/hyp/Makefile b/arch/arm/kvm/hyp/Makefile
index 63d6b40..0a8b500 100644
--- a/arch/arm/kvm/hyp/Makefile
+++ b/arch/arm/kvm/hyp/Makefile
@@ -24,3 +24,7 @@ obj-$(CONFIG_KVM_ARM_HOST) += hyp-entry.o
 obj-$(CONFIG_KVM_ARM_HOST) += switch.o
 CFLAGS_switch.o   += $(CFLAGS_ARMV7VE)
 obj-$(CONFIG_KVM_ARM_HOST) += s2-setup.o
+
+GCOV_PROFILE   := n
+KASAN_SANITIZE := n
+UBSAN_SANITIZE := n
diff --git a/arch/arm/mm/Makefile b/arch/arm/mm/Makefile
index 9dbb849..c056e17 100644
--- a/arch/arm/mm/Makefile
+++ b/arch/arm/mm/Makefile
@@ -16,6 +16,7 @@ endif
 obj-$(CONFIG_ARM_PTDUMP_CORE)  += dump.o
 obj-$(CONFIG_ARM_PTDUMP_DEBUGFS)   += ptdump_debugfs.o
 obj-$(CONFIG_MODULES)  += proc-syms.o
+KASAN_SANITIZE_physaddr.o  := n
 obj-$(CONFIG_DEBUG_VIRTUAL)+= physaddr.o
 
 obj-$(CONFIG_ALIGNMENT_TRAP)   += alignment.o
diff --git a/arch/arm/vdso/Makefile b/arch/arm/vdso/Makefile
index bb411821..87abbb7 100644
--- a/arch/arm/vdso/Makefile
+++ b/arch/arm/vdso/Makefile
@@ -30,6 +30,8 @@ CFLAGS_vgettimeofday.o = -O2
 # Disable gcov profiling for VDSO code
 GCOV_PROFILE := n
 
+KASAN_SANITIZE := n
+
 # Force dependency
 $(obj)/vdso.o : $(obj)/vdso.so
 
-- 
2.9.0



[PATCH 3/6] Replace memory function for kasan

2018-04-12 Thread Abbott Liu
From: Andrey Ryabinin 

Functions like memset/memmove/memcpy do a lot of memory accesses.
If a bad pointer is passed to one of these functions, it is important
to catch this. The compiler's instrumentation cannot do this, since
these functions are written in assembly.

KASan replaces the memory functions with manually instrumented variants.
The original functions are declared as weak symbols, so that the strong
definitions in mm/kasan/kasan.c can replace them. The original functions
also have aliases with a '__' prefix in their names, so the
non-instrumented variant can still be called if needed.

We must use __memcpy/__memset instead of memcpy/memset when we copy
.data to RAM and when we clear .bss, because kasan_early_init can't
be called before .data and .bss have been initialized.
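A toy user-space model of a "manually instrumented" memcpy: it consults a shadow array (one shadow byte per 8-byte granule, as KASan does) for both ranges before delegating to an unchecked copy that plays the role of __memcpy. All names here are hypothetical, and the shadow encoding is simplified: real KASan also encodes partially addressable granules, and it prints a report rather than returning NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ARENA_SIZE		64
#define SHADOW_SCALE_SHIFT	3	/* one shadow byte per 8-byte granule */

static unsigned char arena[ARENA_SIZE];
/* Simplified shadow: 0 = granule addressable, non-zero = poisoned. */
static unsigned char shadow[ARENA_SIZE >> SHADOW_SCALE_SHIFT];

/* Mark every granule touched by [off, off + len) as poisoned. */
static void poison_range(size_t off, size_t len)
{
	assert(off + len <= ARENA_SIZE);
	for (size_t i = off; i < off + len; i++)
		shadow[i >> SHADOW_SCALE_SHIFT] = 0xFF;
}

/* True if [p, p + n) lies inside the arena and is entirely unpoisoned. */
static bool range_ok(const void *p, size_t n)
{
	uintptr_t base = (uintptr_t)arena, start = (uintptr_t)p;

	if (start < base || start - base > ARENA_SIZE ||
	    n > ARENA_SIZE - (start - base))
		return false;
	for (size_t i = start - base; i < start - base + n; i++)
		if (shadow[i >> SHADOW_SCALE_SHIFT])
			return false;
	return true;
}

/* Unchecked copy: plays the role of __memcpy. */
static void *raw_memcpy(void *dst, const void *src, size_t n)
{
	return memcpy(dst, src, n);
}

/* Manually instrumented variant: check both ranges, then delegate. */
static void *checked_memcpy(void *dst, const void *src, size_t n)
{
	if (!range_ok(src, n) || !range_ok(dst, n))
		return NULL;	/* real KASan would print a report here */
	return raw_memcpy(dst, src, n);
}
```

The split mirrors the design in the patch: the checked entry point does the bookkeeping, while the unchecked one stays available for early boot code that runs before the shadow exists.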

Reported-by: Russell King - ARM Linux 
Acked-by: Florian Fainelli 
Tested-by: Florian Fainelli 
Tested-by: Joel Stanley 
Tested-by: Abbott Liu 
Signed-off-by: Abbott Liu 
---
 arch/arm/boot/compressed/decompress.c |  2 ++
 arch/arm/boot/compressed/libfdt_env.h |  2 ++
 arch/arm/include/asm/string.h | 17 +
 arch/arm/kernel/head-common.S |  4 ++--
 arch/arm/lib/memcpy.S |  3 +++
 arch/arm/lib/memmove.S|  5 -
 arch/arm/lib/memset.S |  3 +++
 7 files changed, 33 insertions(+), 3 deletions(-)

diff --git a/arch/arm/boot/compressed/decompress.c b/arch/arm/boot/compressed/decompress.c
index a2ac3fe..0596077 100644
--- a/arch/arm/boot/compressed/decompress.c
+++ b/arch/arm/boot/compressed/decompress.c
@@ -49,8 +49,10 @@ extern int memcmp(const void *cs, const void *ct, size_t count);
 #endif
 
 #ifdef CONFIG_KERNEL_XZ
+#ifndef CONFIG_KASAN
 #define memmove memmove
 #define memcpy memcpy
+#endif
 #include "../../../../lib/decompress_unxz.c"
 #endif
 
diff --git a/arch/arm/boot/compressed/libfdt_env.h b/arch/arm/boot/compressed/libfdt_env.h
index 0743781..736ed36 100644
--- a/arch/arm/boot/compressed/libfdt_env.h
+++ b/arch/arm/boot/compressed/libfdt_env.h
@@ -17,4 +17,6 @@ typedef __be64 fdt64_t;
 #define fdt64_to_cpu(x)be64_to_cpu(x)
 #define cpu_to_fdt64(x)cpu_to_be64(x)
 
+#undef memset
+
 #endif
diff --git a/arch/arm/include/asm/string.h b/arch/arm/include/asm/string.h
index 111a1d8..1f9016b 100644
--- a/arch/arm/include/asm/string.h
+++ b/arch/arm/include/asm/string.h
@@ -15,15 +15,18 @@ extern char * strchr(const char * s, int c);
 
 #define __HAVE_ARCH_MEMCPY
 extern void * memcpy(void *, const void *, __kernel_size_t);
+extern void *__memcpy(void *dest, const void *src, __kernel_size_t n);
 
 #define __HAVE_ARCH_MEMMOVE
 extern void * memmove(void *, const void *, __kernel_size_t);
+extern void *__memmove(void *dest, const void *src, __kernel_size_t n);
 
 #define __HAVE_ARCH_MEMCHR
 extern void * memchr(const void *, int, __kernel_size_t);
 
 #define __HAVE_ARCH_MEMSET
 extern void * memset(void *, int, __kernel_size_t);
+extern void *__memset(void *s, int c, __kernel_size_t n);
 
 #define __HAVE_ARCH_MEMSET32
 extern void *__memset32(uint32_t *, uint32_t v, __kernel_size_t);
@@ -39,4 +42,18 @@ static inline void *memset64(uint64_t *p, uint64_t v, __kernel_size_t n)
return __memset64(p, v, n * 8, v >> 32);
 }
 
+
+
+#if defined(CONFIG_KASAN) && !defined(__SANITIZE_ADDRESS__)
+
+/*
+ * For files that are not instrumented (e.g. mm/slub.c), we
+ * should use the non-instrumented versions of the mem* functions.
+ */
+
+#define memcpy(dst, src, len) __memcpy(dst, src, len)
+#define memmove(dst, src, len) __memmove(dst, src, len)
+#define memset(s, c, n) __memset(s, c, n)
+#endif
+
 #endif
diff --git a/arch/arm/kernel/head-common.S b/arch/arm/kernel/head-common.S
index 6e0375e..c79b829 100644
--- a/arch/arm/kernel/head-common.S
+++ b/arch/arm/kernel/head-common.S
@@ -99,7 +99,7 @@ __mmap_switched:
  THUMB(ldmia   r4!, {r0, r1, r2, r3} )
  THUMB(mov sp, r3 )
sub r2, r2, r1
-   bl  memcpy  @ copy .data to RAM
+   bl  __memcpy    @ copy .data to RAM
 #endif
 
ARM(ldmia   r4!, {r0, r1, sp} )
@@ -107,7 +107,7 @@ __mmap_switched:
  THUMB(mov sp, r3 )
sub r2, r1, r0
mov r1, #0
-   bl  memset  @ clear .bss
+   bl  __memset    @ clear .bss
 
ldmia   r4, {r0, r1, r2, r3}
str r9, [r0]@ Save processor ID
diff --git a/arch/arm/lib/memcpy.S b/arch/arm/lib/memcpy.S
index 64111bd..79a83f8 100644
--- a/arch/arm/lib/memcpy.S
+++ b/arch/arm/lib/memcpy.S
@@ -61,6 +61,8 @@
 
 /* Prototype: void *memcpy(void *dest, const void *src, size_t n); */
 
+.weak memcpy
+ENTRY(__memcpy)
 ENTRY(mmiocpy)
 ENTRY(memcpy)
 
@@ -68,3 +70,4 @@ ENTRY(memcpy)
 
 ENDPROC(memcpy)
 ENDPROC(mmiocpy)
+ENDPROC(__memcpy)
diff --git a/arch/arm/lib/memmove.S b/arch/arm/lib/memmove.S
index 69a9d47..313db6c 100644
--- a/arch/arm/lib/memmove.S
+++ b/arch/arm/lib/memmove.S
@@ -27,12 +27,14 

[PATCH 4/6] Define the virtual space of KASan's shadow region

2018-04-12 Thread Abbott Liu
Define KASAN_SHADOW_OFFSET, KASAN_SHADOW_START and KASAN_SHADOW_END for
the arm kernel address sanitizer.

 +----------------+ 0xffffffff
 |                |
 |                |
 |                |
 +----------------+ CONFIG_PAGE_OFFSET
 |                |\
 |                | |->  module virtual address space area.
 |                |/
 +----------------+ MODULE_VADDR = KASAN_SHADOW_END
 |                |\
 |                | |-> the shadow area of kernel virtual address.
 |                |/
 +----------------+ TASK_SIZE(start of kernel space) = KASAN_SHADOW_START  the
 |                |\  shadow address of MODULE_VADDR
 |                | ---------------------+
 |                |                      |
 +----------------+ KASAN_SHADOW_OFFSET  |-> the user space area. Kernel address
 |                |                      |    sanitizer does not use this space.
 |                | ---------------------+
 |                |/
 ------------------ 0

1) KASAN_SHADOW_OFFSET:
  This value is used to map an address to the corresponding shadow
address by the following formula:
shadow_addr = (address >> 3) + KASAN_SHADOW_OFFSET;

2) KASAN_SHADOW_START:
  This value is MODULE_VADDR's shadow address. It is the start
of the kernel virtual space.

3) KASAN_SHADOW_END:
  This value is the shadow address of 0x100000000. It is the end of
the kernel address sanitizer's shadow area. It is also the start of the
module area.

When KASan is enabled, TASK_SIZE is no longer an 8-bit rotated
constant, so the code that accesses TASK_SIZE in the *.S files
needs to be modified.
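Plugging numbers into the three definitions makes the layout easier to check. The sketch assumes CONFIG_PAGE_OFFSET = 0xC0000000 (a 3G/1G split) and does the arithmetic in 64 bits so that 0x100000000 is representable. Under that assumption, TASK_SIZE becomes 0xB6E00000 when KASan is enabled, and the shadow region is exactly 1/8 the size of the range it covers (MODULE_VADDR up to the top of the address space).

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: CONFIG_PAGE_OFFSET = 0xC0000000 (3G/1G split). */
#define PAGE_OFFSET		UINT64_C(0xC0000000)
#define SZ_16M			UINT64_C(0x01000000)
#define SHADOW_SCALE_SHIFT	3

/* Same relations as in kasan_def.h, evaluated in 64-bit arithmetic. */
#define SHADOW_END	(PAGE_OFFSET - SZ_16M)
#define SHADOW_OFFSET	(SHADOW_END - (UINT64_C(1) << 29))
#define SHADOW_START	((SHADOW_END >> SHADOW_SCALE_SHIFT) + SHADOW_OFFSET)

/* shadow_addr = (address >> 3) + KASAN_SHADOW_OFFSET */
static uint64_t mem_to_shadow(uint64_t addr)
{
	return (addr >> SHADOW_SCALE_SHIFT) + SHADOW_OFFSET;
}

/* The shadow of 0x100000000 lands exactly on SHADOW_END. */
static_assert(((UINT64_C(1) << 32) >> SHADOW_SCALE_SHIFT) + SHADOW_OFFSET
	      == SHADOW_END, "shadow of 4 GiB is KASAN_SHADOW_END");
```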

Cc: Andrey Ryabinin 
Reported-by: Ard Biesheuvel 
Tested-by: Joel Stanley 
Tested-by: Florian Fainelli 
Tested-by: Abbott Liu 
Signed-off-by: Abbott Liu 
---
 arch/arm/include/asm/kasan_def.h | 64 
 arch/arm/include/asm/memory.h|  5 
 arch/arm/kernel/entry-armv.S |  5 ++--
 arch/arm/kernel/entry-common.S   |  9 --
 arch/arm/mm/init.c   |  6 
 arch/arm/mm/mmu.c|  7 -
 6 files changed, 90 insertions(+), 6 deletions(-)
 create mode 100644 arch/arm/include/asm/kasan_def.h

diff --git a/arch/arm/include/asm/kasan_def.h b/arch/arm/include/asm/kasan_def.h
new file mode 100644
index 000..7b7f424
--- /dev/null
+++ b/arch/arm/include/asm/kasan_def.h
@@ -0,0 +1,64 @@
+/*
+ *  arch/arm/include/asm/kasan_def.h
+ *
+ *  Copyright (c) 2018 Huawei Technologies Co., Ltd.
+ *
+ *  Author: Abbott Liu 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#ifndef __ASM_KASAN_DEF_H
+#define __ASM_KASAN_DEF_H
+
+#ifdef CONFIG_KASAN
+
+/*
+ *  +----------------+ 0xffffffff
+ *  |                |
+ *  |                |
+ *  |                |
+ *  +----------------+ CONFIG_PAGE_OFFSET
+ *  |                |\
+ *  |                | |->  module virtual address space area.
+ *  |                |/
+ *  +----------------+ MODULE_VADDR = KASAN_SHADOW_END
+ *  |                |\
+ *  |                | |-> the shadow area of kernel virtual address.
+ *  |                |/
+ *  +----------------+ TASK_SIZE(start of kernel space) = KASAN_SHADOW_START  the
+ *  |                |\  shadow address of MODULE_VADDR
+ *  |                | ---------------------+
+ *  |                |                      |
+ *  +----------------+ KASAN_SHADOW_OFFSET  |-> the user space area. Kernel address
+ *  |                |                      |    sanitizer does not use this space.
+ *  |                | ---------------------+
+ *  |                |/
+ *  ------------------ 0
+ *
+ * 1) KASAN_SHADOW_OFFSET:
+ *    This value is used to map an address to the corresponding shadow
+ *    address by the following formula:
+ *    shadow_addr = (address >> 3) + KASAN_SHADOW_OFFSET;
+ *
+ * 2) KASAN_SHADOW_START:
+ *    This value is MODULE_VADDR's shadow address. It is the start
+ *    of the kernel virtual space.
+ *
+ * 3) KASAN_SHADOW_END:
+ *    This value is the shadow address of 0x100000000. It is the end of
+ *    the kernel address sanitizer's shadow area. It is also the start of
+ *    the module area.
+ *
+ */
+
+#define KASAN_SHADOW_OFFSET (KASAN_SHADOW_END - (1<<29))
+
+#define KASAN_SHADOW_START  ((KASAN_SHADOW_END >> 3) + KASAN_SHADOW_OFFSET)
+
+#define KASAN_SHADOW_END (UL(CONFIG_PAGE_OFFSET) - UL(SZ_16M))
+
+#endif
+#endif
diff --git a/arch/arm/include/asm/memory.h b/arch/arm/include/asm/memory.h
index 4966677..3ce1a9a 100644
--- a/arch/arm/include/asm/memory.h
+++ b/arch/arm/include/asm/memory.h
@@ -21,6 +21,7 @@
 #ifdef CONFIG_NEED_MACH_MEMORY_H
 #include 
 #endif
+#include 
 
 /*
  * Allow for constants defined here to be used from assembly code
@@ -37,7 +38,11 @@
  * TASK_SIZE - the maximum size of a user space task.
  * TASK_UNMAPPED_BASE - the lower boundary of the mmap VM area
  */
+#ifndef CONFIG_KASAN
 #define TASK_SIZE  (UL(CONFIG_PAGE_OFFSET) - UL(SZ_16M))
+#else
+#define TASK_SIZE  (KASAN_SHADOW_START)
+#endif
 #define TASK_UNMAPPED_BASE ALIGN(TASK_SIZE / 3, SZ_16M)
 
 /*
diff --git a/arch/arm/kernel/entry-armv.S b/arch/arm/kernel/entry-armv.S
index 1752033..b4de9e4 100644
--- a/arch/arm/kernel/entry-armv.S
+++ b/arch/arm/kernel/entry-armv.S
@@ -183,7 +183,7 @@ ENDPROC(__und_invalid)
 
get_thread

[PATCH 5/6] Initialize the mapping of KASan shadow memory

2018-04-12 Thread Abbott Liu
From: Andrey Ryabinin 

This patch initializes the KASan shadow region's page table and memory.
There are two stages of KASan initialization:
1. At the early boot stage, the whole shadow region is mapped to just
   one physical page (kasan_zero_page). This is done by the function
   kasan_early_init, which is called by __mmap_switched
   (arch/arm/kernel/head-common.S).
 ---Andrey Ryabinin

2. After paging_init has been called, kasan_zero_page is used as the zero
   shadow for the memory that KASan does not need to track, and new shadow
   space is allocated for the memory that KASan does need to track. This is
   done by the function kasan_init, which is called by setup_arch.
---Andrey Ryabinin

3. Add support for arm LPAE.
   If LPAE is enabled, the KASan shadow region's mapping table needs to be
   copied in the pgd_alloc function.
---Abbott Liu

4. Move kasan_pte_populate, kasan_pmd_populate, kasan_pud_populate and
   kasan_pgd_populate from the .meminit.text section to the .init.text section.
   ---Reported by: Florian Fainelli
   ---Signed off by: Abbott Liu
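The two stages can be modelled with a toy "page table" in user space: first every shadow entry points at one shared zero page, then tracked ranges receive their own zeroed pages. This is only a sketch of the idea. toy_early_init(), toy_populate(), toy_teardown() and the tiny page size are hypothetical, and the real code walks the pgd/pud/pmd/pte levels and later maps the zero page read-only.

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_PAGE_SIZE	64	/* tiny "pages" keep the demo small */
#define SHADOW_PAGES	8

/* One shared page of zeros, standing in for kasan_zero_page. */
static unsigned char zero_page[TOY_PAGE_SIZE];
/* Toy "page table": one entry per page of the shadow region. */
static unsigned char *shadow_map[SHADOW_PAGES];

/* Stage 1 (cf. kasan_early_init): every shadow page maps the zero page. */
static void toy_early_init(void)
{
	for (int i = 0; i < SHADOW_PAGES; i++)
		shadow_map[i] = zero_page;
}

/*
 * Stage 2 (cf. kasan_init): memory that must be tracked gets its own
 * zeroed shadow pages; everything else keeps sharing the zero page.
 */
static int toy_populate(int first, int count)
{
	assert(first >= 0 && first + count <= SHADOW_PAGES);
	for (int i = first; i < first + count; i++) {
		unsigned char *page = calloc(1, TOY_PAGE_SIZE);

		if (!page)
			return -1;
		shadow_map[i] = page;
	}
	return 0;
}

/* Release the pages allocated in stage 2 and fall back to the zero page. */
static void toy_teardown(void)
{
	for (int i = 0; i < SHADOW_PAGES; i++) {
		if (shadow_map[i] != zero_page)
			free(shadow_map[i]);
		shadow_map[i] = zero_page;
	}
}
```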

Cc: Andrey Ryabinin 
Co-Developed-by: Abbott Liu 
Reported-by: Russell King - ARM Linux 
Reported-by: Florian Fainelli 
Tested-by: Florian Fainelli 
Tested-by: Joel Stanley 
Tested-by: Abbott Liu 
Signed-off-by: Abbott Liu 
---
 arch/arm/include/asm/kasan.h   |  35 +
 arch/arm/include/asm/pgalloc.h |   7 +-
 arch/arm/include/asm/thread_info.h |   4 +
 arch/arm/kernel/head-common.S  |   3 +
 arch/arm/kernel/setup.c|   2 +
 arch/arm/mm/Makefile   |   3 +
 arch/arm/mm/kasan_init.c   | 302 +
 arch/arm/mm/pgd.c  |  14 ++
 8 files changed, 368 insertions(+), 2 deletions(-)
 create mode 100644 arch/arm/include/asm/kasan.h
 create mode 100644 arch/arm/mm/kasan_init.c

diff --git a/arch/arm/include/asm/kasan.h b/arch/arm/include/asm/kasan.h
new file mode 100644
index 000..1801f4d
--- /dev/null
+++ b/arch/arm/include/asm/kasan.h
@@ -0,0 +1,35 @@
+/*
+ * arch/arm/include/asm/kasan.h
+ *
+ * Copyright (c) 2015 Samsung Electronics Co., Ltd.
+ * Author: Andrey Ryabinin 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ */
+
+#ifndef __ASM_KASAN_H
+#define __ASM_KASAN_H
+
+#ifdef CONFIG_KASAN
+
+#include 
+
+#define KASAN_SHADOW_SCALE_SHIFT 3
+
+/*
+ * Compiler uses shadow offset assuming that addresses start
+ * from 0. Kernel addresses don't start from 0, so shadow
+ * for kernel really starts from 'compiler's shadow offset' +
+ * ('kernel address space start' >> KASAN_SHADOW_SCALE_SHIFT)
+ */
+
+extern void kasan_init(void);
+
+#else
+static inline void kasan_init(void) { }
+#endif
+
+#endif
diff --git a/arch/arm/include/asm/pgalloc.h b/arch/arm/include/asm/pgalloc.h
index 2d7344f..f170659 100644
--- a/arch/arm/include/asm/pgalloc.h
+++ b/arch/arm/include/asm/pgalloc.h
@@ -50,8 +50,11 @@ static inline void pud_populate(struct mm_struct *mm, pud_t *pud, pmd_t *pmd)
  */
 #define pmd_alloc_one(mm,addr) ({ BUG(); ((pmd_t *)2); })
 #define pmd_free(mm, pmd)  do { } while (0)
-#define pud_populate(mm,pmd,pte)   BUG()
-
+#ifndef CONFIG_KASAN
+#define pud_populate(mm, pmd, pte) BUG()
+#else
+#define pud_populate(mm, pmd, pte) do { } while (0)
+#endif
 #endif /* CONFIG_ARM_LPAE */
 
 extern pgd_t *pgd_alloc(struct mm_struct *mm);
diff --git a/arch/arm/include/asm/thread_info.h b/arch/arm/include/asm/thread_info.h
index e71cc35..bc681a0 100644
--- a/arch/arm/include/asm/thread_info.h
+++ b/arch/arm/include/asm/thread_info.h
@@ -16,7 +16,11 @@
 #include 
 #include 
 
+#ifdef CONFIG_KASAN
+#define THREAD_SIZE_ORDER  2
+#else
 #define THREAD_SIZE_ORDER  1
+#endif
 #define THREAD_SIZE    (PAGE_SIZE << THREAD_SIZE_ORDER)
 #define THREAD_START_SP    (THREAD_SIZE - 8)
 
diff --git a/arch/arm/kernel/head-common.S b/arch/arm/kernel/head-common.S
index c79b829..20161e2 100644
--- a/arch/arm/kernel/head-common.S
+++ b/arch/arm/kernel/head-common.S
@@ -115,6 +115,9 @@ __mmap_switched:
str r8, [r2]@ Save atags pointer
cmp r3, #0
strne   r10, [r3]   @ Save control register values
+#ifdef CONFIG_KASAN
+   bl  kasan_early_init
+#endif
mov lr, #0
b   start_kernel
 ENDPROC(__mmap_switched)
diff --git a/arch/arm/kernel/setup.c b/arch/arm/kernel/setup.c
index fc40a2b..81c3e9df 100644
--- a/arch/arm/kernel/setup.c
+++ b/arch/arm/kernel/setup.c
@@ -62,6 +62,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "atags.h"
 
@@ -1118,6 +1119,7 @@ void __init setup_arch(char **cmdline_p)
early_ioremap_reset();
 
paging_init(mdesc);
+   kasan_init();
request_standard_resources(mdesc);
 
if (mdesc->restart)
diff --gi

[PATCH 6/6] Enable KASan for arm

2018-04-12 Thread Abbott Liu
From: Andrey Ryabinin 

This patch enables the kernel address sanitizer for arm.

Cc: Andrey Ryabinin 
Acked-by: Dmitry Vyukov 
Tested-by: Joel Stanley 
Tested-by: Florian Fainelli 
Tested-by: Abbott Liu 
Signed-off-by: Abbott Liu 
---
 Documentation/dev-tools/kasan.rst | 2 +-
 arch/arm/Kconfig  | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/Documentation/dev-tools/kasan.rst b/Documentation/dev-tools/kasan.rst
index f7a18f2..d92120d 100644
--- a/Documentation/dev-tools/kasan.rst
+++ b/Documentation/dev-tools/kasan.rst
@@ -12,7 +12,7 @@ KASAN uses compile-time instrumentation for checking every memory access,
 therefore you will need a GCC version 4.9.2 or later. GCC 5.0 or later is
 required for detection of out-of-bounds accesses to stack or global variables.
 
-Currently KASAN is supported only for the x86_64 and arm64 architectures.
+Currently KASAN is supported only for the x86_64, arm64 and arm architectures.
 
 Usage
 -
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 1878083..cd71bea 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -49,6 +49,7 @@ config ARM
select HAVE_ARCH_BITREVERSE if (CPU_32v7M || CPU_32v7) && !CPU_32v6
select HAVE_ARCH_JUMP_LABEL if !XIP_KERNEL && !CPU_ENDIAN_BE32 && MMU
select HAVE_ARCH_KGDB if !CPU_ENDIAN_BE32 && MMU
+   select HAVE_ARCH_KASAN if MMU
select HAVE_ARCH_MMAP_RND_BITS if MMU
select HAVE_ARCH_SECCOMP_FILTER if (AEABI && !OABI_COMPAT)
select HAVE_ARCH_THREAD_STRUCT_WHITELIST
-- 
2.9.0



[PATCH 1/6] Add TTBR operator for kasan_init

2018-04-12 Thread Abbott Liu
The purpose of this patch is to provide set_ttbr0/get_ttbr0
to the kasan_init function. The definitions of the cp15 registers
belong in arch/arm/include/asm/cp15.h rather than
arch/arm/include/asm/kvm_hyp.h, so move them.
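For context on what is being moved: each register name expands to the mnemonics and operand string that read_sysreg()/write_sysreg() splice into inline asm. The sketch below re-creates that macro shape in standard C and only builds the instruction text, so it runs on any host. The lowercase macro names and ttbr0_read_insn() are hypothetical, and the checks do not rely on the exact whitespace of the generated string.

```c
#include <assert.h>
#include <string.h>

/* Standard-C form of __stringify() (the kernel uses a GNU "x..." form). */
#define stringify_1(...)	#__VA_ARGS__
#define stringify(...)		stringify_1(__VA_ARGS__)

/* Same shape as __ACCESS_CP15()/__ACCESS_CP15_64() in asm/cp15.h. */
#define ACCESS_CP15(CRn, Op1, CRm, Op2) \
	"mrc", "mcr", stringify(p15, Op1, %0, CRn, CRm, Op2), u32
#define ACCESS_CP15_64(Op1, CRm) \
	"mrrc", "mcrr", stringify(p15, Op1, %Q0, %R0, CRm), u64

#define TTBR0_32	ACCESS_CP15(c2, 0, c0, 0)
#define TTBR0_64	ACCESS_CP15_64(0, c2)

/*
 * read_sysreg() pastes the read mnemonic and the operand string into an
 * asm statement; here we only build that text. The type argument (u32 or
 * u64) is accepted and ignored, so it never needs to be defined.
 */
#define read_insn_text_(r, w, c, t)	r " " c
#define read_insn_text(...)		read_insn_text_(__VA_ARGS__)

/* Mirrors the IS_ENABLED(CONFIG_ARM_LPAE) choice made by get_ttbr0(). */
static const char *ttbr0_read_insn(int lpae)
{
	return lpae ? read_insn_text(TTBR0_64) : read_insn_text(TTBR0_32);
}
```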

Cc: Andrey Ryabinin 
Reported-by: Marc Zyngier 
Acked-by: Mark Rutland 
Tested-by: Florian Fainelli 
Tested-by: Joel Stanley 
Tested-by: Abbott Liu 
Signed-off-by: Abbott Liu 
---
 arch/arm/include/asm/cp15.h| 104 +
 arch/arm/include/asm/kvm_hyp.h |  52 -
 arch/arm/kvm/hyp/cp15-sr.c |  12 ++---
 arch/arm/kvm/hyp/switch.c  |   6 +--
 4 files changed, 113 insertions(+), 61 deletions(-)

diff --git a/arch/arm/include/asm/cp15.h b/arch/arm/include/asm/cp15.h
index 4c9fa72..99ebb31 100644
--- a/arch/arm/include/asm/cp15.h
+++ b/arch/arm/include/asm/cp15.h
@@ -3,6 +3,7 @@
 #define __ASM_ARM_CP15_H
 
 #include 
+#include 
 
 /*
  * CR1 bits (CP#15 CR1)
@@ -65,8 +66,111 @@
 #define __write_sysreg(v, r, w, c, t)  asm volatile(w " " c : : "r" ((t)(v)))
 #define write_sysreg(v, ...)   __write_sysreg(v, __VA_ARGS__)
 
+#define TTBR0_32   __ACCESS_CP15(c2, 0, c0, 0)
+#define TTBR1_32   __ACCESS_CP15(c2, 0, c0, 1)
+#define PAR_32 __ACCESS_CP15(c7, 0, c4, 0)
+#define TTBR0_64   __ACCESS_CP15_64(0, c2)
+#define TTBR1_64   __ACCESS_CP15_64(1, c2)
+#define PAR_64 __ACCESS_CP15_64(0, c7)
+#define VTTBR  __ACCESS_CP15_64(6, c2)
+#define CNTV_CVAL  __ACCESS_CP15_64(3, c14)
+#define CNTVOFF    __ACCESS_CP15_64(4, c14)
+
+#define MIDR   __ACCESS_CP15(c0, 0, c0, 0)
+#define CSSELR __ACCESS_CP15(c0, 2, c0, 0)
+#define VPIDR  __ACCESS_CP15(c0, 4, c0, 0)
+#define VMPIDR __ACCESS_CP15(c0, 4, c0, 5)
+#define SCTLR  __ACCESS_CP15(c1, 0, c0, 0)
+#define CPACR  __ACCESS_CP15(c1, 0, c0, 2)
+#define HCR    __ACCESS_CP15(c1, 4, c1, 0)
+#define HDCR   __ACCESS_CP15(c1, 4, c1, 1)
+#define HCPTR  __ACCESS_CP15(c1, 4, c1, 2)
+#define HSTR   __ACCESS_CP15(c1, 4, c1, 3)
+#define TTBCR  __ACCESS_CP15(c2, 0, c0, 2)
+#define HTCR   __ACCESS_CP15(c2, 4, c0, 2)
+#define VTCR   __ACCESS_CP15(c2, 4, c1, 2)
+#define DACR   __ACCESS_CP15(c3, 0, c0, 0)
+#define DFSR   __ACCESS_CP15(c5, 0, c0, 0)
+#define IFSR   __ACCESS_CP15(c5, 0, c0, 1)
+#define ADFSR  __ACCESS_CP15(c5, 0, c1, 0)
+#define AIFSR  __ACCESS_CP15(c5, 0, c1, 1)
+#define HSR    __ACCESS_CP15(c5, 4, c2, 0)
+#define DFAR   __ACCESS_CP15(c6, 0, c0, 0)
+#define IFAR   __ACCESS_CP15(c6, 0, c0, 2)
+#define HDFAR  __ACCESS_CP15(c6, 4, c0, 0)
+#define HIFAR  __ACCESS_CP15(c6, 4, c0, 2)
+#define HPFAR  __ACCESS_CP15(c6, 4, c0, 4)
+#define ICIALLUIS  __ACCESS_CP15(c7, 0, c1, 0)
+#define BPIALLIS   __ACCESS_CP15(c7, 0, c1, 6)
+#define ICIMVAU    __ACCESS_CP15(c7, 0, c5, 1)
+#define ATS1CPR    __ACCESS_CP15(c7, 0, c8, 0)
+#define TLBIALLIS  __ACCESS_CP15(c8, 0, c3, 0)
+#define TLBIALL    __ACCESS_CP15(c8, 0, c7, 0)
+#define TLBIALLNSNHIS  __ACCESS_CP15(c8, 4, c3, 4)
+#define PRRR   __ACCESS_CP15(c10, 0, c2, 0)
+#define NMRR   __ACCESS_CP15(c10, 0, c2, 1)
+#define AMAIR0 __ACCESS_CP15(c10, 0, c3, 0)
+#define AMAIR1 __ACCESS_CP15(c10, 0, c3, 1)
+#define VBAR   __ACCESS_CP15(c12, 0, c0, 0)
+#define CID    __ACCESS_CP15(c13, 0, c0, 1)
+#define TID_URW    __ACCESS_CP15(c13, 0, c0, 2)
+#define TID_URO    __ACCESS_CP15(c13, 0, c0, 3)
+#define TID_PRIV   __ACCESS_CP15(c13, 0, c0, 4)
+#define HTPIDR __ACCESS_CP15(c13, 4, c0, 2)
+#define CNTKCTL    __ACCESS_CP15(c14, 0, c1, 0)
+#define CNTV_CTL   __ACCESS_CP15(c14, 0, c3, 1)
+#define CNTHCTL    __ACCESS_CP15(c14, 4, c1, 0)
+
 extern unsigned long cr_alignment; /* defined in entry-armv.S */
 
+static inline void set_par(u64 val)
+{
+   if (IS_ENABLED(CONFIG_ARM_LPAE))
+   write_sysreg(val, PAR_64);
+   else
+   write_sysreg(val, PAR_32);
+}
+
+static inline u64 get_par(void)
+{
+   if (IS_ENABLED(CONFIG_ARM_LPAE))
+   return read_sysreg(PAR_64);
+   else
+   return read_sysreg(PAR_32);
+}
+
+static inline void set_ttbr0(u64 val)
+{
+   if (IS_ENABLED(CONFIG_ARM_LPAE))
+   write_sysreg(val, TTBR0_64);
+   else
+   write_sysreg(val, TTBR0_32);
+}
+
+static inline u64 get_ttbr0(void)
+{
+   if (IS_ENABLED(CONFIG_ARM_LPAE))
+   return read_sysreg(TTBR0_64);
+   else
+   return read_sysreg(TTBR0_32);
+}
+
+static inline void set_ttbr1(u64 val)
+{
+   if (IS_ENABLED(CONFIG_ARM_LPAE))
+   write_sysreg(val, TTBR1_64);
+   else
+   write_sysreg(val, TTBR1_32);
+}
+
+static inline u64 get_ttbr1(void)
+{
+   if (IS_ENAB

[PATCH 0/6] KASan for arm

2018-04-12 Thread Abbott Liu
From: Andrey Ryabinin 

Changelog:
v4 - v3
- Remove the fix of the type conversion in kasan_cache_create, because it has
  been fixed in the latest version of:
  git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
- Change some Reviewed-by tags into Reported-by tags to avoid misleading readers.
  ---Reported by: Marc Zyngier 
  Russell King - ARM Linux 
- Disable instrumentation for arch/arm/mm/physaddr.c

v3 - v2
- Remove the patch "2 1-byte checks more safer for memory_is_poisoned_16",
  because an unaligned load/store of 16 bytes is rare on arm, and that
  patch is very likely to affect the performance of modern CPUs.
  ---Acked by: Russell King - ARM Linux 
- Fixed a link error: kasan_pmd_populate, kasan_pte_populate and
  kasan_pud_populate were in the .meminit.text section, but
  kasan_alloc_block, which they call, is in the .init.text section. So
  kasan_pmd_populate, kasan_pte_populate and kasan_pud_populate have been
  moved into the .init.text section.
  ---Reported by: Florian Fainelli 
- Fixed a compile error caused by the wrong access instruction in
  arch/arm/kernel/entry-common.S.
  ---Reported by: kbuild test robot 
- Disable instrumentation for arch/arm/kvm/hyp/*.
  ---Acked by: Marc Zyngier 
- Update the set of supported architectures in
  Documentation/dev-tools/kasan.rst.
  ---Acked by: Dmitry Vyukov
- Version 2 was tested by:
  Florian Fainelli (compile test)
  kbuild test robot (compile test)
  Joel Stanley (on ASPEED ast2500 (ARMv5))

v2 - v1
- Fixed some compile errors that happened when changing the kernel
  compression mode to lzma/xz/lzo/lz4.
  ---Reported by: Florian Fainelli ,
 Russell King - ARM Linux 
- Fixed a compile error, reported by kbuild, caused by older arm instruction
  sets (armv4t) not supporting movw/movt.
- Changed the pte flag from _L_PTE_DEFAULT | L_PTE_DIRTY | L_PTE_XN to
  pgprot_val(PAGE_KERNEL).
  ---Reported by: Russell King - ARM Linux 
- Moved the "Enable KASan" patch to the end of the series.
  ---Reported by: Florian Fainelli ,
 Russell King - ARM Linux 
- Moved the definitions of cp15 registers from
  arch/arm/include/asm/kvm_hyp.h to arch/arm/include/asm/cp15.h.
  ---Asked by: Mark Rutland 
- Merge the following commits into the commit
  Define the virtual space of KASan's shadow region:
  1) Define the virtual space of KASan's shadow region;
  2) Avoid cleaning the KASan shadow area's mapping table;
  3) Add KASan layout;
- Merge the following commits into the commit
  Initialize the mapping of KASan shadow memory:
  1) Initialize the mapping of KASan shadow memory;
  2) Add support for ARM LPAE;
  3) Don't need to map the shadow of KASan's shadow memory;
  ---Reported by: Russell King - ARM Linux
  4) Change mapping of kasan_zero_page into read-only.
- Version 1 was tested by Florian Fainelli
  on a Cortex-A5 (no LPAE).

Hi all,
   These patches add arch-specific code for the kernel address sanitizer
(see Documentation/kasan.txt).

   1/8 of the kernel address space is reserved for shadow memory. There was
no hole big enough for this, so the virtual addresses for the shadow were
stolen from user space.
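As a rough illustration of the 1/8 ratio: generic KASan maps every 8 bytes of memory to one shadow byte, computing the shadow address as (addr >> 3) + offset, which mirrors the kernel's kasan_mem_to_shadow() with KASAN_SHADOW_SCALE_SHIFT = 3. The sketch below assumes a 32-bit address space; DEMO_SHADOW_OFFSET and the demo_* helpers are hypothetical and are not the values or functions used by these patches.

```c
#include <stdint.h>

/* Each shadow byte describes 8 bytes of memory (the 1/8 ratio). */
#define DEMO_SHADOW_SCALE_SHIFT 3

/* Hypothetical offset for illustration only; the real ARM value depends
 * on the kernel memory layout chosen by the patches. */
#define DEMO_SHADOW_OFFSET 0x1f000000u

/* Translate a 32-bit virtual address into the address of its shadow byte. */
static uint32_t demo_mem_to_shadow(uint32_t addr)
{
	return (addr >> DEMO_SHADOW_SCALE_SHIFT) + DEMO_SHADOW_OFFSET;
}

/* Size of the shadow region needed to cover len bytes of address space. */
static uint64_t demo_shadow_size(uint64_t len)
{
	return len >> DEMO_SHADOW_SCALE_SHIFT;
}
```

With this ratio, covering the full 4 GiB of a 32-bit address space would need 512 MiB of shadow, which is why the shadow range has to be carved out of the user address space.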

   At the early boot stage, the whole shadow region is populated with just
one physical page (kasan_zero_page). Later, this page is reused
as a read-only zero shadow for memory that KASan currently
doesn't track (vmalloc).

  After mapping the physical memory, pages for shadow memory are
allocated and mapped.

  KASan's stack instrumentation significantly increases stack
consumption, so CONFIG_KASAN doubles THREAD_SIZE.

  Functions like memset/memmove/memcpy do a lot of memory accesses.
If a bad pointer is passed to one of these functions, it is important
to catch this. Compiler instrumentation cannot do this since
these functions are written in assembly.

  KASan replaces memory functions with manually instrumented variants.
The original functions are declared as weak symbols so that strong
definitions in mm/kasan/kasan.c can replace them. The original functions
have aliases with a '__' prefix in their names, so the non-instrumented
variant can be called if needed.
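The weak-symbol arrangement can be sketched in plain C with the GCC/Clang alias attribute on an ELF target. All demo_* names and demo_check_region() are hypothetical stand-ins: the real __memcpy is assembly, and the real checks consult the shadow memory rather than just counting calls.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for the non-instrumented assembly routine ("__" prefix). */
void *__demo_memcpy(void *dst, const void *src, size_t len)
{
	unsigned char *d = dst;
	const unsigned char *s = src;

	while (len--)
		*d++ = *s++;
	return dst;
}

/* The plain name is only a weak alias, so a strong definition in another
 * object file (KASan's instrumented variant) would win at link time. */
void *demo_memcpy(void *dst, const void *src, size_t len)
	__attribute__((weak, alias("__demo_memcpy")));

/* Hypothetical shadow check: here it only sanity-checks and counts calls. */
static unsigned long demo_checks;

static void demo_check_region(const void *addr, size_t len, int is_write)
{
	assert(addr != NULL || len == 0);
	(void)addr;
	(void)is_write;
	demo_checks++;
}

/* Shape of a manually instrumented variant: validate both buffers, then
 * call the non-instrumented routine. */
void *demo_checked_memcpy(void *dst, const void *src, size_t len)
{
	demo_check_region(src, len, 0);
	demo_check_region(dst, len, 1);
	return __demo_memcpy(dst, src, len);
}
```

Because the instrumented variant ends by calling the "__" version, the check runs exactly once per call and the copy itself is never re-checked.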

  Some files are built without KASan instrumentation (e.g. mm/slub.c).
In those files, the original mem* functions are replaced (via #define) with
the prefixed variants to disable memory access checks.

  On the ARM LPAE architecture, the mapping table of the KASan shadow memory
(if PAGE_OFFSET is 0xc000, the KASan shadow memory's virtual space is
0xb6e00~0xbf00) can't be filled in the do_translation_fault function,
because KASan instrumentation may cause do_translation_fault to access
KASan shadow memory. Accessing KASan shadow memory in
do_translation_fault may in turn cause an infinite loop. So the mapping
table of the KASan shadow memory needs to be copied in the pgd_alloc function.


Most of the code comes from:
https://github.com/aryabinin/linux/commit/0b54f17e70ff50a902c4af05bb92716eb95acefe

These patches are tested on vexpress-ca15, ve

Re: [PATCH v11 2/4] arm/arm64: KVM: Add KVM_GET/SET_VCPU_EVENTS

2018-04-12 Thread gengdongjiu
Hi James,
  Thanks for the comments.

2018-04-10 22:15 GMT+08:00, James Morse :
> Hi Dongjiu Geng,
>
> On 09/04/18 22:36, Dongjiu Geng wrote:
>> This new IOCTL exports user-invisible states related to SError.
>> Together with appropriate user space changes, it can inject
>> SError with specified syndrome to guest by setup kvm_vcpu_events
>> value.
>
>> Also it can support live migration.
>
> Could you explain what user-space is expected to do for this?
> (this is also relevant for snapshot-ing/suspending VMs)
Ok.

>
> It's probably worth noting that this solves an existing problem: KVM may make an
> SError pending, but user-space has no way to discover/migrate this.

If KVM makes an SError pending, then when user-space does a migration it gets the
kvm_vcpu_events through KVM_GET_VCPU_EVENTS and can find that pending status.
What are the things you're worried about?

>
>
>> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
>> index 8a3d708..45719b4 100644
>> --- a/Documentation/virtual/kvm/api.txt
>> +++ b/Documentation/virtual/kvm/api.txt
>> @@ -819,11 +819,13 @@ struct kvm_clock_data {
>>
>>  Capability: KVM_CAP_VCPU_EVENTS
>>  Extended by: KVM_CAP_INTR_SHADOW
>> -Architectures: x86
>> +Architectures: x86, arm, arm64
>>  Type: vm ioctl
>>  Parameters: struct kvm_vcpu_event (out)
>>  Returns: 0 on success, -1 on error
>>
>> +X86:
>> +
>>  Gets currently pending exceptions, interrupts, and NMIs as well as related
>>  states of the vcpu.
>>
>> @@ -865,15 +867,31 @@ Only two fields are defined in the flags field:
>>  - KVM_VCPUEVENT_VALID_SMM may be set in the flags field to signal that
>>smi contains a valid state.
>>
>> +ARM, ARM64:
>> +
>> +Gets currently pending SError exceptions as well as related states of the vcpu.
>> +
>> +struct kvm_vcpu_events {
>> +struct {
>> +__u8 serror_pending;
>> +__u8 serror_has_esr;
>> +/* Align it to 4 bytes */
>> +__u8 pad[2];
>> +__u64 serror_esr;
>> +} exception;
>> +};
>> +
>
> I'm not convinced we should change this struct from the layout/size x86 has. It's
> confusing for the documentation; is this API call really the same on all
> architectures?
>
> What if we want to add some future interrupt, NMI or related state? We've found
> ourselves needing to add this API; it seems odd to remove its other uses on x86.
> We can't put them back in the future.
>
> Having a different layout would force user-space to ifdef/duplicate any code
> that accesses this between architectures.
In the x86 and arm64 user-space code, the handling logic for
KVM_GET/SET_VCPU_EVENTS lives in different arch folders, so it may not be
necessary to share the handling code in user space.

>
>
>
> The compiler will want that __u64 to be naturally aligned to 8-bytes, so your
> 4-byte padding still causes some secret compiler-padding to be inserted.
> Different versions of the compiler may put it in different places.
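The hidden padding James describes can be made visible with offsetof(). The sketch below copies the v11 layout (with <stdint.h> types standing in for __u8/__u64) next to a hypothetical variant that pads explicitly to the 8-byte boundary; the demo_* names are invented for illustration, and the explicit layout is not the one proposed in the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Layout as posted in v11. */
struct demo_events_v11 {
	struct {
		uint8_t serror_pending;
		uint8_t serror_has_esr;
		uint8_t pad[2];      /* "Align it to 4 bytes" */
		uint64_t serror_esr; /* may be preceded by hidden padding */
	} exception;
};

/* Hypothetical alternative: pad explicitly up to the 8-byte boundary so the
 * compiler has no hole left to fill. */
struct demo_events_explicit {
	struct {
		uint8_t serror_pending;
		uint8_t serror_has_esr;
		uint8_t pad[6];
		uint64_t serror_esr;
	} exception;
};

/* Compile-time guards: the build breaks if the explicit layout ever shifts. */
_Static_assert(offsetof(struct demo_events_explicit, exception.serror_esr) == 8,
	       "serror_esr must start at byte 8");
_Static_assert(sizeof(struct demo_events_explicit) == 16,
	       "no trailing padding expected");

/* Bytes the compiler silently inserted before serror_esr in the v11 layout
 * (4 on ABIs that align 64-bit integers to 8 bytes, such as arm64 and x86-64). */
static size_t demo_hidden_padding_v11(void)
{
	return offsetof(struct demo_events_v11, exception.serror_esr) - 4;
}
```

Making every byte explicit means user-space and kernel agree on the layout regardless of compiler, which is the property a UAPI struct needs.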
>
>
>>  4.32 KVM_SET_VCPU_EVENTS
>>
>>  Capability: KVM_CAP_VCPU_EVENTS
>>  Extended by: KVM_CAP_INTR_SHADOW
>> -Architectures: x86
>> +Architectures: x86, arm, arm64
>>  Type: vm ioctl
>>  Parameters: struct kvm_vcpu_event (in)
>>  Returns: 0 on success, -1 on error
>>
>> +X86:
>> +
>>  Set pending exceptions, interrupts, and NMIs as well as related states of the
>>  vcpu.
>>
>> @@ -894,6 +912,12 @@ shall be written into the VCPU.
>>
>>  KVM_VCPUEVENT_VALID_SMM can only be set if KVM_CAP_X86_SMM is available.
>>
>> +ARM, ARM64:
>> +
>> +Set pending SError exceptions as well as related states of the vcpu.
>> +
>> +See KVM_GET_VCPU_EVENTS for the data structure.
>> +
>>
>>  4.33 KVM_GET_DEBUGREGS
>>
>
>
>> diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
>> index 9abbf30..855cc9a 100644
>> --- a/arch/arm64/include/uapi/asm/kvm.h
>> +++ b/arch/arm64/include/uapi/asm/kvm.h
>> @@ -39,6 +39,7 @@
>>  #define __KVM_HAVE_GUEST_DEBUG
>>  #define __KVM_HAVE_IRQ_LINE
>>  #define __KVM_HAVE_READONLY_MEM
>> +#define __KVM_HAVE_VCPU_EVENTS
>>
>>  #define KVM_COALESCED_MMIO_PAGE_OFFSET 1
>>
>> @@ -153,6 +154,17 @@ struct kvm_sync_regs {
>>  struct kvm_arch_memory_slot {
>>  };
>>
>> +/* for KVM_GET/SET_VCPU_EVENTS */
>> +struct kvm_vcpu_events {
>> +struct {
>> +__u8 serror_pending;
>> +__u8 serror_has_esr;
>
>> +/* Align it to 4 bytes */
>> +__u8 pad[2];
>
> (padding noted above)
>
>
>> +__u64 serror_esr;
>> +} exception;
>> +};
>> +
>>  /* If you need to interpret the index values, here is the key: */
>>  #define KVM_REG_ARM_COPROC_MASK 0x0FFF
>>  #define KVM_REG_ARM_COPROC_SHIFT16
>
>
>> diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
>> index 5c7f657..42e1222 100644
>> --- a/arch/arm64/kvm/guest.c
>> +++ b/arch/arm64/kvm/guest.c
>> @@ -277,6 +277,37 @@ int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
>>  return -EINVAL;
>>  }
>>

Re: [PATCH v11 0/4] set VSESR_EL2 by user space and support NOTIFY_SEI notification

2018-04-12 Thread James Morse
Hi gengdongjiu,

On 12/04/18 07:09, gengdongjiu wrote:
> On 2018/4/10 22:15, James Morse wrote:
>> On 09/04/18 22:36, Dongjiu Geng wrote:
>>> 1. Detect whether KVM can set guest SError syndrome
>>> 2. Support to Set VSESR_EL2 and inject SError by user space.
>>> 3. Support live migration to keep SError pending state and VSESR_EL2 value.
>>> 4. ACPI 6.1 adds support for NOTIFY_SEI as a GHES notification mechanism, so support this
>>>    notification in software: KVM or kernel arch code calls handle_guest_sei() to let the ACPI driver
>>>    handle this notification.
>>
>> Please don't post code during the merge-window, will this apply to v4.17-rc1? We
>> can't know until it's tagged.

Posting code during the merge-window isn't helpful as the kernel is a moving
target; it's better to wait for an 'rc' to base it on.

> I do not know when the merge window is. As for which version it applies to,
> it is not limited to a particular one.

'git fetch' Linus' tree and look at the tags. 'v4.16' lost its '-rc' suffixes,
and there isn't a 'v4.17-rc1' yet, so we are still in the merge window.

Linus sends a message to LKML, e.g.:
https://lkml.org/lkml/2018/4/1/175

net-next closes shortly before the merge window, and re-opens afterwards. There
is a handy web page:
http://vger.kernel.org/~davem/net-next.html


>> This series is doing two separate things, please split it into two series.
> OK, thanks!
> 
>>
>> But on the ACPI front: I don't see how any OS can support your NOTIFY_SEI when
>> firmware is ignoring the normal world's PSTATE.A.
>>
>> The latest lobe of that discussion was on the list here:
>> https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1611496.html
> I have replied to that mail.
> I still have some questions that I need to clarify with you.
> After clarification, we will follow that.
> The questions are in my reply to this mail:
> "https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1611496.html"

Let's keep that discussion on v9 then.


>> As it is, we would need to spot SError being delivered while SError is masked,
>> spray nasty messages about firmware being horrifically buggy, then panic(). For
>> a corrected error, this looks bad, but it's preferable to letting firmware
>> silently overwrite the exception registers, causing linux to spin through the
>> vectors 'eret' with all exceptions masked.
>> I still think it's best to wait for firmware that does the right thing.

> Let us discuss that in another mail.
> In summary, I think firmware following the rules below would be OK, right?
> 1. The exception came from the EL that SError should be routed to (according
> to HCR_EL2.{AMO, TGE}), but PSTATE.A was set: EL3 firmware can't deliver
> SError;

> 2. The exception came from the EL that SError should not be routed
> to (according to HCR_EL2.{AMO, TGE}): even though PSTATE.A was set, EL3
> firmware still delivers SError

Problem here, more on v9.


Thanks,

James
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v9 3/7] acpi: apei: Add SEI notification type support for ARMv8

2018-04-12 Thread James Morse
Hi gengdongjiu,

On 12/04/18 06:00, gengdongjiu wrote:
> 2018-02-16 1:55 GMT+08:00 James Morse :
>> On 05/02/18 11:24, gengdongjiu wrote:
 Is the emulated SError routed following the routing rules for HCR_EL2.{AMO,
 TGE}?
>>>
>>> Yes, it is.
>>
>> ... and yet ...
>>
>>
 What does your firmware do when it wants to emulate SError but its masked?
 (e.g.1: The physical-SError interrupted EL2 and the SPSR shows EL2 had
 PSTATE.A  set.
  e.g.2: The physical-SError interrupted EL2 but HCR_EL2 indicates the
 emulated  SError should go to EL1. This effectively masks SError.)
>>>
>>> Currently we does not consider much about the mask status(SPSR).
>>
>> .. this is a problem.
>>
>> If you ignore SPSR_EL3 you may deliver an SError to EL1 when the exception
>> interrupted EL2. Even if you set up the EL1 register correctly, EL1 can't eret to
>> EL2. This should never happen; SError is effectively masked if you are running
>> at an EL higher than the one it's routed to.
>>
>> More obviously: if the exception came from the EL that SError should be routed
>> to, but PSTATE.A was set, you can't deliver SError. Masking SError is the only

> James, I summarized the masking and routing rules for SError to
> confirm with you for the firmware-first solution,

You also said "Currently we does not consider much about the mask status(SPSR)."


> 1. If the HCR_EL2.{AMO,TGE} is set,

If one or the other of these bits is set: (AMO==1 || TGE==1)

> which means the SError should be routed to EL2.
> When an SError happens and traps to EL3, if EL3 finds
> HCR_EL2.{AMO,TGE} and SPSR_EL3.A are both set,
> and finds this SError came from EL2, it will not deliver an SError:
> store the RAS error in the BERT and 'reboot'; but if
> it finds that this SError came from EL1 or EL0, it also needs to deliver
> an SError, right?

Yes.


> 2. If the HCR_EL2.{AMO,TGE} is not set,

If neither of these bits is set: (AMO==0 && TGE == 0)

> which means the SError should be routed to EL1.
> When an SError happens and traps to EL3, if EL3 finds
> HCR_EL2.{AMO,TGE} and SPSR_EL3.A are not set,

(I'm reading this as all three of these bits are clear)

> and finds this SError came from EL1, it will not deliver an SError:
> store the RAS error in the BERT and 'reboot';

No, (AMO==0 && TGE == 0) means SError is routed to EL1, this exception
interrupted EL1 and the A bit was clear, so EL1 can take an SError.

The two cases here are:
AMO==0,TGE==0 means SError should be routed to EL1. If SPSR_EL3 says the
exception interrupted EL1 and the A bit was set, you need to do the BERT trick.

If SPSR_EL3 says the exception interrupted EL2, you need to do the BERT trick
regardless of the A bit, as SError is implicitly masked by running at a higher
exception level than it was routed to.


From your v11 reply:
> 2. The exception came from the EL that SError should not be routed
> to (according to HCR_EL2.{AMO, TGE}): even though PSTATE.A was set, EL3
> firmware still delivers SError

(this is re-iterating the two-cases above:)
'not be routed to' is one of two things: Route-to-EL2+interrupted-EL1, or
Route-to-EL1+interrupted-EL2.

Route-to-EL2+interrupted-EL1 is fine, regardless of SPSR_EL3.A the emulated
SError can be delivered to EL2, as EL2 can't mask SError when executing at a
lower EL.

Route-to-EL1+interrupted-EL2 is the problem. SError is implicitly masked by
running at a higher EL. Regardless of SPSR_EL3.A, the emulated SError can not be
delivered.
KVM does this on the way out of a guest, if an SError occurs during this time
the CPU will wait until execution returns to EL1 before delivering the SError.
Your firmware has to do the same.

Table D1-15 in "D1.14.2 Asynchronous exception masking" has a table with all the
combinations. The ARM-ARM is what we need to match with this behaviour.


> but if it finds that this SError came from EL0, it also needs to deliver an
> SError, right?

I thought interrupted-EL0 could always be delivered: but re-reading the
ARM-ARM's "D1.14.2 Asynchronous exception masking", if asynchronous exceptions
are routed to EL1 then EL0&EL1 are treated the same.
So if SError is routed to EL1, the exception interrupted EL0, and SPSR_EL3.A was
set, you still can't deliver the emulated-SError you have to do the BERT-trick.
Linux doesn't do this today, but another OS might (e.g. UEFI), and we might do
this in the future.

This is really tricky for firmware to get right. Another alternative would be to
put the CPER records in a Polled buffer, unless something needs doing right now,
in which case a BERT-reboot is probably best.
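The delivery rules spelled out above can be condensed into a small decision function. This is a hypothetical sketch (demo_serror_action() and its parameters are invented for illustration) covering only the cases discussed in this thread; it ignores details such as HCR_EL2.E2H, and Table D1-15 of the ARM-ARM remains the authoritative source.

```c
#include <stdbool.h>

enum demo_serror_action {
	DEMO_SERROR_DELIVER,     /* emulate the SError into the target EL */
	DEMO_SERROR_BERT_REBOOT, /* record the error in the BERT and reboot */
};

/*
 * interrupted_el: the EL recorded in SPSR_EL3 (0, 1 or 2).
 * spsr_a:         the PSTATE.A bit recorded in SPSR_EL3.
 */
static enum demo_serror_action
demo_serror_action(bool amo, bool tge, unsigned int interrupted_el, bool spsr_a)
{
	/* AMO==1 || TGE==1 routes SError to EL2, otherwise to EL1. */
	unsigned int target_el = (amo || tge) ? 2 : 1;

	/* Running above the target EL implicitly masks SError, whatever A says. */
	if (interrupted_el > target_el)
		return DEMO_SERROR_BERT_REBOOT;

	/* EL2 cannot mask SError while a lower EL is executing. */
	if (target_el == 2 && interrupted_el < 2)
		return DEMO_SERROR_DELIVER;

	/* Interrupted the target EL (EL0 counts as EL1 when routed to EL1):
	 * the saved PSTATE.A decides. */
	return spsr_a ? DEMO_SERROR_BERT_REBOOT : DEMO_SERROR_DELIVER;
}
```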


Thanks,

James


Re: [PATCH v3 09/10] drivers/hwmon: Add PECI hwmon client drivers

2018-04-12 Thread Jae Hyun Yoo

On 4/11/2018 8:40 PM, Guenter Roeck wrote:

On 04/11/2018 07:51 PM, Jae Hyun Yoo wrote:

On 4/11/2018 5:34 PM, Guenter Roeck wrote:

On 04/11/2018 02:59 PM, Jae Hyun Yoo wrote:

Hi Guenter,

Thanks a lot for sharing your time. Please see my inline answers.

On 4/10/2018 3:28 PM, Guenter Roeck wrote:

On Tue, Apr 10, 2018 at 11:32:11AM -0700, Jae Hyun Yoo wrote:

This commit adds PECI cputemp and dimmtemp hwmon drivers.

Signed-off-by: Jae Hyun Yoo 
Reviewed-by: Haiyue Wang 
Reviewed-by: James Feist 
Reviewed-by: Vernon Mauery 
Cc: Alan Cox 
Cc: Andrew Jeffery 
Cc: Andrew Lunn 
Cc: Andy Shevchenko 
Cc: Arnd Bergmann 
Cc: Benjamin Herrenschmidt 
Cc: Fengguang Wu 
Cc: Greg KH 
Cc: Guenter Roeck 
Cc: Jason M Biils 
Cc: Jean Delvare 
Cc: Joel Stanley 
Cc: Julia Cartwright 
Cc: Miguel Ojeda 
Cc: Milton Miller II 
Cc: Pavel Machek 
Cc: Randy Dunlap 
Cc: Stef van Os 
Cc: Sumeet R Pawnikar 
---
  drivers/hwmon/Kconfig |  28 ++
  drivers/hwmon/Makefile    |   2 +
  drivers/hwmon/peci-cputemp.c  | 783 ++
  drivers/hwmon/peci-dimmtemp.c | 432 +++
  4 files changed, 1245 insertions(+)
  create mode 100644 drivers/hwmon/peci-cputemp.c
  create mode 100644 drivers/hwmon/peci-dimmtemp.c

diff --git a/drivers/hwmon/Kconfig b/drivers/hwmon/Kconfig
index f249a4428458..c52f610f81d0 100644
--- a/drivers/hwmon/Kconfig
+++ b/drivers/hwmon/Kconfig
@@ -1259,6 +1259,34 @@ config SENSORS_NCT7904
    This driver can also be built as a module.  If so, the module
    will be called nct7904.
+config SENSORS_PECI_CPUTEMP
+    tristate "PECI CPU temperature monitoring support"
+    depends on OF
+    depends on PECI
+    help
+  If you say yes here you get support for the generic Intel PECI
+  cputemp driver which provides Digital Thermal Sensor (DTS) thermal
+  readings of the CPU package and CPU cores that are accessible using
+  the PECI Client Command Suite via the processor PECI client.
+  Check Documentation/hwmon/peci-cputemp for details.
+
+  This driver can also be built as a module.  If so, the module
+  will be called peci-cputemp.
+
+config SENSORS_PECI_DIMMTEMP
+    tristate "PECI DIMM temperature monitoring support"
+    depends on OF
+    depends on PECI
+    help
+  If you say yes here you get support for the generic Intel PECI hwmon
+  driver which provides Digital Thermal Sensor (DTS) thermal readings of
+  DIMM components that are accessible using the PECI Client Command
+  Suite via the processor PECI client.
+  Check Documentation/hwmon/peci-dimmtemp for details.
+
+  This driver can also be built as a module.  If so, the module
+  will be called peci-dimmtemp.
+
  config SENSORS_NSA320
  tristate "ZyXEL NSA320 and compatible fan speed and temperature sensors"
  depends on GPIOLIB && OF
diff --git a/drivers/hwmon/Makefile b/drivers/hwmon/Makefile
index e7d52a36e6c4..48d9598fcd3a 100644
--- a/drivers/hwmon/Makefile
+++ b/drivers/hwmon/Makefile
@@ -136,6 +136,8 @@ obj-$(CONFIG_SENSORS_NCT7802)    += nct7802.o
  obj-$(CONFIG_SENSORS_NCT7904)    += nct7904.o
  obj-$(CONFIG_SENSORS_NSA320)    += nsa320-hwmon.o
  obj-$(CONFIG_SENSORS_NTC_THERMISTOR)    += ntc_thermistor.o
+obj-$(CONFIG_SENSORS_PECI_CPUTEMP)    += peci-cputemp.o
+obj-$(CONFIG_SENSORS_PECI_DIMMTEMP)    += peci-dimmtemp.o
  obj-$(CONFIG_SENSORS_PC87360)    += pc87360.o
  obj-$(CONFIG_SENSORS_PC87427)    += pc87427.o
  obj-$(CONFIG_SENSORS_PCF8591)    += pcf8591.o
diff --git a/drivers/hwmon/peci-cputemp.c b/drivers/hwmon/peci-cputemp.c
new file mode 100644
index ..f0bc92687512
--- /dev/null
+++ b/drivers/hwmon/peci-cputemp.c
@@ -0,0 +1,783 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (c) 2018 Intel Corporation
+
+#include 
+#include 
+#include 


Is this include needed ?



No it isn't. Will drop the line.


+#include 
+#include 
+#include 
+#include 
+
+#define TEMP_TYPE_PECI    6  /* Sensor type 6: Intel PECI */
+
+#define CORE_MAX_ON_HSX   18 /* Max number of cores on Haswell */
+#define CORE_MAX_ON_BDX   24 /* Max number of cores on Broadwell */
+#define CORE_MAX_ON_SKX   28 /* Max number of cores on Skylake */
+
+#define DEFAULT_CHANNEL_NUMS  5
+#define CORETEMP_CHANNEL_NUMS CORE_MAX_ON_SKX
+#define CPUTEMP_CHANNEL_NUMS  (DEFAULT_CHANNEL_NUMS + CORETEMP_CHANNEL_NUMS)
+
+#define CLIENT_CPU_ID_MASK    0xf0ff0  /* Mask for Family / Model info */
+
+#define UPDATE_INTERVAL_MIN   HZ
+
+enum cpu_gens {
+    CPU_GEN_HSX, /* Haswell Xeon */
+    CPU_GEN_BRX, /* Broadwell Xeon */
+    CPU_GEN_SKX, /* Skylake Xeon */
+    CPU_GEN_MAX
+};
+
+struct cpu_gen_info {
+    u32 type;
+    u32 cpu_id;
+    u32 core_max;
+};
+
+struct temp_data {
+    bool valid;
+    s32  value;
+    unsigned long last_updated;
+};
+
+struct temp_group {
+    struct temp_data die;
+    struct temp_data dts_margin;
+    struct temp_data tcontrol;
+    struct temp_data tthrottle;
+    struct t

Re: [PATCH v3 09/10] drivers/hwmon: Add PECI hwmon client drivers

2018-04-12 Thread Guenter Roeck
On Thu, Apr 12, 2018 at 10:09:51AM -0700, Jae Hyun Yoo wrote:
[ ... ]
> >>+static int find_core_index(struct peci_cputemp *priv, int channel)
> >>+{
> >>+    int core_channel = channel - DEFAULT_CHANNEL_NUMS;
> >>+    int idx, found = 0;
> >>+
> >>+    for (idx = 0; idx < priv->gen_info->core_max; idx++) {
> >>+    if (priv->core_mask & BIT(idx)) {
> >>+    if (core_channel == found)
> >>+    break;
> >>+
> >>+    found++;
> >>+    }
> >>+    }
> >>+
> >>+    return idx;
> >
> >What if nothing is found ?
> >
> 
> Core temperature group will be registered only when it detects at
> least one core checked by check_resolved_cores(), so
> find_core_index() can be called only when priv->core_mask has a
> non-zero value. The 'nothing is found' case will not happen.
> 
> >>>That doesn't guarantee a match. If what you are saying is correct
> >>>there should always be
> >>>a well defined match of channel -> idx, and the search should be
> >>>unnecessary.
> >>>
> >>
> >>There could be some disabled cores in the resolved core mask bit
> >>sequence also it should remove indexing gap in channel numbering so it
> >>is the reason why this search function is needed. Well defined match of
> >>channel -> idx would not be always satisfied.
> >>
> >Are you saying that each call to the function, with the same parameters,
> >can return a different result ?
> >
> 
> No, the result will be consistent. After reading the priv->core_mask once in
> check_resolved_cores(), the value will not be changed. I'm talking about this
> case: for example, if core number 2 is unresolved out of 4 cores in total, then the
> idx order will be '0, 1, 3' but channel order will be '5, 6, 7' without
> making any indexing gap.
> 

And yet you claim that this is not well defined? Or are you concerned
about the amount of memory consumed by providing an array for the mapping ?

Note that an indexing gap is acceptable and, in many cases, preferred.

[ ... ]

> >>+
> >>+    dev_dbg(dev, "%s: sensor '%s'\n", dev_name(hwmon_dev),
> >>priv->name);
> >>+
> >>>
> >>>Why does this message display the device name twice ?
> >>>
> >>
> >>For an example, dev_name(hwmon_dev) shows 'hwmon5' and priv->name shows
> >>'peci-cputemp0'.
> >>
> >And dev_dbg() shows another device name. So you'll have something like
> >
> >peci-cputemp0: hwmon5: sensor 'peci-cputemp0'
> >
> 
> Practically it shows like
> 
> peci-cputemp 0-30:00: hwmon10: sensor 'peci_cputemp.cpu0'
> 
> where 0-30:00 is assigned by peci core.
> 

And what message would you see for cpu1 ?



Re: [PATCH] sched/fair: add support to tune PELT ramp/decay timings

2018-04-12 Thread Peter Zijlstra
On Mon, Apr 09, 2018 at 05:51:34PM +0100, Patrick Bellasi wrote:
> The PELT half-life is the time [ms] required by the PELT signal to build
> up a 50% load/utilization, starting from zero. This time is currently
> hardcoded to be 32ms, a value which seems to make sense for most of the
> workloads.
> 
> However, 32ms has been verified to be too long for certain classes of
> workloads. For example, in the mobile space many tasks affecting the
> user-experience run with a 16ms or 8ms cadence, since they need to match
> the common 60Hz or 120Hz refresh rate of the graphics pipeline.
> This contributed so far to the idea that "PELT is too slow" to properly
> track the utilization of interactive mobile workloads, especially
> compared to alternative load tracking solutions which provide a
> better representation of task demand in the range of 10-20ms.
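A worked version of the half-life arithmetic, as an idealised sketch: it treats PELT periods as exactly 1 ms (they are 1024 us) and ignores partial-period accounting. With decay factor y and a half-life of h periods, a task running continuously from zero reaches the fraction 1 - y^t of its maximum utilization after t periods.

```latex
% Decay factor y for a half-life of h periods:
y^{h} = \tfrac{1}{2} \quad\Longrightarrow\quad y = 2^{-1/h}

% Utilization of a task running continuously from zero for t periods,
% as a fraction of the maximum:
\frac{u(t)}{u_{\max}} = 1 - y^{t} = 1 - 2^{-t/h}

% Example: one 8 ms burst (t = 8)
% h = 32:  1 - 2^{-1/4} \approx 0.16
% h = 16:  1 - 2^{-1/2} \approx 0.29
% h =  8:  1 - 2^{-1}    = 0.50
```

This is why an 8 ms cadence task looks small to the default 32 ms half-life but reaches 50% with an 8 ms one.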

Initially the 32 was chosen to more or less correspond to the effective
scheduling period (sysctl_sched_latency based). The thinking was that if
you pick a PELT window shorter than the period, the result becomes
unstable due to not all tasks getting an equal go at things.

(of course, stuffing enough tasks on a rq will break this, but at that
point you have worse problems to deal with)

Should we retain this? Esp. with the lower end (8ms) I worry we'll see
more of those effects.


> Fortunately, since the integration of the utilization estimation
> support in mainline kernel:
> 
>commit 7f65ea42eb00 ("sched/fair: Add util_est on top of PELT")
> 
> a fast decay time is no longer an issue for task utilization estimation.
> Although estimated utilization does not slow down the decay of blocked
> utilization on idle CPUs, for mobile workloads this seems not to be a
> major concern compared to the benefits in interactivity responsiveness.

By picking a smaller PELT window, the util_est window shrinks
correspondingly; is that intentional or do we want to modify
UTIL_EST_WEIGHT_SHIFT to negate the PELT window changes?


Re: [PATCH v3 09/10] drivers/hwmon: Add PECI hwmon client drivers

2018-04-12 Thread Jae Hyun Yoo

On 4/12/2018 10:37 AM, Guenter Roeck wrote:

On Thu, Apr 12, 2018 at 10:09:51AM -0700, Jae Hyun Yoo wrote:
[ ... ]

+static int find_core_index(struct peci_cputemp *priv, int channel)
+{
+    int core_channel = channel - DEFAULT_CHANNEL_NUMS;
+    int idx, found = 0;
+
+    for (idx = 0; idx < priv->gen_info->core_max; idx++) {
+    if (priv->core_mask & BIT(idx)) {
+    if (core_channel == found)
+    break;
+
+    found++;
+    }
+    }
+
+    return idx;


What if nothing is found ?



Core temperature group will be registered only when it detects at
least one core checked by check_resolved_cores(), so
find_core_index() can be called only when priv->core_mask has a
non-zero value. The 'nothing is found' case will not happen.


That doesn't guarantee a match. If what you are saying is correct
there should always be
a well defined match of channel -> idx, and the search should be
unnecessary.



There could be some disabled cores in the resolved core mask bit
sequence also it should remove indexing gap in channel numbering so it
is the reason why this search function is needed. Well defined match of
channel -> idx would not be always satisfied.


Are you saying that each call to the function, with the same parameters,
can return a different result ?



No, the result will be consistent. After reading the priv->core_mask once in
check_resolved_cores(), the value will not be changed. I'm talking about this
case: for example, if core number 2 is unresolved out of 4 cores in total, then the
idx order will be '0, 1, 3' but channel order will be '5, 6, 7' without
making any indexing gap.



And yet you claim that this is not well defined? Or are you concerned
about the amount of memory consumed by providing an array for the mapping ?

Note that an indexing gap is acceptable and, in many cases, preferred.



If the indexing gap is acceptable, the index search function isn't
needed anymore. I'll change all related code to use a direct
channel -> idx mapping then. Thanks!
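A sketch of the two approaches discussed above, outside the driver: the search from the patch, changed to report "not found" explicitly, and a direct channel -> idx mapping that simply leaves a gap for unresolved cores, in the spirit of hiding the channel through an is_visible-style check. The function names and the local BIT() definition are hypothetical stand-ins; DEFAULT_CHANNEL_NUMS is taken from the patch.

```c
#include <stdbool.h>

#define DEFAULT_CHANNEL_NUMS 5   /* value from the patch */
#define BIT(n) (1UL << (n))      /* stand-in for the kernel macro */

/* Search-based mapping (as in the patch), returning -1 when no core
 * matches instead of silently returning core_max. */
static int find_core_index_checked(unsigned long core_mask, int core_max,
				   int channel)
{
	int core_channel = channel - DEFAULT_CHANNEL_NUMS;
	int idx, found = 0;

	for (idx = 0; idx < core_max; idx++) {
		if (core_mask & BIT(idx)) {
			if (core_channel == found)
				return idx;
			found++;
		}
	}
	return -1;
}

/* Direct mapping: channel N always refers to core N - DEFAULT_CHANNEL_NUMS. */
static int core_index_direct(int channel)
{
	return channel - DEFAULT_CHANNEL_NUMS;
}

/* Unresolved cores leave a gap: their channel is simply not visible. */
static bool core_channel_visible(unsigned long core_mask, int core_max,
				 int channel)
{
	int idx = core_index_direct(channel);

	return idx >= 0 && idx < core_max && (core_mask & BIT(idx));
}
```

With core 2 unresolved (mask 0xb), the search maps channel 7 to core 3, while the direct mapping hides channel 7 and exposes core 3 on channel 8, so no lookup loop is needed at read time.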



[ ... ]


+
+    dev_dbg(dev, "%s: sensor '%s'\n", dev_name(hwmon_dev),
priv->name);
+


Why does this message display the device name twice ?



For an example, dev_name(hwmon_dev) shows 'hwmon5' and priv->name shows
'peci-cputemp0'.


And dev_dbg() shows another device name. So you'll have something like

peci-cputemp0: hwmon5: sensor 'peci-cputemp0'



Practically it shows like

peci-cputemp 0-30:00: hwmon10: sensor 'peci_cputemp.cpu0'

where 0-30:00 is assigned by peci core.



And what message would you see for cpu1 ?



It shows like

peci-cputemp 0-31:00: hwmon10: sensor 'peci_cputemp.cpu1'


Re: [PATCH v2] gpiolib: add hogs support for machine code

2018-04-12 Thread Christian Lamparter
On Tuesday, 10 April 2018 22:30:28 CEST Bartosz Golaszewski wrote:
> Board files constitute a significant part of the users of the legacy
> GPIO framework. In many cases they only export a line and set its
> desired value. We could use GPIO hogs for that like we do for DT and
> ACPI but there's no support for that in machine code.
> 
> This patch proposes to extend the machine.h API with support for
> registering hog tables in board files.
> 
> Signed-off-by: Bartosz Golaszewski 
> ---
> @@ -1326,6 +1364,8 @@ int gpiochip_add_data_with_key(struct gpio_chip *chip, void *data,
>  
>   acpi_gpiochip_add(chip);
>  
> + machine_gpiochip_add(chip);
> +
>   /*
>* By first adding the chardev, and then adding the device,
>* we get a device node entry in sysfs under
> @@ -3462,6 +3502,33 @@ void gpiod_remove_lookup_table(struct gpiod_lookup_table *table)

I think I see the same problem right here in regards to pinctrls
and gpio-hogs that we have with DeviceTree:


The problem is that unlike native gpio-controllers, pinctrls need 
to have a "pin/gpio range" defined before any gpio-hogs can be added.

If this is not the case, the generic pinctrl_gpio_request() [0] will
fail with -EPROBE_DEFER at this point (see the call chain in the
"pinctrl: msm: fix gpio-hog related boot issues" mail, starting from
gpiod_hog).

And now the crux of the matter is that currently, in order for pinctrl
drivers to register the range, they have to call gpiochip_add_pin_range() [1].
But they can only do that after gpiochip_add_data_with_key() [2], since
this function initializes the pin_ranges list [3].

So what will happen is that you'll get a
"gpiochip_machine_hog: unable to hog GPIO line $LABEL $GPIONR -517" error
for every single gpio-hog and wonder why :(.
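The ordering problem can be modelled in a few lines of plain C. This is a deliberately simplified, hypothetical model: every demo_* name is invented, a counter stands in for the gdev->pin_ranges list, and only the order of events from the description above is reproduced.

```c
/* EPROBE_DEFER is kernel-internal (not in user-space <errno.h>); 517 is the
 * value from include/linux/errno.h, matching the "-517" in the message. */
#define DEMO_EPROBE_DEFER 517

struct demo_gpio_chip {
	int nr_pin_ranges; /* stands in for the gdev->pin_ranges list */
};

/* Models pinctrl_gpio_request(): with no GPIO range registered, the pin
 * controller backing the line cannot be found yet, so the request defers. */
static int demo_pinctrl_gpio_request(const struct demo_gpio_chip *chip)
{
	return chip->nr_pin_ranges > 0 ? 0 : -DEMO_EPROBE_DEFER;
}

/* Models gpiochip_add_data_with_key() with the patch applied: the range
 * list is initialised here and machine hogs are requested before returning. */
static int demo_gpiochip_add_and_hog(struct demo_gpio_chip *chip)
{
	chip->nr_pin_ranges = 0;                /* pin_ranges initialised */
	return demo_pinctrl_gpio_request(chip); /* machine hog requested */
}

/* Models gpiochip_add_pin_range(): only possible after the chip was added,
 * i.e. too late for the hog requested above. */
static void demo_gpiochip_add_pin_range(struct demo_gpio_chip *chip)
{
	chip->nr_pin_ranges++;
}
```

In this model the same request succeeds once the range exists, which suggests hogging has to happen after (or be retried after) the pin range is registered.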

Regards,
Christian

[0] 

[1] 

[2] 

[3] 






Re: [PATCH v4] mm: remove odd HAVE_PTE_SPECIAL

2018-04-12 Thread David Rientjes
On Thu, 12 Apr 2018, Laurent Dufour wrote:

> Remove the additional define HAVE_PTE_SPECIAL and rely directly on
> CONFIG_ARCH_HAS_PTE_SPECIAL.
> 
> There is no functional change introduced by this patch
> 
> Signed-off-by: Laurent Dufour 

Acked-by: David Rientjes 
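For readers wondering why IS_ENABLED() can replace the HAVE_PTE_SPECIAL helper: it turns a Kconfig symbol into a constant 0 or 1 usable in an ordinary C if(), so both branches are still compiled and type-checked while the dead one is optimised away. The sketch below is a simplified version of the helpers in include/linux/kconfig.h (the kernel's exact definition of IS_ENABLED() may differ slightly); the CONFIG_DEMO_* options and demo_* functions are hypothetical.

```c
/* Simplified version of the include/linux/kconfig.h helpers. */
#define __ARG_PLACEHOLDER_1 0,
#define __take_second_arg(__ignored, val, ...) val
#define __is_defined(x) ___is_defined(x)
#define ___is_defined(val) ____is_defined(__ARG_PLACEHOLDER_##val)
#define ____is_defined(arg1_or_junk) __take_second_arg(arg1_or_junk 1, 0)

#define IS_BUILTIN(option) __is_defined(option)
#define IS_MODULE(option) __is_defined(option##_MODULE)
#define IS_ENABLED(option) (IS_BUILTIN(option) || IS_MODULE(option))

/* Hypothetical options: autoconf.h defines enabled options to 1 and simply
 * omits disabled ones, so CONFIG_DEMO_MISSING_FEATURE is left undefined. */
#define CONFIG_DEMO_HAS_FEATURE 1

static int demo_has_feature(void)
{
	/* The condition is a compile-time constant, so the compiler drops
	 * the dead branch without needing an #ifdef. */
	if (IS_ENABLED(CONFIG_DEMO_HAS_FEATURE))
		return 1;
	return 0;
}

static int demo_missing_feature(void)
{
	return IS_ENABLED(CONFIG_DEMO_MISSING_FEATURE);
}
```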


[PATCH] Documentation/i2c: sync docs with current state of i2c-tools.

2018-04-12 Thread Sam Hansen
Currently, Documentation/i2c/dev-interface describes the use of i2c_smbus_*
helper routines as static inlined functions provided by linux/i2c-dev.h.  Work
has been done to refactor the linux/i2c-dev.h file in the i2c-tools project
out into its own library.  As a result, these docs have become stale.

This patch corrects the discrepancy and directs the reader to the i2c-tools
project for more information.  Additionally, some trailing-whitespace cleanups
were made.

Signed-off-by: Sam Hansen 
---
 Documentation/i2c/dev-interface | 28 +++-
 1 file changed, 11 insertions(+), 17 deletions(-)

diff --git a/Documentation/i2c/dev-interface b/Documentation/i2c/dev-interface
index d04e6e4964ee..5323588fe99d 100644
--- a/Documentation/i2c/dev-interface
+++ b/Documentation/i2c/dev-interface
@@ -9,8 +9,8 @@ i2c adapters present on your system at a given time. i2cdetect is part of
 the i2c-tools package.
 
 I2C device files are character device files with major device number 89
-and a minor device number corresponding to the number assigned as 
-explained above. They should be called "i2c-%d" (i2c-0, i2c-1, ..., 
+and a minor device number corresponding to the number assigned as
+explained above. They should be called "i2c-%d" (i2c-0, i2c-1, ...,
 i2c-10, ...). All 256 minor device numbers are reserved for i2c.
 
 
@@ -23,11 +23,6 @@ First, you need to include these two headers:
   #include 
   #include 
 
-(Please note that there are two files named "i2c-dev.h" out there. One is
-distributed with the Linux kernel and the other one is included in the
-source tree of i2c-tools. They used to be different in content but since 2012
-they're identical. You should use "linux/i2c-dev.h").
-
 Now, you have to decide which adapter you want to access. You should
 inspect /sys/class/i2c-dev/ or run "i2cdetect -l" to decide this.
 Adapter numbers are assigned somewhat dynamically, so you can not
@@ -38,7 +33,7 @@ Next thing, open the device file, as follows:
   int file;
   int adapter_nr = 2; /* probably dynamically determined */
   char filename[20];
-  
+
   snprintf(filename, 19, "/dev/i2c-%d", adapter_nr);
   file = open(filename, O_RDWR);
   if (file < 0) {
@@ -72,7 +67,7 @@ the device supports them. Both are illustrated below.
 /* res contains the read word */
   }
 
-  /* Using I2C Write, equivalent of 
+  /* Using I2C Write, equivalent of
  i2c_smbus_write_word_data(file, reg, 0x6543) */
   buf[0] = reg;
   buf[1] = 0x43;
@@ -140,14 +135,14 @@ ioctl(file, I2C_RDWR, struct i2c_rdwr_ioctl_data *msgset)
   set in each message, overriding the values set with the above ioctl's.
 
 ioctl(file, I2C_SMBUS, struct i2c_smbus_ioctl_data *args)
-  Not meant to be called  directly; instead, use the access functions
-  below.
+  If possible, use the provided i2c_smbus_* methods described below in favor
+  of issuing direct ioctls.
 
 You can do plain i2c transactions by using read(2) and write(2) calls.
 You do not need to pass the address byte; instead, set it through
 ioctl I2C_SLAVE before you try to access the device.
 
-You can do SMBus level transactions (see documentation file smbus-protocol 
+You can do SMBus level transactions (see documentation file smbus-protocol
 for details) through the following functions:
   __s32 i2c_smbus_write_quick(int file, __u8 value);
   __s32 i2c_smbus_read_byte(int file);
@@ -158,7 +153,7 @@ for details) through the following functions:
   __s32 i2c_smbus_write_word_data(int file, __u8 command, __u16 value);
   __s32 i2c_smbus_process_call(int file, __u8 command, __u16 value);
   __s32 i2c_smbus_read_block_data(int file, __u8 command, __u8 *values);
-  __s32 i2c_smbus_write_block_data(int file, __u8 command, __u8 length, 
+  __s32 i2c_smbus_write_block_data(int file, __u8 command, __u8 length,
__u8 *values);
 All these transactions return -1 on failure; you can read errno to see
 what happened. The 'write' transactions return 0 on success; the
@@ -166,10 +161,9 @@ what happened. The 'write' transactions return 0 on success; the
 returns the number of values read. The block buffers need not be longer
 than 32 bytes.
 
-The above functions are all inline functions, that resolve to calls to
-the i2c_smbus_access function, that on its turn calls a specific ioctl
-with the data in a specific format. Read the source code if you
-want to know what happens behind the screens.
+The above functions are made available by linking against the libi2c library,
+which is provided by the i2c-tools project.  See:
+https://git.kernel.org/pub/scm/utils/i2c-tools/i2c-tools.git/.
 
 
 Implementation details
-- 
2.17.0.484.g0c8726318c-goog
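The removed text said the i2c_smbus_* helpers end up calling "a specific ioctl with the data in a specific format". As a hedged illustration of that layer, here is roughly what a "read word data" transaction looks like when issued directly through the I2C_SMBUS ioctl, using only the kernel UAPI headers. The function names are hypothetical; this is not the libi2c implementation, and it simply follows the dev-interface convention of returning -1 with errno set on failure.

```c
/*
 * Sketch: an SMBus "read word data" transaction issued directly through
 * the I2C_SMBUS ioctl of an i2c-dev character device.
 * smbus_read_word_sketch() and read_word_from_adapter() are hypothetical
 * names used only for illustration.
 */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/i2c.h>      /* union i2c_smbus_data, I2C_SMBUS_READ, ... */
#include <linux/i2c-dev.h>  /* I2C_SLAVE, I2C_SMBUS, struct i2c_smbus_ioctl_data */

/* Returns the 16-bit word on success, or -1 with errno set on failure. */
int smbus_read_word_sketch(int file, __u8 command)
{
	union i2c_smbus_data data;
	struct i2c_smbus_ioctl_data args = {
		.read_write = I2C_SMBUS_READ,
		.command    = command,
		.size       = I2C_SMBUS_WORD_DATA,
		.data       = &data,
	};

	if (ioctl(file, I2C_SMBUS, &args) < 0)
		return -1;              /* errno was set by ioctl() */
	return data.word & 0xffff;
}

/* Open /dev/i2c-<adapter_nr>, select the slave address, read one register. */
int read_word_from_adapter(int adapter_nr, int addr, __u8 reg)
{
	char filename[20];
	int file, res, saved_errno;

	snprintf(filename, sizeof(filename), "/dev/i2c-%d", adapter_nr);
	file = open(filename, O_RDWR);
	if (file < 0)
		return -1;

	if (ioctl(file, I2C_SLAVE, addr) < 0) {
		saved_errno = errno;    /* close() may clobber errno */
		close(file);
		errno = saved_errno;
		return -1;
	}

	res = smbus_read_word_sketch(file, reg);
	saved_errno = errno;
	close(file);
	errno = saved_errno;
	return res;
}
```

A caller would write something like `read_word_from_adapter(2, 0x50, 0x10)`, where the adapter number, slave address and register are placeholders. This mirrors the open()/I2C_SLAVE sequence that dev-interface already describes.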



Re: [PATCH] Documentation/i2c: sync docs with current state of i2c-tools.

2018-04-12 Thread Wolfram Sang
Hi,

On Thu, Apr 12, 2018 at 02:33:42PM -0700, Sam Hansen wrote:
> Currently, Documentation/i2c/dev-interface describes the use of i2c_smbus_*
> helper routines as static inlined functions provided by linux/i2c-dev.h.  Work
> has been done to refactor the linux/i2c-dev.h file in the i2c-tools project
> out into its own library.  As a result, these docs have become stale.

Thanks for fixing this!

> This patch corrects the discrepancy and directs the reader to the i2c-tools
> project for more information.  Additionally, some trailing-whitespace cleanups
> were made.

Minor nit: Having the whitespace changes in a separate patch is a tad
easier to review.

> -  /* Using I2C Write, equivalent of 
> +  /* Using I2C Write, equivalent of
>   i2c_smbus_write_word_data(file, reg, 0x6543) */

Maybe change to Kernel coding style comments while here?

> -  Not meant to be called  directly; instead, use the access functions
> -  below.
> +  If possible, use the provided i2c_smbus_* methods described below in favor
> +  of issuing direct ioctls.

Why this change?

> -The above functions are all inline functions, that resolve to calls to
> -the i2c_smbus_access function, that on its turn calls a specific ioctl
> -with the data in a specific format. Read the source code if you
> -want to know what happens behind the screens.
> +The above functions are made available by linking against the libi2c library,
> +which is provided by the i2c-tools project.  See:
> +https://git.kernel.org/pub/scm/utils/i2c-tools/i2c-tools.git/.

This is fine with me. Maybe Jean has a comment on this?

Kind regards,

   Wolfram





Re: [RFC bpf-next v2 4/8] bpf: add documentation for eBPF helpers (23-32)

2018-04-12 Thread Alexei Starovoitov
On Tue, Apr 10, 2018 at 03:41:53PM +0100, Quentin Monnet wrote:
> Add documentation for eBPF helper functions to bpf.h user header file.
> This documentation can be parsed with the Python script provided in
> another commit of the patch series, in order to provide a RST document
> that can later be converted into a man page.
> 
> The objective is to make the documentation easily understandable and
> accessible to all eBPF developers, including beginners.
> 
> This patch contains descriptions for the following helper functions, all
> written by Daniel:
> 
> - bpf_get_prandom_u32()
> - bpf_get_smp_processor_id()
> - bpf_get_cgroup_classid()
> - bpf_get_route_realm()
> - bpf_skb_load_bytes()
> - bpf_csum_diff()
> - bpf_skb_get_tunnel_opt()
> - bpf_skb_set_tunnel_opt()
> - bpf_skb_change_proto()
> - bpf_skb_change_type()
> 
> Cc: Daniel Borkmann 
> Signed-off-by: Quentin Monnet 
> ---
>  include/uapi/linux/bpf.h | 125 +++
>  1 file changed, 125 insertions(+)
> 
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index f3ea8824efbc..d147d9dd6a83 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -473,6 +473,14 @@ union bpf_attr {
>   *   The number of bytes written to the buffer, or a negative error
>   *   in case of failure.
>   *
> + * u32 bpf_prandom_u32(void)
> + *   Return
> + *   A random 32-bit unsigned value.

there is no such helper.
It's called bpf_get_prandom_u32().
I'd also add a note that it's using its own random state and cannot be
used to infer the seed of other random functions in the kernel.

> + *
> + * u32 bpf_get_smp_processor_id(void)
> + *   Return
> + *   The SMP (Symmetric multiprocessing) processor id.

probably worth adding a note to explain that all bpf programs run
with preemption disabled, so the processor id is stable for the whole run of the program.

> + *
>   * int bpf_skb_store_bytes(struct sk_buff *skb, u32 offset, const void *from, u32 len, u64 flags)
>   *   Description
>   *   Store *len* bytes from address *from* into the packet
> @@ -604,6 +612,13 @@ union bpf_attr {
>   *   Return
>   *   0 on success, or a negative error in case of failure.
>   *
> + * u32 bpf_get_cgroup_classid(struct sk_buff *skb)
> + *   Description
> + *   Retrieve the classid for the current task, i.e. for the
> + *   net_cls (network classifier) cgroup to which *skb* belongs.

please add that the kernel should be configured with CONFIG_NET_CLS_CGROUP=y|m
and mention Documentation/cgroup-v1/net_cls.txt.
Otherwise 'network classifier' is way too generic.
I'd also mention that placing a task into the net_cls controller
disables all of cgroup-bpf.

> + *   Return
> + *   The classid, or 0 for the default unconfigured classid.
> + *
>   * int bpf_skb_vlan_push(struct sk_buff *skb, __be16 vlan_proto, u16 vlan_tci)
>   *   Description
>   *   Push a *vlan_tci* (VLAN tag control information) of protocol
> @@ -703,6 +718,14 @@ union bpf_attr {
>   *   are **TC_ACT_REDIRECT** on success or **TC_ACT_SHOT** on
>   *   error.
>   *
> + * u32 bpf_get_route_realm(struct sk_buff *skb)
> + *   Description
> + *   Retrieve the realm or the route, that is to say the
> + *   **tclassid** field of the destination for the *skb*.

Similarly, this only works if CONFIG_IP_ROUTE_CLASSID is on.

> + *   Return
> + *   The realm of the route for the packet associated to *sdb*, or 0
> + *   if none was found.
> + *
>   * int bpf_perf_event_output(struct pt_reg *ctx, struct bpf_map *map, u64 flags, void *data, u64 size)
>   *   Description
>   *   Write perf raw sample into a perf event held by *map* of type
> @@ -779,6 +802,21 @@ union bpf_attr {
>   *   Return
>   *   0 on success, or a negative error in case of failure.
>   *
> + * int bpf_skb_load_bytes(const struct sk_buff *skb, u32 offset, void *to, u32 len)
> + *   Description
> + *   This helper was provided as an easy way to load data from a
> + *   packet. It can be used to load *len* bytes from *offset* from
> + *   the packet associated to *skb*, into the buffer pointed by
> + *   *to*.
> + *
> + *   Since Linux 4.7, this helper is deprecated in favor of
> + *   "direct packet access", enabling packet data to be manipulated
> + *   with *skb*\ **->data** and *skb*\ **->data_end** pointing
> + *   respectively to the first byte of packet data and to the byte
> + *   after the last byte of packet data.

I wouldn't call it deprecated.
It's still useful when a programmer wants to read large quantities of
data from the packet.

> + *   Return
> + *   0 on success, or a negative error in case of failure.
> + *
>   * int bpf_get_stackid(struct pt_reg *ctx, struct bpf_map *map, u64 flags)
>   *   Description
>   *   Walk a user or a kernel stack and return its id. To a