Re: [PATCH 4/4] mm/vmalloc: Hugepage vmalloc mappings

2019-06-10 Thread Satheesh Rajendran
On Mon, Jun 10, 2019 at 03:49:48PM +1000, Nicholas Piggin wrote:
> Nicholas Piggin's on June 10, 2019 2:38 pm:
> > +static int vmap_hpages_range(unsigned long start, unsigned long end,
> > +  pgprot_t prot, struct page **pages,
> > +  unsigned int page_shift)
> > +{
> > +   BUG_ON(page_shift != PAGE_SIZE);
> > +   return vmap_pages_range(start, end, prot, pages);
> > +}
> 
> That's a false positive BUG_ON for !HUGE_VMAP configs. I'll fix that
> and repost after a round of feedback.

Sure, Crash log for that false positive BUG_ON on Power8 Host.
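
As an aside, the stub compares page_shift (a shift count) against
PAGE_SIZE (a byte count), which is why it fires on every !HUGE_VMAP
build. A minimal sketch of a corrected fallback, assuming page_shift is
the extra shift above PAGE_SHIFT as in the CONFIG_HAVE_ARCH_HUGE_VMAP
variant:

static int vmap_hpages_range(unsigned long start, unsigned long end,
			     pgprot_t prot, struct page **pages,
			     unsigned int page_shift)
{
	/* was: BUG_ON(page_shift != PAGE_SIZE) -- mismatched units */
	BUG_ON(page_shift != 0);	/* only base pages without HUGE_VMAP */
	return vmap_pages_range(start, end, prot, pages);
}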

[0.001718] pid_max: default: 163840 minimum: 1280
[0.010437] [ cut here ]
[0.010461] kernel BUG at mm/vmalloc.c:473!
[0.010471] Oops: Exception in kernel mode, sig: 5 [#1]
[0.010481] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA PowerNV
[0.010491] Modules linked in:
[0.010503] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.2.0-rc3-ga7ee9421d #1
[0.010515] NIP:  c034dbd8 LR: c034dc80 CTR: 
[0.010527] REGS: c15bf9a0 TRAP: 0700   Not tainted  (5.2.0-rc3-ga7ee9421d)
[0.010537] MSR:  92029033   CR: 22022422  XER: 2000
[0.010559] CFAR: c034dc88 IRQMASK: 0
[0.010559] GPR00: c034dc80 c15bfc30 c15c2f00 c00c01fd0e00
[0.010559] GPR04:  2322  0040
[0.010559] GPR08: c00ff908 0400 0400 0100
[0.010559] GPR12: 42022422 c17a 0001035ae7d8 0400
[0.010559] GPR16: 0400 818e c0ee08c8 
[0.010559] GPR20: 0001 2b22 0b20 0022
[0.010559] GPR24: c007f92c7880 0b22 0001 c00a
[0.010559] GPR28: c008 0400  0b20
[0.010664] NIP [c034dbd8] __vmalloc_node_range+0x1f8/0x410
[0.010677] LR [c034dc80] __vmalloc_node_range+0x2a0/0x410
[0.010686] Call Trace:
[0.010695] [c15bfc30] [c034dc80] __vmalloc_node_range+0x2a0/0x410 (unreliable)
[0.010711] [c15bfd30] [c034de40] __vmalloc+0x50/0x60
[0.010724] [c15bfda0] [c101e54c] alloc_large_system_hash+0x200/0x304
[0.010738] [c15bfe60] [c10235bc] vfs_caches_init+0xd8/0x138
[0.010752] [c15bfee0] [c0fe428c] start_kernel+0x5c4/0x668
[0.010767] [c15bff90] [c000b774] start_here_common+0x1c/0x528
[0.010777] Instruction dump:
[0.010785] 6000 7c691b79 418200dc e9180020 79ea1f24 7d28512a 40920170 8138002c
[0.010803] 394f0001 794f0020 7c095040 4181ffbc <0fe0> 6000 3f41 4bfffedc
[0.010826] ---[ end trace dd0217488686d653 ]---
[0.010834]
[1.010946] Kernel panic - not syncing: Attempted to kill the idle task!
[1.011061] Rebooting in 10 seconds..

Regards,
-Satheesh.
> 
> Thanks,
> Nick
> 
> 



Re: [BISECTED REGRESSION] b43legacy broken on G4 PowerBook

2019-06-10 Thread Christoph Hellwig
On Sat, Jun 08, 2019 at 04:52:24PM -0500, Larry Finger wrote:
> On 6/7/19 12:29 PM, Christoph Hellwig wrote:
>> I don't think we should work around this in the driver, we need to fix
>> it in the core.  I'm curious why my previous patch didn't work.  Can
>> you throw in a few printks what failed?  I.e. did dma_direct_supported
>> return false?  Did the actual allocation fail?
>
> Routine dma_direct_supported() returns true.
>
> The failure is in routine dma_set_mask() in the following if test:
>
> if (!dev->dma_mask || !dma_supported(dev, mask))
> return -EIO;
>
> For b43legacy, dev->dma_mask is 0xc2656848.
> dma_supported(dev, mask) is 0xc08b, mask is 0x3fff, and 
> the routine returns -EIO.
>
> For b43,   dev->dma_mask is 0xc26568480001,
> dma_supported(dev, mask) is 0xc08b, mask is 0x, and 
> the routine returns 0.

I don't fully understand what values the above map to.  Can you send
me your actual debugging patch as well?


Re: [PATCH 4/4] mm/vmalloc: Hugepage vmalloc mappings

2019-06-10 Thread Anshuman Khandual
On 06/10/2019 10:08 AM, Nicholas Piggin wrote:
> For platforms that define HAVE_ARCH_HUGE_VMAP, have vmap allow vmalloc to
> allocate huge pages and map them.

IIUC that extends HAVE_ARCH_HUGE_VMAP from ioremap to vmalloc.

> 
> This brings dTLB misses for linux kernel tree `git diff` from 45,000 to
> 8,000 on a Kaby Lake KVM guest with 8MB dentry hash and mitigations=off
> (performance is in the noise, under 1% difference, page tables are likely
> to be well cached for this workload). Similar numbers are seen on POWER9.

Sure will try this on arm64.

> 
> Signed-off-by: Nicholas Piggin 
> ---
>  include/asm-generic/4level-fixup.h |   1 +
>  include/asm-generic/5level-fixup.h |   1 +
>  include/linux/vmalloc.h|   1 +
>  mm/vmalloc.c   | 132 +++--
>  4 files changed, 107 insertions(+), 28 deletions(-)
> 
> diff --git a/include/asm-generic/4level-fixup.h 
> b/include/asm-generic/4level-fixup.h
> index e3667c9a33a5..3cc65a4dd093 100644
> --- a/include/asm-generic/4level-fixup.h
> +++ b/include/asm-generic/4level-fixup.h
> @@ -20,6 +20,7 @@
>  #define pud_none(pud)0
>  #define pud_bad(pud) 0
>  #define pud_present(pud) 1
> +#define pud_large(pud)   0
>  #define pud_ERROR(pud)   do { } while (0)
>  #define pud_clear(pud)   pgd_clear(pud)
>  #define pud_val(pud) pgd_val(pud)
> diff --git a/include/asm-generic/5level-fixup.h 
> b/include/asm-generic/5level-fixup.h
> index bb6cb347018c..c4377db09a4f 100644
> --- a/include/asm-generic/5level-fixup.h
> +++ b/include/asm-generic/5level-fixup.h
> @@ -22,6 +22,7 @@
>  #define p4d_none(p4d)0
>  #define p4d_bad(p4d) 0
>  #define p4d_present(p4d) 1
> +#define p4d_large(p4d)   0
>  #define p4d_ERROR(p4d)   do { } while (0)
>  #define p4d_clear(p4d)   pgd_clear(p4d)
>  #define p4d_val(p4d) pgd_val(p4d)

Both of these are required by vmalloc_to_page(), which, as per a later
comment, should be part of a prerequisite patch before this series.

> diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
> index 812bea5866d6..4c92dc608928 100644
> --- a/include/linux/vmalloc.h
> +++ b/include/linux/vmalloc.h
> @@ -42,6 +42,7 @@ struct vm_struct {
>   unsigned long   size;
>   unsigned long   flags;
>   struct page **pages;
> + unsigned intpage_shift;

So the entire vm_struct will be mapped with a single page_shift. It cannot
mix and match mappings of PAGE_SIZE, PMD_SIZE, PUD_SIZE etc. in case the
allocation fails for a larger size and has to fall back, or for other such
reasons.

>   unsigned intnr_pages;
>   phys_addr_t phys_addr;
>   const void  *caller;
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index dd27cfb29b10..0cf8e861caeb 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -36,6 +36,7 @@
>  #include 
>  
>  #include 
> +#include 
>  #include 
>  #include 
>  
> @@ -440,6 +441,41 @@ static int vmap_pages_range(unsigned long start, 
> unsigned long end,
>   return ret;
>  }
>  
> +#ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
> +static int vmap_hpages_range(unsigned long start, unsigned long end,

A small nit (if you agree) s/hpages/huge_pages/

> +pgprot_t prot, struct page **pages,

Re-order (prot <---> pages) just to follow the standard like before.

> +unsigned int page_shift)
> +{
> + unsigned long addr = start;
> + unsigned int i, nr = (end - start) >> (PAGE_SHIFT + page_shift);

s/nr/nr_huge_pages ?

Also, should we not check the alignment of the range [start...end] with
respect to (1UL << (PAGE_SHIFT + page_shift))?
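
Something along these lines, for instance (just a sketch, using the
kernel's IS_ALIGNED() helper; hugepage_size is a made-up local name):

	unsigned long hugepage_size = 1UL << (PAGE_SHIFT + page_shift);

	if (!IS_ALIGNED(start, hugepage_size) || !IS_ALIGNED(end, hugepage_size))
		return -EINVAL;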


> +
> + for (i = 0; i < nr; i++) {
> + int err;
> +
> + err = vmap_range_noflush(addr,
> + addr + (PAGE_SIZE << page_shift),
> + __pa(page_address(pages[i])), prot,
> + PAGE_SHIFT + page_shift);
> + if (err)
> + return err;
> +
> + addr += PAGE_SIZE << page_shift;
> + }
> + flush_cache_vmap(start, end);
> +
> + return nr;
> +}
> +#else
> +static int vmap_hpages_range(unsigned long start, unsigned long end,
> +pgprot_t prot, struct page **pages,
> +unsigned int page_shift)
> +{
> + BUG_ON(page_shift != PAGE_SIZE);
> + return vmap_pages_range(start, end, prot, pages);
> +}
> +#endif
> +
> +
>  int is_vmalloc_or_module_addr(const void *x)
>  {
>   /*
> @@ -462,7 +498,7 @@ struct page *vmalloc_to_page(const void *vmalloc_addr)
>  {
>   unsigned long addr = (unsigned long) vmalloc_addr;
>   struct page *page = NULL;
> - pgd_t *pgd = pgd_offset_k(addr);


[RESEND PATCH] Documentation/stackprotector: powerpc supports stack protector

2019-06-10 Thread Bhupesh Sharma
The powerpc architecture (both 64-bit and 32-bit) has supported the stack
protector mechanism for some time now [see commit 06ec27aea9fc ("powerpc/64:
add stack protector support")].

Update stackprotector arch support documentation to reflect the same.

Cc: Jonathan Corbet 
Cc: Michael Ellerman 
Cc: linux-...@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Bhupesh Sharma 
---
Resend, this time Cc'ing Jonathan and doc-list.

 Documentation/features/debug/stackprotector/arch-support.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/features/debug/stackprotector/arch-support.txt 
b/Documentation/features/debug/stackprotector/arch-support.txt
index ea521f3e..32bbdfc64c32 100644
--- a/Documentation/features/debug/stackprotector/arch-support.txt
+++ b/Documentation/features/debug/stackprotector/arch-support.txt
@@ -22,7 +22,7 @@
 |   nios2: | TODO |
 |openrisc: | TODO |
 |  parisc: | TODO |
-| powerpc: | TODO |
+| powerpc: |  ok  |
 |   riscv: | TODO |
 |s390: | TODO |
 |  sh: |  ok  |
-- 
2.7.4



Re: Latest Git kernel: Section mismatch in reference from the variable start_here_multiplatform to the function .init.text:.early_setup()

2019-06-10 Thread Christian Zigotzky
Hello Christophe,

Could you please add this patch to the GIT kernel because the issue still 
exists.

Thanks,
Christian

On 15. May 2019, at 12:15, Christophe Leroy  wrote:

Hi,

On 15/05/2019 at 12:09, Christian Zigotzky wrote:
Hi All,
I got the following error messages with the latest Git kernel today:
GEN .version
  CHK include/generated/compile.h
  LD  vmlinux.o
  MODPOST vmlinux.o
WARNING: vmlinux.o(.text+0x302a): Section mismatch in reference from the 
variable start_here_multiplatform to the function .init.text:.early_setup()
The function start_here_multiplatform() references
the function __init .early_setup().
This is often because start_here_multiplatform lacks a __init
annotation or the annotation of .early_setup is wrong.
  MODINFO modules.builtin.modinfo
  KSYM.tmp_kallsyms1.o
  KSYM.tmp_kallsyms2.o
  LD  vmlinux
  SORTEX  vmlinux
  SYSMAP  System.map
  CHKHEAD vmlinux
What does it mean?

-

I proposed a patch for it at https://patchwork.ozlabs.org/patch/1097845/

Christophe

Re: [PATCH v3 1/9] KVM: PPC: Ultravisor: Add PPC_UV config option

2019-06-10 Thread Claudio Carvalho


On 6/7/19 5:11 PM, Leonardo Bras wrote:
>
> On Thu, 2019-06-06 at 14:36 -0300, Claudio Carvalho wrote:
>> From: Anshuman Khandual 
>>
>> CONFIG_PPC_UV adds support for ultravisor.
>>
>> Signed-off-by: Anshuman Khandual 
>> Signed-off-by: Bharata B Rao 
>> Signed-off-by: Ram Pai 
>> [Update config help and commit message]
>> Signed-off-by: Claudio Carvalho 
>> ---
>>  arch/powerpc/Kconfig | 20 
>>  1 file changed, 20 insertions(+)
>>
>> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
>> index 8c1c636308c8..276c1857c335 100644
>> --- a/arch/powerpc/Kconfig
>> +++ b/arch/powerpc/Kconfig
>> @@ -439,6 +439,26 @@ config PPC_TRANSACTIONAL_MEM
>> ---help---
>>   Support user-mode Transactional Memory on POWERPC.
>>
>> +config PPC_UV
>> +bool "Ultravisor support"
>> +depends on KVM_BOOK3S_HV_POSSIBLE
>> +select HMM_MIRROR
>> +select HMM
>> +select ZONE_DEVICE
>> +select MIGRATE_VMA_HELPER
>> +select DEV_PAGEMAP_OPS
>> +select DEVICE_PRIVATE
>> +select MEMORY_HOTPLUG
>> +select MEMORY_HOTREMOVE
>> +default n
>> +help
>> +  This option paravirtualizes the kernel to run on POWER platforms that
>> +  support the Protected Execution Facility (PEF). On such platforms,
>> +  the ultravisor firmware runs at a privilege level above the
>> +  hypervisor.
>> +
>> +  If unsure, say "N".
>> +
>>  config LD_HEAD_STUB_CATCH
>>  bool "Reserve 256 bytes to cope with linker stubs in HEAD text" if 
>> EXPERT
>>  depends on PPC64
> Maybe this patch should be the last of the series, as it may cause some
> bisect trouble to have this option enabled while missing some of the
> patches.

Thanks Leonardo. I changed that for the next version.

Claudio




Re: [PATCH 4/4] mm/vmalloc: Hugepage vmalloc mappings

2019-06-10 Thread Mark Rutland
Hi,

On Mon, Jun 10, 2019 at 02:38:38PM +1000, Nicholas Piggin wrote:
> For platforms that define HAVE_ARCH_HUGE_VMAP, have vmap allow vmalloc to
> allocate huge pages and map them.
> 
> This brings dTLB misses for linux kernel tree `git diff` from 45,000 to
> 8,000 on a Kaby Lake KVM guest with 8MB dentry hash and mitigations=off
> (performance is in the noise, under 1% difference, page tables are likely
> to be well cached for this workload). Similar numbers are seen on POWER9.

Do you happen to know which vmalloc mappings these get used for in the
above case? Where do we see vmalloc mappings that large?

I'm worried as to how this would interact with the set_memory_*()
functions, as on arm64 those can only operate on page-granular mappings.
Those may need fixing up to handle huge mappings; certainly if the above
is all for modules.
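
For reference, the arm64 attribute-change path assumes last-level
entries; change_memory_common() boils down to (roughly, from memory):

	apply_to_page_range(&init_mm, start, size, change_page_range, &data);

and apply_to_page_range() only walks PTEs, so a PMD/PUD-mapped region
would not have its attributes changed.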

Thanks,
Mark.

> 
> Signed-off-by: Nicholas Piggin 
> ---
>  include/asm-generic/4level-fixup.h |   1 +
>  include/asm-generic/5level-fixup.h |   1 +
>  include/linux/vmalloc.h|   1 +
>  mm/vmalloc.c   | 132 +++--
>  4 files changed, 107 insertions(+), 28 deletions(-)
> 
> diff --git a/include/asm-generic/4level-fixup.h 
> b/include/asm-generic/4level-fixup.h
> index e3667c9a33a5..3cc65a4dd093 100644
> --- a/include/asm-generic/4level-fixup.h
> +++ b/include/asm-generic/4level-fixup.h
> @@ -20,6 +20,7 @@
>  #define pud_none(pud)0
>  #define pud_bad(pud) 0
>  #define pud_present(pud) 1
> +#define pud_large(pud)   0
>  #define pud_ERROR(pud)   do { } while (0)
>  #define pud_clear(pud)   pgd_clear(pud)
>  #define pud_val(pud) pgd_val(pud)
> diff --git a/include/asm-generic/5level-fixup.h 
> b/include/asm-generic/5level-fixup.h
> index bb6cb347018c..c4377db09a4f 100644
> --- a/include/asm-generic/5level-fixup.h
> +++ b/include/asm-generic/5level-fixup.h
> @@ -22,6 +22,7 @@
>  #define p4d_none(p4d)0
>  #define p4d_bad(p4d) 0
>  #define p4d_present(p4d) 1
> +#define p4d_large(p4d)   0
>  #define p4d_ERROR(p4d)   do { } while (0)
>  #define p4d_clear(p4d)   pgd_clear(p4d)
>  #define p4d_val(p4d) pgd_val(p4d)
> diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
> index 812bea5866d6..4c92dc608928 100644
> --- a/include/linux/vmalloc.h
> +++ b/include/linux/vmalloc.h
> @@ -42,6 +42,7 @@ struct vm_struct {
>   unsigned long   size;
>   unsigned long   flags;
>   struct page **pages;
> + unsigned intpage_shift;
>   unsigned intnr_pages;
>   phys_addr_t phys_addr;
>   const void  *caller;
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index dd27cfb29b10..0cf8e861caeb 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -36,6 +36,7 @@
>  #include 
>  
>  #include 
> +#include 
>  #include 
>  #include 
>  
> @@ -440,6 +441,41 @@ static int vmap_pages_range(unsigned long start, 
> unsigned long end,
>   return ret;
>  }
>  
> +#ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
> +static int vmap_hpages_range(unsigned long start, unsigned long end,
> +pgprot_t prot, struct page **pages,
> +unsigned int page_shift)
> +{
> + unsigned long addr = start;
> + unsigned int i, nr = (end - start) >> (PAGE_SHIFT + page_shift);
> +
> + for (i = 0; i < nr; i++) {
> + int err;
> +
> + err = vmap_range_noflush(addr,
> + addr + (PAGE_SIZE << page_shift),
> + __pa(page_address(pages[i])), prot,
> + PAGE_SHIFT + page_shift);
> + if (err)
> + return err;
> +
> + addr += PAGE_SIZE << page_shift;
> + }
> + flush_cache_vmap(start, end);
> +
> + return nr;
> +}
> +#else
> +static int vmap_hpages_range(unsigned long start, unsigned long end,
> +pgprot_t prot, struct page **pages,
> +unsigned int page_shift)
> +{
> + BUG_ON(page_shift != PAGE_SIZE);
> + return vmap_pages_range(start, end, prot, pages);
> +}
> +#endif
> +
> +
>  int is_vmalloc_or_module_addr(const void *x)
>  {
>   /*
> @@ -462,7 +498,7 @@ struct page *vmalloc_to_page(const void *vmalloc_addr)
>  {
>   unsigned long addr = (unsigned long) vmalloc_addr;
>   struct page *page = NULL;
> - pgd_t *pgd = pgd_offset_k(addr);
> + pgd_t *pgd;
>   p4d_t *p4d;
>   pud_t *pud;
>   pmd_t *pmd;
> @@ -474,27 +510,38 @@ struct page *vmalloc_to_page(const void *vmalloc_addr)
>*/
>   VIRTUAL_BUG_ON(!is_vmalloc_or_module_addr(vmalloc_addr));
>  
> + pgd = pgd_offset_k(addr);
>   if (pgd_none(*pg

Re: [PATCH 4/4] mm/vmalloc: Hugepage vmalloc mappings

2019-06-10 Thread Nicholas Piggin
Mark Rutland's on June 11, 2019 12:10 am:
> Hi,
> 
> On Mon, Jun 10, 2019 at 02:38:38PM +1000, Nicholas Piggin wrote:
>> For platforms that define HAVE_ARCH_HUGE_VMAP, have vmap allow vmalloc to
>> allocate huge pages and map them.
>> 
>> This brings dTLB misses for linux kernel tree `git diff` from 45,000 to
>> 8,000 on a Kaby Lake KVM guest with 8MB dentry hash and mitigations=off
>> (performance is in the noise, under 1% difference, page tables are likely
>> to be well cached for this workload). Similar numbers are seen on POWER9.
> 
> Do you happen to know which vmalloc mappings these get used for in the
> above case? Where do we see vmalloc mappings that large?

Large module vmalloc could be subject to huge mappings.

> I'm worried as to how this would interact with the set_memory_*()
> functions, as on arm64 those can only operate on page-granular mappings.
> Those may need fixing up to handle huge mappings; certainly if the above
> is all for modules.

Good point, that looks like it would break on arm64 at least. I'll
work on it. We may have to make this opt in beyond HUGE_VMAP.

Thanks,
Nick


Re: [RFC V3] mm: Generalize and rename notify_page_fault() as kprobe_page_fault()

2019-06-10 Thread Leonardo Bras
On Mon, 2019-06-10 at 08:09 +0530, Anshuman Khandual wrote:
> > > +/*
> > > + * To be potentially processing a kprobe fault and to be allowed
> > > + * to call kprobe_running(), we have to be non-preemptible.
> > > + */
> > > +if (kprobes_built_in() && !preemptible() && !user_mode(regs)) {
> > > +if (kprobe_running() && kprobe_fault_handler(regs, trap))
> > 
> > don't need an 'if A if B', can do 'if A && B'
> 
> Which will make it a very lengthy condition check.

Well, is there any problem line-breaking the if condition?

if (A && B && C &&
D && E )

Also, if it's used only to decide the return value, maybe it would be
fine to do something like this:

return (A && B && C &&
D && E ); 
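
Applied to the quoted snippet, that would read something like (a sketch
only, reusing the names from the code above):

	return kprobes_built_in() && !preemptible() && !user_mode(regs) &&
	       kprobe_running() && kprobe_fault_handler(regs, trap);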

Regards,




Re: [BISECTED REGRESSION] b43legacy broken on G4 PowerBook

2019-06-10 Thread Larry Finger

On 6/10/19 3:18 AM, Christoph Hellwig wrote:

On Sat, Jun 08, 2019 at 04:52:24PM -0500, Larry Finger wrote:

On 6/7/19 12:29 PM, Christoph Hellwig wrote:

I don't think we should work around this in the driver, we need to fix
it in the core.  I'm curious why my previous patch didn't work.  Can
you throw in a few printks what failed?  I.e. did dma_direct_supported
return false?  Did the actual allocation fail?


Routine dma_direct_supported() returns true.

The failure is in routine dma_set_mask() in the following if test:

 if (!dev->dma_mask || !dma_supported(dev, mask))
 return -EIO;

For b43legacy, dev->dma_mask is 0xc2656848.
 dma_supported(dev, mask) is 0xc08b, mask is 0x3fff, and
the routine returns -EIO.

For b43,   dev->dma_mask is 0xc26568480001,
 dma_supported(dev, mask) is 0xc08b, mask is 0x, and
the routine returns 0.


I don't fully understand what values the above map to.  Can you send
me your actual debugging patch as well?


I do not understand why the if statement returns true as neither of the values 
is zero. After seeing the x86 output shown below, I also do not understand all 
the trailing zeros.


My entire patch is attached. That output came from this section:

diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index f7afdad..ba2489d 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -317,9 +317,12 @@ int dma_supported(struct device *dev, u64 mask)

 int dma_set_mask(struct device *dev, u64 mask)
 {
+   pr_info("mask 0x%llx, dma_mask 0x%llx, dma_supported 0x%llx\n", mask, 
dev->dma_mask,

+   dma_supported(dev, mask));
if (!dev->dma_mask || !dma_supported(dev, mask))
return -EIO;

+   pr_info("Continuing in dma_set_mask()\n");
arch_dma_set_mask(dev, mask);
dma_check_mask(dev, mask);
*dev->dma_mask = mask;
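
(Could the odd-looking values come from the print itself? dev->dma_mask
is a u64 * in struct device, so the pr_info() above prints the pointer
rather than the mask it points to, and dma_supported() returns int, not
u64. A fixed-up debug print, as a sketch:

	pr_info("mask 0x%llx, dma_mask 0x%llx, dma_supported %d\n", mask,
		dev->dma_mask ? *dev->dma_mask : 0ULL,
		dma_supported(dev, mask));

That would also explain why the dma_mask values above look like kernel
addresses.)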

On a 32-bit x86 computer with 1GB of RAM, that same output was

For b43legacy, dev->dma_mask is 0x01f4029044.
dma_supported(dev, mask) is 0x1ef37f7000, mask is 0x3fff, and
the routine returns 0. 30-bit DMA works.

For b43,   dev->dma_mask is 0x01f4029044,
dma_supported(dev, mask) is 0x1ef37f7000, mask is 0x, and
 the routine also returns 0. This card supports 32-bit DMA.

Larry
diff --git a/arch/powerpc/include/asm/page.h b/arch/powerpc/include/asm/page.h
index b8286a2..7a367ce 100644
--- a/arch/powerpc/include/asm/page.h
+++ b/arch/powerpc/include/asm/page.h
@@ -319,6 +319,10 @@ extern void copy_user_page(void *to, void *from, unsigned 
long vaddr,
 #endif /* __ASSEMBLY__ */
 #include 
 
+#if 1 /* XXX: pmac?  dynamic discovery? */
+#define ARCH_ZONE_DMA_BITS 30
+#else
 #define ARCH_ZONE_DMA_BITS 31
+#endif
 
 #endif /* _ASM_POWERPC_PAGE_H */
diff --git a/arch/powerpc/kernel/dma-iommu.c b/arch/powerpc/kernel/dma-iommu.c
index 09231ef..761d951 100644
--- a/arch/powerpc/kernel/dma-iommu.c
+++ b/arch/powerpc/kernel/dma-iommu.c
@@ -20,6 +20,8 @@
  */
 static inline bool dma_iommu_alloc_bypass(struct device *dev)
 {
+   pr_info("dev->archdata.iommu_bypass %d, !iommu_fixed_is_weak %d\n",
+   dev->archdata.iommu_bypass, !iommu_fixed_is_weak)   
return dev->archdata.iommu_bypass && !iommu_fixed_is_weak &&
dma_direct_supported(dev, dev->coherent_dma_mask);
 }
@@ -27,6 +29,8 @@ static inline bool dma_iommu_alloc_bypass(struct device *dev)
 static inline bool dma_iommu_map_bypass(struct device *dev,
unsigned long attrs)
 {
+   pr_info("(attrs & DMA_ATTR_WEAK_ORDERING) %d\n",
+   (attrs & DMA_ATTR_WEAK_ORDERING));
return dev->archdata.iommu_bypass &&
(!iommu_fixed_is_weak || (attrs & DMA_ATTR_WEAK_ORDERING));
 }
diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index cba2913..2540d3b 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -248,7 +248,8 @@ void __init paging_init(void)
   (long int)((top_of_ram - total_ram) >> 20));
 
 #ifdef CONFIG_ZONE_DMA
-   max_zone_pfns[ZONE_DMA] = min(max_low_pfn, 0x7fffUL >> PAGE_SHIFT);
+   max_zone_pfns[ZONE_DMA] = min(max_low_pfn,
+   ((1UL << ARCH_ZONE_DMA_BITS) - 1) >> PAGE_SHIFT);
 #endif
max_zone_pfns[ZONE_NORMAL] = max_low_pfn;
 #ifdef CONFIG_HIGHMEM
diff --git a/drivers/net/wireless/broadcom/b43/dma.c 
b/drivers/net/wireless/broadcom/b43/dma.c
index 806406a..e0270da 100644
--- a/drivers/net/wireless/broadcom/b43/dma.c
+++ b/drivers/net/wireless/broadcom/b43/dma.c
@@ -1053,6 +1053,7 @@ static int b43_dma_set_mask(struct b43_wldev *dev, u64 
mask)
 * lower mask, as we can always also support a lower one. */
while (1) {
err = dma_set_mask_and_coherent(dev->dev->dma_dev, mask);
+   pr_info("dma_set_mask_and_coherent %d, mask 0x%llx\n", err, 
mask);
if (!err)
break;
if (mask == DMA_B

Re: [BISECTED REGRESSION] b43legacy broken on G4 PowerBook

2019-06-10 Thread Larry Finger

On 6/7/19 11:21 PM, Benjamin Herrenschmidt wrote:



Please try the attached patch. I'm not really pleased with it and I will
continue to determine why the fallback to a 30-bit mask fails, but at least this
one works for me.


Your patch only makes sense if the device is indeed capable of
addressing 31-bits.

So either the driver is buggy and asks for a too small mask in which
case your patch is ok, or it's not and you're just going to cause all
sort of interesting random problems including possible memory
corruption.


Of course the driver may be buggy, but it asks for the correct mask.

This particular device is not capable of handling 32-bit DMA. The driver detects 
the 32-bit failure and falls back to 30 bits. It works on x86, and did on PPC32 
until 5.1. As Christoph said, it should always be possible to use fewer bits 
than the maximum.


Similar devices that are new enough to use b43 rather than b43legacy work with 
new kernels; however, they have and use 32-bit DMA.


Larry


Re: [PATCH v2] powerpc/perf: Use cpumask_last() to determine the designated cpu for nest/core units.

2019-06-10 Thread Leonardo Bras
On Mon, 2019-06-10 at 12:02 +0530, Anju T Sudhakar wrote:
> Nest and core imc(In-memory Collection counters) assigns a particular
> cpu as the designated target for counter data collection.
> During system boot, the first online cpu in a chip gets assigned as
> the designated cpu for that chip(for nest-imc) and the first online cpu
> in a core gets assigned as the designated cpu for that core(for core-imc).
> 
> If the designated cpu goes offline, the next online cpu from the same
> chip(for nest-imc)/core(for core-imc) is assigned as the next target,
> and the event context is migrated to the target cpu.
> Currently, cpumask_any_but() function is used to find the target cpu.
> Though this function is expected to return a `random` cpu, this always
> returns the next online cpu.
> 
> If all cpus in a chip/core are offlined in a sequential manner, starting
> from the first cpu, the event migration has to happen for all the cpus
> that go offline. Since the migration process involves a grace period,
> the total time taken to offline all the cpus will be significantly high.
> 
> Example:
> In a system which has 2 sockets, with
> NUMA node0 CPU(s): 0-87
> NUMA node8 CPU(s): 88-175
> 
> Time taken to offline cpu 88-175:
> real2m56.099s
> user0m0.191s
> sys 0m0.000s
> 
> Use cpumask_last() to choose the target cpu, when the designated cpu
> goes offline, so the migration will happen only when the last_cpu in the
> mask goes offline. This way the time taken to offline all cpus in a
> chip/core can be reduced.
> 
> With the patch, 
> 
> Time taken  to offline cpu 88-175:
> real0m12.207s
> user0m0.171s
> sys 0m0.000s
> 
> 
> Offlining all cpus in reverse order is also taken care of because
> cpumask_any_but() is used to find the designated cpu if the last cpu in
> the mask goes offline. Since cpumask_any_but() always returns the first
> cpu in the mask, that becomes the designated cpu and migration will happen
> only when the first_cpu in the mask goes offline.
> 
> Example:
> With the patch,
> 
> Time taken to offline cpu from 175-88:
> real0m9.330s
> user0m0.110s
> sys 0m0.000s

Seems like a very interesting work.
Out of curiosity, have you used 'chcpu -d' to create your benchmark?

> 
> Signed-off-by: Anju T Sudhakar 
> Reviewed-by: Madhavan Srinivasan 
> ---
> 
> Changes from v1:
>   Modified the commit log with more info.
> ---
> 
>  arch/powerpc/perf/imc-pmu.c | 14 --
>  1 file changed, 12 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c
> index 31fa753..fbfd6e7 100644
> --- a/arch/powerpc/perf/imc-pmu.c
> +++ b/arch/powerpc/perf/imc-pmu.c
> @@ -366,7 +366,14 @@ static int ppc_nest_imc_cpu_offline(unsigned int cpu)
>*/
>   nid = cpu_to_node(cpu);
>   l_cpumask = cpumask_of_node(nid);
> - target = cpumask_any_but(l_cpumask, cpu);
> + target = cpumask_last(l_cpumask);
> +
> + /*
> +  * If this(target) is the last cpu in the cpumask for this chip,
> +  * check for any possible online cpu in the chip.
> +  */
> + if (unlikely(target == cpu))
> + target = cpumask_any_but(l_cpumask, cpu);
>  
>   /*
>* Update the cpumask with the target cpu and
> @@ -671,7 +678,10 @@ static int ppc_core_imc_cpu_offline(unsigned int cpu)
>   return 0;
>  
>   /* Find any online cpu in that core except the current "cpu" */
> - ncpu = cpumask_any_but(cpu_sibling_mask(cpu), cpu);
> + ncpu = cpumask_last(cpu_sibling_mask(cpu));
> +
> + if (unlikely(ncpu == cpu))
> + ncpu = cpumask_any_but(cpu_sibling_mask(cpu), cpu);
>  
>   if (ncpu >= 0 && ncpu < nr_cpu_ids) {
>   cpumask_set_cpu(ncpu, &core_imc_cpumask);




Re: [PATCH RESEND 1/2] tools/perf: Add arch neutral function to choose event for perf kvm record

2019-06-10 Thread Arnaldo Carvalho de Melo
On Mon, Jun 10, 2019 at 12:15:17PM +0530, Anju T Sudhakar wrote:
> 'perf kvm record' uses 'cycles' (if the user did not specify any event) as
> the default event to profile the guest.
> This will not provide any proper samples from the guest in case of the
> powerpc architecture, since on powerpc the PMUs are controlled by
> the guest rather than the host.
> 
> Patch adds a function to pick an arch specific event for 'perf kvm record',
> instead of selecting 'cycles' as a default event for all architectures.
> 
> For powerpc this function checks for any user-specified event, and if there
> isn't any, it returns -EINVAL instead of proceeding with the 'cycles' event.

Michael, Ravi, Maddy, could you please provide an Acked-by, Reviewed-by
or Tested-by?

- Arnaldo
 
> Signed-off-by: Anju T Sudhakar 
> ---
>  tools/perf/arch/powerpc/util/kvm-stat.c | 37 +
>  tools/perf/builtin-kvm.c| 12 +++-
>  tools/perf/util/kvm-stat.h  |  2 +-
>  3 files changed, 49 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/perf/arch/powerpc/util/kvm-stat.c 
> b/tools/perf/arch/powerpc/util/kvm-stat.c
> index f9db341c47b6..66f8fe500945 100644
> --- a/tools/perf/arch/powerpc/util/kvm-stat.c
> +++ b/tools/perf/arch/powerpc/util/kvm-stat.c
> @@ -8,6 +8,7 @@
>  
>  #include "book3s_hv_exits.h"
>  #include "book3s_hcalls.h"
> +#include 
>  
>  #define NR_TPS 4
>  
> @@ -172,3 +173,39 @@ int cpu_isa_init(struct perf_kvm_stat *kvm, const char 
> *cpuid __maybe_unused)
>  
>   return ret;
>  }
> +
> +/*
> + * In case of the powerpc architecture, pmu registers are programmable
> + * by the guest kernel. So monitoring the guest via the host may not
> + * provide valid samples. It is better to fail the "perf kvm record"
> + * than to fall back to the default "cycles" event to monitor the guest
> + * on powerpc.
> + *
> + * Function to parse the arguments and return appropriate values.
> + */
> +int kvm_add_default_arch_event(int *argc, const char **argv)
> +{
> + const char **tmp;
> + bool event = false;
> + int i, j = *argc;
> +
> + const struct option event_options[] = {
> + OPT_BOOLEAN('e', "event", &event, NULL),
> + OPT_END()
> + };
> +
> + tmp = calloc(j + 1, sizeof(char *));
> + if (!tmp)
> + return -EINVAL;
> +
> + for (i = 0; i < j; i++)
> + tmp[i] = argv[i];
> +
> + parse_options(j, tmp, event_options, NULL, 0);
> + if (!event) {
> + free(tmp);
> + return -EINVAL;
> + }
> +
> + free(tmp);
> + return 0;
> +}
> diff --git a/tools/perf/builtin-kvm.c b/tools/perf/builtin-kvm.c
> index dbb6f737a3e2..fe33b3ec55c9 100644
> --- a/tools/perf/builtin-kvm.c
> +++ b/tools/perf/builtin-kvm.c
> @@ -1510,11 +1510,21 @@ static int kvm_cmd_stat(const char *file_name, int 
> argc, const char **argv)
>  }
>  #endif /* HAVE_KVM_STAT_SUPPORT */
>  
> +int __weak kvm_add_default_arch_event(int *argc __maybe_unused,
> + const char **argv __maybe_unused)
> +{
> + return 0;
> +}
> +
>  static int __cmd_record(const char *file_name, int argc, const char **argv)
>  {
> - int rec_argc, i = 0, j;
> + int rec_argc, i = 0, j, ret;
>   const char **rec_argv;
>  
> + ret = kvm_add_default_arch_event(&argc, argv);
> + if (ret)
> + return -EINVAL;
> +
>   rec_argc = argc + 2;
>   rec_argv = calloc(rec_argc + 1, sizeof(char *));
>   rec_argv[i++] = strdup("record");
> diff --git a/tools/perf/util/kvm-stat.h b/tools/perf/util/kvm-stat.h
> index 1403dec189b4..da38b56c46cb 100644
> --- a/tools/perf/util/kvm-stat.h
> +++ b/tools/perf/util/kvm-stat.h
> @@ -144,5 +144,5 @@ extern const int decode_str_len;
>  extern const char *kvm_exit_reason;
>  extern const char *kvm_entry_trace;
>  extern const char *kvm_exit_trace;
> -
> +extern int kvm_add_default_arch_event(int *argc, const char **argv);
>  #endif /* __PERF_KVM_STAT_H */
> -- 
> 2.17.2

-- 

- Arnaldo


Re: [PATCH v3 14/33] docs: kbuild: convert docs to ReST and rename to *.rst

2019-06-10 Thread Federico Vaga
On Sunday, June 9, 2019 4:27:04 AM CEST, Mauro Carvalho Chehab wrote:
> The kbuild documentation clearly shows that the documents
> there are written at different times: some use markdown,
> some use their own peculiar logic to split sections.
> 
> Convert everything to ReST without affecting too much
> the author's style and avoiding adding unneeded markups.
> 
> The conversion is actually:
>   - add blank lines and indentation in order to identify paragraphs;
>   - fix tables markups;
>   - add some lists markups;
>   - mark literal blocks;
>   - adjust title markups.
> 
> At its new index.rst, let's add a :orphan: while this is not linked to
> the main index.rst file, in order to avoid build warnings.
> 
> Signed-off-by: Mauro Carvalho Chehab 
> ---
>  Documentation/admin-guide/README.rst  |   2 +-
>  ...eaders_install.txt => headers_install.rst} |   5 +-
>  Documentation/kbuild/index.rst|  27 +
>  Documentation/kbuild/issues.rst   |  11 +
>  .../kbuild/{kbuild.txt => kbuild.rst} | 119 ++--
>  ...nfig-language.txt => kconfig-language.rst} | 232 
>  ...anguage.txt => kconfig-macro-language.rst} |  37 +-
>  .../kbuild/{kconfig.txt => kconfig.rst}   | 136 +++--
>  .../kbuild/{makefiles.txt => makefiles.rst}   | 530 +++---
>  .../kbuild/{modules.txt => modules.rst}   | 168 +++---
>  Documentation/kernel-hacking/hacking.rst  |   4 +-
>  Documentation/process/coding-style.rst|   2 +-
>  Documentation/process/submit-checklist.rst|   2 +-
>  .../it_IT/kernel-hacking/hacking.rst  |   4 +-
>  .../it_IT/process/coding-style.rst|   2 +-
>  .../it_IT/process/submit-checklist.rst|   2 +-

Limited to translations/it_IT

Acked-by: Federico Vaga 

>  .../zh_CN/process/coding-style.rst|   2 +-
>  .../zh_CN/process/submit-checklist.rst|   2 +-
>  Kconfig   |   2 +-
>  arch/arc/plat-eznps/Kconfig   |   2 +-
>  arch/c6x/Kconfig  |   2 +-
>  arch/microblaze/Kconfig.debug |   2 +-
>  arch/microblaze/Kconfig.platform  |   2 +-
>  arch/nds32/Kconfig|   2 +-
>  arch/openrisc/Kconfig |   2 +-
>  arch/powerpc/sysdev/Kconfig   |   2 +-
>  arch/riscv/Kconfig|   2 +-
>  drivers/auxdisplay/Kconfig|   2 +-
>  drivers/firmware/Kconfig  |   2 +-
>  drivers/mtd/devices/Kconfig   |   2 +-
>  drivers/net/ethernet/smsc/Kconfig |   6 +-
>  drivers/net/wireless/intel/iwlegacy/Kconfig   |   4 +-
>  drivers/net/wireless/intel/iwlwifi/Kconfig|   2 +-
>  drivers/parport/Kconfig   |   2 +-
>  drivers/scsi/Kconfig  |   4 +-
>  drivers/staging/sm750fb/Kconfig   |   2 +-
>  drivers/usb/misc/Kconfig  |   4 +-
>  drivers/video/fbdev/Kconfig   |  14 +-
>  net/bridge/netfilter/Kconfig  |   2 +-
>  net/ipv4/netfilter/Kconfig|   2 +-
>  net/ipv6/netfilter/Kconfig|   2 +-
>  net/netfilter/Kconfig |  16 +-
>  net/tipc/Kconfig  |   2 +-
>  scripts/Kbuild.include|   4 +-
>  scripts/Makefile.host |   2 +-
>  scripts/kconfig/symbol.c  |   2 +-
>  .../tests/err_recursive_dep/expected_stderr   |  14 +-
>  sound/oss/dmasound/Kconfig|   6 +-
>  48 files changed, 840 insertions(+), 561 deletions(-)
>  rename Documentation/kbuild/{headers_install.txt => headers_install.rst}
> (96%) create mode 100644 Documentation/kbuild/index.rst
>  create mode 100644 Documentation/kbuild/issues.rst
>  rename Documentation/kbuild/{kbuild.txt => kbuild.rst} (72%)
>  rename Documentation/kbuild/{kconfig-language.txt => kconfig-language.rst}
> (85%) rename Documentation/kbuild/{kconfig-macro-language.txt =>
> kconfig-macro-language.rst} (94%) rename Documentation/kbuild/{kconfig.txt
> => kconfig.rst} (80%)
>  rename Documentation/kbuild/{makefiles.txt => makefiles.rst} (83%)
>  rename Documentation/kbuild/{modules.txt => modules.rst} (84%)


Re: [PATCH v3 01/11] mm/memory_hotplug: Simplify and fix check_hotplug_memory_range()

2019-06-10 Thread Oscar Salvador
On Mon, May 27, 2019 at 01:11:42PM +0200, David Hildenbrand wrote:
> By converting start and size to page granularity, we actually ignore
> unaligned parts within a page instead of properly bailing out with an
> error.
> 
> Cc: Andrew Morton 
> Cc: Oscar Salvador 
> Cc: Michal Hocko 
> Cc: David Hildenbrand 
> Cc: Pavel Tatashin 
> Cc: Qian Cai 
> Cc: Wei Yang 
> Cc: Arun KS 
> Cc: Mathieu Malaterre 
> Reviewed-by: Dan Williams 
> Reviewed-by: Wei Yang 
> Signed-off-by: David Hildenbrand 
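
The failure mode described is, as an illustrative sketch (not the patch
itself): converting to PFNs first silently drops sub-page bits,

	start_pfn = start >> PAGE_SHIFT;	/* loses start % PAGE_SIZE */
	nr_pages  = size >> PAGE_SHIFT;		/* loses size % PAGE_SIZE  */

whereas checking alignment on the byte values first, e.g. with
IS_ALIGNED(start, memory_block_size_bytes()), can bail out with an error
as intended.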

Reviewed-by: Oscar Salvador 

-- 
Oscar Salvador
SUSE L3


Re: [PATCH v3 10/11] mm/memory_hotplug: Make unregister_memory_block_under_nodes() never fail

2019-06-10 Thread Oscar Salvador
On Mon, May 27, 2019 at 01:11:51PM +0200, David Hildenbrand wrote:
> We really don't want anything during memory hotunplug to fail.
> We always pass a valid memory block device, so that check can go. Avoid
> allocating memory and eventually failing. As we are always called under
> lock, we can use a static piece of memory. This avoids having to put
> the structure onto the stack, and having to guess about the stack size
> of callers.
> 
> Patch inspired by a patch from Oscar Salvador.
> 
> In the future, there might be no need to iterate over nodes at all.
> mem->nid should tell us exactly what to remove. Memory block devices
> with mixed nodes (added during boot) should be properly fenced off and
> never removed.
> 
> Cc: Greg Kroah-Hartman 
> Cc: "Rafael J. Wysocki" 
> Cc: Alex Deucher 
> Cc: "David S. Miller" 
> Cc: Mark Brown 
> Cc: Chris Wilson 
> Cc: David Hildenbrand 
> Cc: Oscar Salvador 
> Cc: Andrew Morton 
> Cc: Jonathan Cameron 
> Signed-off-by: David Hildenbrand 

Reviewed-by: Oscar Salvador 

-- 
Oscar Salvador
SUSE L3


Re: [PATCH v3 11/11] mm/memory_hotplug: Remove "zone" parameter from sparse_remove_one_section

2019-06-10 Thread Oscar Salvador
On Mon, May 27, 2019 at 01:11:52PM +0200, David Hildenbrand wrote:
> The parameter is unused, so let's drop it. Memory removal paths should
> never care about zones. This is the job of memory offlining and will
> require more refactorings.
> 
> Reviewed-by: Dan Williams 
> Signed-off-by: David Hildenbrand 

Reviewed-by: Oscar Salvador 

-- 
Oscar Salvador
SUSE L3


Re: [PATCH v3 02/11] s390x/mm: Fail when an altmap is used for arch_add_memory()

2019-06-10 Thread Oscar Salvador
On Mon, May 27, 2019 at 01:11:43PM +0200, David Hildenbrand wrote:
> ZONE_DEVICE is not yet supported, fail if an altmap is passed, so we
> don't forget arch_add_memory()/arch_remove_memory() when unlocking
> support.
> 
> Cc: Martin Schwidefsky 
> Cc: Heiko Carstens 
> Cc: Andrew Morton 
> Cc: Michal Hocko 
> Cc: Mike Rapoport 
> Cc: David Hildenbrand 
> Cc: Vasily Gorbik 
> Cc: Oscar Salvador 
> Suggested-by: Dan Williams 
> Signed-off-by: David Hildenbrand 

Reviewed-by: Oscar Salvador 

-- 
Oscar Salvador
SUSE L3


Re: [PATCH] mm/nvdimm: Fix endian conversion issues 

2019-06-10 Thread Verma, Vishal L
On Fri, 2019-06-07 at 12:17 +0530, Aneesh Kumar K.V wrote:
> The nd_label->dpa issue was observed when trying to enable a namespace
> created with a little-endian kernel on a big-endian kernel. That made me run
> `sparse` on the rest of the code and other changes are the result of that.
> 
> Signed-off-by: Aneesh Kumar K.V 
> ---
>  drivers/nvdimm/btt.c| 8 
>  drivers/nvdimm/namespace_devs.c | 7 ---
>  2 files changed, 8 insertions(+), 7 deletions(-)

The two BTT fixes seem like they may apply to stable as well; the
problematic code was introduced in relatively recent reworks/fixes.
Perhaps -

Fixes: d9b83c756953 ("libnvdimm, btt: rework error clearing")
Fixes: 9dedc73a4658 ("libnvdimm/btt: Fix LBA masking during 'free list' 
population")

Other than that, these look good to me.
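
(For anyone reading along: the order matters because ent_lba() and
ent_e_flag() mask bits in CPU byte order, so on big-endian the raw
__le32 value has to be converted before masking, i.e.

	arena->freelist[lane].block = ent_lba(le32_to_cpu(ent->old_map));

rather than le32_to_cpu(ent_lba(ent->old_map)).)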
Reviewed-by: Vishal Verma 

> 
> diff --git a/drivers/nvdimm/btt.c b/drivers/nvdimm/btt.c
> index 4671776f5623..4ac0f5dde467 100644
> --- a/drivers/nvdimm/btt.c
> +++ b/drivers/nvdimm/btt.c
> @@ -400,9 +400,9 @@ static int btt_flog_write(struct arena_info *arena, u32 
> lane, u32 sub,
>   arena->freelist[lane].sub = 1 - arena->freelist[lane].sub;
>   if (++(arena->freelist[lane].seq) == 4)
>   arena->freelist[lane].seq = 1;
> - if (ent_e_flag(ent->old_map))
> + if (ent_e_flag(le32_to_cpu(ent->old_map)))
>   arena->freelist[lane].has_err = 1;
> - arena->freelist[lane].block = le32_to_cpu(ent_lba(ent->old_map));
> + arena->freelist[lane].block = ent_lba(le32_to_cpu(ent->old_map));
>  
>   return ret;
>  }
> @@ -568,8 +568,8 @@ static int btt_freelist_init(struct arena_info *arena)
>* FIXME: if error clearing fails during init, we want to make
>* the BTT read-only
>*/
> - if (ent_e_flag(log_new.old_map) &&
> - !ent_normal(log_new.old_map)) {
> + if (ent_e_flag(le32_to_cpu(log_new.old_map)) &&
> + !ent_normal(le32_to_cpu(log_new.old_map))) {
>   arena->freelist[i].has_err = 1;
>   ret = arena_clear_freelist_error(arena, i);
>   if (ret)
> diff --git a/drivers/nvdimm/namespace_devs.c b/drivers/nvdimm/namespace_devs.c
> index c4c5a191b1d6..500c37db496a 100644
> --- a/drivers/nvdimm/namespace_devs.c
> +++ b/drivers/nvdimm/namespace_devs.c
> @@ -1995,7 +1995,7 @@ static struct device *create_namespace_pmem(struct 
> nd_region *nd_region,
>   nd_mapping = &nd_region->mapping[i];
>   label_ent = list_first_entry_or_null(&nd_mapping->labels,
>   typeof(*label_ent), list);
> - label0 = label_ent ? label_ent->label : 0;
> + label0 = label_ent ? label_ent->label : NULL;
>  
>   if (!label0) {
>   WARN_ON(1);
> @@ -2330,8 +2330,9 @@ static struct device **scan_labels(struct nd_region 
> *nd_region)
>   continue;
>  
>   /* skip labels that describe extents outside of the region */
> - if (nd_label->dpa < nd_mapping->start || nd_label->dpa > 
> map_end)
> - continue;
> + if (__le64_to_cpu(nd_label->dpa) < nd_mapping->start ||
> + __le64_to_cpu(nd_label->dpa) > map_end)
> + continue;
>  
>   i = add_namespace_resource(nd_region, nd_label, devs, count);
>   if (i < 0)



Re: Latest Git kernel: Section mismatch in reference from the variable start_here_multiplatform to the function .init.text:.early_setup()

2019-06-10 Thread Christophe Leroy

Michael ?

Christian Zigotzky  wrote:


Hello Christophe,

Could you please add this patch to the GIT kernel because the issue  
still exists.


Thanks,
Christian

On 15. May 2019, at 12:15, Christophe Leroy  wrote:

Hi,

On 15/05/2019 at 12:09, Christian Zigotzky wrote:
Hi All,
I got the following error messages with the latest Git kernel today:
GEN .version
  CHK include/generated/compile.h
  LD  vmlinux.o
  MODPOST vmlinux.o
WARNING: vmlinux.o(.text+0x302a): Section mismatch in reference from  
the variable start_here_multiplatform to the function  
.init.text:.early_setup()

The function start_here_multiplatform() references
the function __init .early_setup().
This is often because start_here_multiplatform lacks a __init
annotation or the annotation of .early_setup is wrong.
  MODINFO modules.builtin.modinfo
  KSYM.tmp_kallsyms1.o
  KSYM.tmp_kallsyms2.o
  LD  vmlinux
  SORTEX  vmlinux
  SYSMAP  System.map
  CHKHEAD vmlinux
What does it mean?

-

I proposed a patch for it at https://patchwork.ozlabs.org/patch/1097845/

Christophe





[PATCH v3 1/3] powerpc/powernv: Add OPAL API interface to get secureboot state

2019-06-10 Thread Nayna Jain
From: Claudio Carvalho 

The X.509 certificates trusted by the platform and other information
required to secure boot the OS kernel are wrapped in secure variables,
which are controlled by OPAL.

This patch adds support to read OPAL secure variables through
OPAL_SECVAR_GET call. It returns the metadata and data for a given secure
variable based on the unique key.

Since OPAL can support different types of backend, which can vary in the
variable interpretation, a new OPAL API call named OPAL_SECVAR_BACKEND is
added to retrieve the supported backend version. This helps the consumer
to know how to interpret the variable.

This support can be enabled using CONFIG_OPAL_SECVAR.

Signed-off-by: Claudio Carvalho 
Signed-off-by: Nayna Jain 
---
This patch depends on a new OPAL call that is being added to skiboot.
The patch set that implements the new call has been posted to
https://patchwork.ozlabs.org/project/skiboot/list/?series=112868

 arch/powerpc/include/asm/opal-api.h  |  4 +-
 arch/powerpc/include/asm/opal-secvar.h   | 23 ++
 arch/powerpc/include/asm/opal.h  |  6 ++
 arch/powerpc/platforms/powernv/Kconfig   |  6 ++
 arch/powerpc/platforms/powernv/Makefile  |  1 +
 arch/powerpc/platforms/powernv/opal-call.c   |  2 +
 arch/powerpc/platforms/powernv/opal-secvar.c | 85 
 7 files changed, 126 insertions(+), 1 deletion(-)
 create mode 100644 arch/powerpc/include/asm/opal-secvar.h
 create mode 100644 arch/powerpc/platforms/powernv/opal-secvar.c

diff --git a/arch/powerpc/include/asm/opal-api.h 
b/arch/powerpc/include/asm/opal-api.h
index e1577cfa7186..a505e669b4b6 100644
--- a/arch/powerpc/include/asm/opal-api.h
+++ b/arch/powerpc/include/asm/opal-api.h
@@ -212,7 +212,9 @@
 #define OPAL_HANDLE_HMI2   166
#define OPAL_NX_COPROC_INIT 167
 #define OPAL_XIVE_GET_VP_STATE 170
-#define OPAL_LAST  170
+#define OPAL_SECVAR_GET 173
+#define OPAL_SECVAR_BACKEND 177
+#define OPAL_LAST  177
 
 #define QUIESCE_HOLD   1 /* Spin all calls at entry */
 #define QUIESCE_REJECT 2 /* Fail all calls with OPAL_BUSY */
diff --git a/arch/powerpc/include/asm/opal-secvar.h 
b/arch/powerpc/include/asm/opal-secvar.h
new file mode 100644
index ..b677171a0368
--- /dev/null
+++ b/arch/powerpc/include/asm/opal-secvar.h
@@ -0,0 +1,23 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * PowerNV definitions for secure variables OPAL API.
+ *
+ * Copyright (C) 2019 IBM Corporation
+ * Author: Claudio Carvalho 
+ *
+ */
+#ifndef OPAL_SECVAR_H
+#define OPAL_SECVAR_H
+
+enum {
+   BACKEND_NONE = 0,
+   BACKEND_TC_COMPAT_V1,
+};
+
+extern int opal_get_variable(u8 *key, unsigned long ksize,
+u8 *metadata, unsigned long *mdsize,
+u8 *data, unsigned long *dsize);
+
+extern int opal_variable_version(unsigned long *backend);
+
+#endif
diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 4cc37e708bc7..57d2c2356eda 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -394,6 +394,12 @@ void opal_powercap_init(void);
 void opal_psr_init(void);
 void opal_sensor_groups_init(void);
 
+extern int opal_secvar_get(uint64_t k_key, uint64_t k_key_len,
+  uint64_t k_metadata, uint64_t k_metadata_size,
+  uint64_t k_data, uint64_t k_data_size);
+
+extern int opal_secvar_backend(uint64_t k_backend);
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* _ASM_POWERPC_OPAL_H */
diff --git a/arch/powerpc/platforms/powernv/Kconfig 
b/arch/powerpc/platforms/powernv/Kconfig
index 850eee860cf2..65b060539b5c 100644
--- a/arch/powerpc/platforms/powernv/Kconfig
+++ b/arch/powerpc/platforms/powernv/Kconfig
@@ -47,3 +47,9 @@ config PPC_VAS
  VAS adapters are found in POWER9 based systems.
 
  If unsure, say N.
+
+config OPAL_SECVAR
+   bool "OPAL Secure Variables"
+   depends on PPC_POWERNV
+   help
+ This enables the kernel to access OPAL secure variables.
diff --git a/arch/powerpc/platforms/powernv/Makefile 
b/arch/powerpc/platforms/powernv/Makefile
index da2e99efbd04..6651c742e530 100644
--- a/arch/powerpc/platforms/powernv/Makefile
+++ b/arch/powerpc/platforms/powernv/Makefile
@@ -16,3 +16,4 @@ obj-$(CONFIG_PERF_EVENTS) += opal-imc.o
 obj-$(CONFIG_PPC_MEMTRACE) += memtrace.o
 obj-$(CONFIG_PPC_VAS)  += vas.o vas-window.o vas-debug.o
 obj-$(CONFIG_OCXL_BASE)+= ocxl.o
+obj-$(CONFIG_OPAL_SECVAR) += opal-secvar.o
diff --git a/arch/powerpc/platforms/powernv/opal-call.c 
b/arch/powerpc/platforms/powernv/opal-call.c
index 36c8fa3647a2..0445980f294f 100644
--- a/arch/powerpc/platforms/powernv/opal-call.c
+++ b/arch/powerpc/platforms/powernv/opal-call.c
@@ -288,3 +288,5 @@ OPAL_CALL(opal_pci_set_pbcq_tunnel_bar, 

[PATCH v3 0/3] powerpc: Enabling IMA arch specific secure boot policies

2019-06-10 Thread Nayna Jain
This patch set, previously named "powerpc: Enabling secure boot on powernv
systems - Part 1", is part of a series that implements secure boot on
PowerNV systems.

In order to verify the OS kernel on PowerNV, secure boot requires X.509
certificates trusted by the platform, the secure boot modes, and several
other pieces of information. These are stored in secure variables
controlled by OPAL, also known as OPAL secure variables.

The IMA architecture specific policy support on POWER is dependent on OPAL
runtime services to access secure variables. OPAL APIs in skiboot are
modified to define generic interface compatible to any backend. This
patchset is consequently updated to be compatible with new OPAL API
interface. This has cleaned up any EFIsms in the arch specific code.
Further, the ima arch specific policies are updated to be able to support
appended signatures. They also now use per policy template.

Exposing the OPAL secure variables to userspace will be posted as a
separate patch set, allowing the IMA architecture specific policy on POWER
to be upstreamed independently.

This patch set adds the following features:

1. Add support for OPAL Runtime API to access secure variables controlled
by OPAL.
2. Define IMA arch-specific policies based on the secure boot state and
mode of the system. On secure boot enabled PowerNV systems, the OS kernel
signature will be verified by IMA appraisal.

Pre-requisites for this patchset are:
1. OPAL APIs in Skiboot[1]
2. Appended signature support in IMA [2]
3. Per policy template support in IMA [3]

[1] https://patchwork.ozlabs.org/project/skiboot/list/?series=112868 
[2] https://patchwork.ozlabs.org/cover/1087361/. Updated version will be
posted soon
[3] Repo: 
https://kernel.googlesource.com/pub/scm/linux/kernel/git/zohar/linux-integrity
Branch: next-queued-testing. Commit: f241bb1f42aa95

--

Original Cover Letter:

This patch set is part of a series that implements secure boot on PowerNV
systems.

In order to verify the OS kernel on PowerNV, secure boot requires X.509
certificates trusted by the platform, the secure boot modes, and several
other pieces of information. These are stored in secure variables
controlled by OPAL, also known as OPAL secure variables.

The IMA architecture specific policy support on Power is dependent on OPAL
runtime services to access secure variables. Instead of directly accessing
the OPAL runtime services, version 3 of this patch set relied upon the
EFI hooks. This version drops that dependency and calls the OPAL runtime
services directly. Skiboot OPAL APIs are due to be posted soon.

Exposing the OPAL secure variables to userspace will be posted as a
separate patch set, allowing the IMA architecture specific policy on Power
to be upstreamed independently.

This patch set adds the following features:

1. Add support for OPAL Runtime API to access secure variables controlled
by OPAL.
2. Define IMA arch-specific policies based on the secure boot state and
mode of the system. On secure boot enabled powernv systems, the OS kernel
signature will be verified by IMA appraisal.

[1] https://patchwork.kernel.org/cover/10882149/

Changelog:

v3:
* OPAL APIs in Patch 1 are updated to provide generic interface based on
key/keylen. This patchset updates kernel OPAL APIs to be compatible with
generic interface.
* Patch 2 is cleaned up to use new OPAL APIs. 
* Since OPAL can support different types of backend which can vary in the
variable interpretation, the Patch 2 is updated to add a check for the
backend version
* OPAL API now expects consumer to first check the supported backend version
before calling other secvar OPAL APIs. This check is now added in patch 2.
* IMA policies in Patch 3 is updated to specify appended signature and
per policy template.
* The patches now are free of any EFIisms.

v2:

* Removed Patch 1: powerpc/include: Override unneeded early ioremap
functions
* Updated Subject line and patch description of the Patch 1 of this series
* Removed dependency of OPAL_SECVAR on EFI, CPU_BIG_ENDIAN and UCS2_STRING
* Changed OPAL APIs from static to non-static. Added opal-secvar.h for the
same
* Removed EFI hooks from opal_secvar.c
* Removed opal_secvar_get_next(), opal_secvar_enqueue() and
opal_query_variable_info() function
* get_powerpc_sb_mode() in secboot.c now directly calls OPAL Runtime API
rather than via EFI hooks.
* Fixed log messages in get_powerpc_sb_mode() function.
* Added dependency for PPC_SECURE_BOOT on configs PPC64 and OPAL_SECVAR
* Replaced obj-$(CONFIG_IMA) with obj-$(CONFIG_PPC_SECURE_BOOT) in
arch/powerpc/kernel/Makefile

Claudio Carvalho (1):
  powerpc/powernv: Add OPAL API interface to get secureboot state

Nayna Jain (2):
  powerpc/powernv: detect the secure boot mode of the system
  powerpc: Add support to initialize ima policy rules

 arch/powerpc/Kconfig | 14 
 arch/powerpc/include/asm/opal-api.h 

[PATCH v3 3/3] powerpc: Add support to initialize ima policy rules

2019-06-10 Thread Nayna Jain
PowerNV secure boot relies on the kernel IMA security subsystem to
perform the OS kernel image signature verification. Since each secure
boot mode has different IMA policy requirements, dynamic definition of
the policy rules based on the runtime secure boot mode of the system is
required. On systems that support secure boot, but have it disabled,
only measurement policy rules of the kernel image and modules are
defined.

This patch defines the arch-specific implementation to retrieve the
secure boot mode of the system and accordingly configures the IMA policy
rules.

This patch provides arch-specific IMA policies if PPC_SECURE_BOOT
config is enabled.

Signed-off-by: Nayna Jain 
---
 arch/powerpc/Kconfig   | 14 +
 arch/powerpc/kernel/Makefile   |  1 +
 arch/powerpc/kernel/ima_arch.c | 54 ++
 include/linux/ima.h|  3 +-
 4 files changed, 71 insertions(+), 1 deletion(-)
 create mode 100644 arch/powerpc/kernel/ima_arch.c

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 8c1c636308c8..9de77bb14f54 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -902,6 +902,20 @@ config PPC_MEM_KEYS
 
  If unsure, say y.
 
+config PPC_SECURE_BOOT
+   prompt "Enable PowerPC Secure Boot"
+   bool
+   default n
+   depends on PPC64
+   depends on OPAL_SECVAR
+   depends on IMA
+   depends on IMA_ARCH_POLICY
+   help
+ Linux on POWER with firmware secure boot enabled needs to define
+ security policies to extend secure boot to the OS. This config
+ allows the user to enable OS Secure Boot on PowerPC systems that
+ have firmware secure boot support.
+
 endmenu
 
 config ISA_DMA_API
diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index 0ea6c4aa3a20..75c929b41341 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -131,6 +131,7 @@ ifdef CONFIG_IMA
 obj-y  += ima_kexec.o
 endif
 endif
+obj-$(CONFIG_PPC_SECURE_BOOT)  += ima_arch.o
 
 obj-$(CONFIG_AUDIT)+= audit.o
 obj64-$(CONFIG_AUDIT)  += compat_audit.o
diff --git a/arch/powerpc/kernel/ima_arch.c b/arch/powerpc/kernel/ima_arch.c
new file mode 100644
index ..1767bf6e6550
--- /dev/null
+++ b/arch/powerpc/kernel/ima_arch.c
@@ -0,0 +1,54 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2019 IBM Corporation
+ * Author: Nayna Jain 
+ *
+ * ima_arch.c
+ *  - initialize ima policies for PowerPC Secure Boot
+ */
+
+#include 
+#include 
+
+bool arch_ima_get_secureboot(void)
+{
+   bool sb_mode;
+
+   sb_mode = get_powerpc_sb_mode();
+   if (sb_mode)
+   return true;
+   else
+   return false;
+}
+
+/*
+ * File signature verification is not needed, include only measurements
+ */
+static const char *const default_arch_rules[] = {
+   "measure func=KEXEC_KERNEL_CHECK template=ima-modsig",
+   "measure func=MODULE_CHECK template=ima-modsig",
+   NULL
+};
+
+/* Both file signature verification and measurements are needed */
+static const char *const sb_arch_rules[] = {
+   "measure func=KEXEC_KERNEL_CHECK template=ima-modsig",
+   "measure func=MODULE_CHECK template=ima-modsig",
+   "appraise func=KEXEC_KERNEL_CHECK appraise_type=imasig|modsig 
template=ima-modsig",
+#if !IS_ENABLED(CONFIG_MODULE_SIG)
+   "appraise func=MODULE_CHECK appraise_type=imasig|modsig 
template=ima-modsig",
+#endif
+   NULL
+};
+
+/*
+ * On PowerPC, file measurements are to be added to the IMA measurement list
+ * irrespective of the secure boot state of the system. Signature verification
+ * is conditionally enabled based on the secure boot state.
+ */
+const char *const *arch_get_ima_policy(void)
+{
+   if (IS_ENABLED(CONFIG_IMA_ARCH_POLICY) && arch_ima_get_secureboot())
+   return sb_arch_rules;
+   return default_arch_rules;
+}
diff --git a/include/linux/ima.h b/include/linux/ima.h
index fd9f7cf4cdf5..a01df076ecae 100644
--- a/include/linux/ima.h
+++ b/include/linux/ima.h
@@ -31,7 +31,8 @@ extern void ima_post_path_mknod(struct dentry *dentry);
 extern void ima_add_kexec_buffer(struct kimage *image);
 #endif
 
-#if (defined(CONFIG_X86) && defined(CONFIG_EFI)) || defined(CONFIG_S390)
+#if (defined(CONFIG_X86) && defined(CONFIG_EFI)) || defined(CONFIG_S390) \
+   || defined(CONFIG_PPC_SECURE_BOOT)
 extern bool arch_ima_get_secureboot(void);
 extern const char * const *arch_get_ima_policy(void);
 #else
-- 
2.20.1



[PATCH v3 2/3] powerpc/powernv: detect the secure boot mode of the system

2019-06-10 Thread Nayna Jain
PowerNV secure boot defines different IMA policies based on the secure
boot state of the system.

This patch defines a function to detect the secure boot state of the
system.

Signed-off-by: Nayna Jain 
---
 arch/powerpc/include/asm/secboot.h   | 21 
 arch/powerpc/platforms/powernv/Makefile  |  3 +-
 arch/powerpc/platforms/powernv/secboot.c | 61 
 3 files changed, 84 insertions(+), 1 deletion(-)
 create mode 100644 arch/powerpc/include/asm/secboot.h
 create mode 100644 arch/powerpc/platforms/powernv/secboot.c

diff --git a/arch/powerpc/include/asm/secboot.h b/arch/powerpc/include/asm/secboot.h
new file mode 100644
index ..1904fb4a3352
--- /dev/null
+++ b/arch/powerpc/include/asm/secboot.h
@@ -0,0 +1,21 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * PowerPC secure boot definitions
+ *
+ * Copyright (C) 2019 IBM Corporation
+ * Author: Nayna Jain 
+ *
+ */
+#ifndef POWERPC_SECBOOT_H
+#define POWERPC_SECBOOT_H
+
+#if defined(CONFIG_OPAL_SECVAR)
+extern bool get_powerpc_sb_mode(void);
+#else
+static inline bool get_powerpc_sb_mode(void)
+{
+   return false;
+}
+#endif
+
+#endif
diff --git a/arch/powerpc/platforms/powernv/Makefile b/arch/powerpc/platforms/powernv/Makefile
index 6651c742e530..74705a3fc474 100644
--- a/arch/powerpc/platforms/powernv/Makefile
+++ b/arch/powerpc/platforms/powernv/Makefile
@@ -4,6 +4,7 @@ obj-y   += idle.o opal-rtc.o opal-nvram.o opal-lpc.o opal-flash.o
 obj-y  += rng.o opal-elog.o opal-dump.o opal-sysparam.o opal-sensor.o
 obj-y  += opal-msglog.o opal-hmi.o opal-power.o opal-irqchip.o
 obj-y  += opal-kmsg.o opal-powercap.o opal-psr.o opal-sensor-groups.o
+obj-y  += secboot.o
 
 obj-$(CONFIG_SMP)  += smp.o subcore.o subcore-asm.o
 obj-$(CONFIG_PCI)  += pci.o pci-ioda.o npu-dma.o pci-ioda-tce.o
@@ -16,4 +17,4 @@ obj-$(CONFIG_PERF_EVENTS) += opal-imc.o
 obj-$(CONFIG_PPC_MEMTRACE) += memtrace.o
 obj-$(CONFIG_PPC_VAS)  += vas.o vas-window.o vas-debug.o
 obj-$(CONFIG_OCXL_BASE)+= ocxl.o
-obj-$(CONFIG_OPAL_SECVAR) += opal-secvar.o
+obj-$(CONFIG_OPAL_SECVAR) += opal-secvar.o secboot.o
diff --git a/arch/powerpc/platforms/powernv/secboot.c b/arch/powerpc/platforms/powernv/secboot.c
new file mode 100644
index ..9199e520ebed
--- /dev/null
+++ b/arch/powerpc/platforms/powernv/secboot.c
@@ -0,0 +1,61 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2019 IBM Corporation
+ * Author: Nayna Jain 
+ *
+ * secboot.c
+ *  - util function to get powerpc secboot state
+ */
+#include 
+#include 
+#include 
+#include 
+
+bool get_powerpc_sb_mode(void)
+{
+   u8 secure_boot_name[] = "SecureBoot";
+   u8 setup_mode_name[] = "SetupMode";
+   u8 secboot, setupmode;
+   unsigned long size = sizeof(secboot);
+   int status;
+   unsigned long version;
+
+   status = opal_variable_version(&version);
+   if ((status != OPAL_SUCCESS) || (version != BACKEND_TC_COMPAT_V1)) {
+   pr_info("secboot: error retrieving compatible backend\n");
+   return false;
+   }
+
+   status = opal_get_variable(secure_boot_name, sizeof(secure_boot_name),
+  NULL, NULL, &secboot, &size);
+
+   /*
+* For now assume all failures reading the SecureBoot variable implies
+* secure boot is not enabled. Later differentiate failure types.
+*/
+   if (status != OPAL_SUCCESS) {
+   secboot = 0;
+   setupmode = 0;
+   goto out;
+   }
+
+   size = sizeof(setupmode);
+   status = opal_get_variable(setup_mode_name, sizeof(setup_mode_name),
+  NULL, NULL,  &setupmode, &size);
+
+   /*
+* Failure to read the SetupMode variable does not prevent
+* secure boot mode
+*/
+   if (status != OPAL_SUCCESS)
+   setupmode = 0;
+
+out:
+   if ((secboot == 0) || (setupmode == 1)) {
+   pr_info("secboot: secureboot mode disabled\n");
+   return false;
+   }
+
+   pr_info("secboot: secureboot mode enabled\n");
+   return true;
+}
-- 
2.20.1



Re: [PATCH v3 14/33] docs: kbuild: convert docs to ReST and rename to *.rst

2019-06-10 Thread Federico Vaga
On Sunday, June 9, 2019 4:27:04 AM CEST, Mauro Carvalho Chehab wrote:
> The kbuild documentation clearly shows that the documents
> there are written at different times: some use markdown,
> some use their own peculiar logic to split sections.
> 
> Convert everything to ReST without affecting too much
> the author's style and avoiding adding unneeded markups.
> 
> The conversion is actually:
>   - add blank lines and indentation in order to identify paragraphs;
>   - fix tables markups;
>   - add some lists markups;
>   - mark literal blocks;
>   - adjust title markups.
> 
> At its new index.rst, let's add a :orphan: while this is not linked to
> the main index.rst file, in order to avoid build warnings.
> 
> Signed-off-by: Mauro Carvalho Chehab 
> ---
>  Documentation/admin-guide/README.rst  |   2 +-
>  ...eaders_install.txt => headers_install.rst} |   5 +-
>  Documentation/kbuild/index.rst|  27 +
>  Documentation/kbuild/issues.rst   |  11 +
>  .../kbuild/{kbuild.txt => kbuild.rst} | 119 ++--
>  ...nfig-language.txt => kconfig-language.rst} | 232 
>  ...anguage.txt => kconfig-macro-language.rst} |  37 +-
>  .../kbuild/{kconfig.txt => kconfig.rst}   | 136 +++--
>  .../kbuild/{makefiles.txt => makefiles.rst}   | 530 +++---
>  .../kbuild/{modules.txt => modules.rst}   | 168 +++---
>  Documentation/kernel-hacking/hacking.rst  |   4 +-
>  Documentation/process/coding-style.rst|   2 +-
>  Documentation/process/submit-checklist.rst|   2 +-
>  .../it_IT/kernel-hacking/hacking.rst  |   4 +-
>  .../it_IT/process/coding-style.rst|   2 +-
>  .../it_IT/process/submit-checklist.rst|   2 +-

Limited to translations/it_IT

Acked-by: Federico Vaga 

>  .../zh_CN/process/coding-style.rst|   2 +-
>  .../zh_CN/process/submit-checklist.rst|   2 +-
>  Kconfig   |   2 +-
>  arch/arc/plat-eznps/Kconfig   |   2 +-
>  arch/c6x/Kconfig  |   2 +-
>  arch/microblaze/Kconfig.debug |   2 +-
>  arch/microblaze/Kconfig.platform  |   2 +-
>  arch/nds32/Kconfig|   2 +-
>  arch/openrisc/Kconfig |   2 +-
>  arch/powerpc/sysdev/Kconfig   |   2 +-
>  arch/riscv/Kconfig|   2 +-
>  drivers/auxdisplay/Kconfig|   2 +-
>  drivers/firmware/Kconfig  |   2 +-
>  drivers/mtd/devices/Kconfig   |   2 +-
>  drivers/net/ethernet/smsc/Kconfig |   6 +-
>  drivers/net/wireless/intel/iwlegacy/Kconfig   |   4 +-
>  drivers/net/wireless/intel/iwlwifi/Kconfig|   2 +-
>  drivers/parport/Kconfig   |   2 +-
>  drivers/scsi/Kconfig  |   4 +-
>  drivers/staging/sm750fb/Kconfig   |   2 +-
>  drivers/usb/misc/Kconfig  |   4 +-
>  drivers/video/fbdev/Kconfig   |  14 +-
>  net/bridge/netfilter/Kconfig  |   2 +-
>  net/ipv4/netfilter/Kconfig|   2 +-
>  net/ipv6/netfilter/Kconfig|   2 +-
>  net/netfilter/Kconfig |  16 +-
>  net/tipc/Kconfig  |   2 +-
>  scripts/Kbuild.include|   4 +-
>  scripts/Makefile.host |   2 +-
>  scripts/kconfig/symbol.c  |   2 +-
>  .../tests/err_recursive_dep/expected_stderr   |  14 +-
>  sound/oss/dmasound/Kconfig|   6 +-
>  48 files changed, 840 insertions(+), 561 deletions(-)
>  rename Documentation/kbuild/{headers_install.txt => headers_install.rst}
> (96%) create mode 100644 Documentation/kbuild/index.rst
>  create mode 100644 Documentation/kbuild/issues.rst
>  rename Documentation/kbuild/{kbuild.txt => kbuild.rst} (72%)
>  rename Documentation/kbuild/{kconfig-language.txt => kconfig-language.rst}
> (85%) rename Documentation/kbuild/{kconfig-macro-language.txt =>
> kconfig-macro-language.rst} (94%) rename Documentation/kbuild/{kconfig.txt
> => kconfig.rst} (80%)
>  rename Documentation/kbuild/{makefiles.txt => makefiles.rst} (83%)
>  rename Documentation/kbuild/{modules.txt => modules.rst} (84%)





[PATCH 0/1] PPC32: fix ptrace() access to FPU registers

2019-06-10 Thread Radu Rendec
Hi Everyone,

I'm following up on the ptrace() problem that I reported a few days ago.
I believe my version of the code handles all cases correctly. While the
problem essentially boils down to dividing the fpidx by 2 on PPC32, it
becomes tricky when the same code must work correctly on both PPC32 and
PPC64.

One other thing that I believe was handled incorrectly in the previous
version is the unused half of fpscr on PPC32. Note that while PT_FPSCR
is defined as (PT_FPR0 + 2*32 + 1), making only the upper half visible,
PT_FPR0 + 2*32 still corresponds to a possible address that userspace
can pass. In that case, comparing fpidx to (PT_FPSCR - PT_FPR0) would
cause an invalid access to the FPU registers array.
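
(In numbers: PT_FPSCR = 48 + 2*32 + 1 = 113, i.e. ptrace() address 113 * 4 =
0x1c4 on PPC32, while PT_FPR0 + 2*32 = 112 corresponds to address 0x1c0, a
valid address userspace can pass even though it maps to the unused half of
fpscr.)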

I tested the patch on 4.9.179, but that part of the code is identical in
recent kernels so it should work just the same.

I wrote a simple test program that can be used to quickly test (on an
x86_64 host) that all cases are handled correctly for both PPC32/PPC64.
The code is included below.

I also tested with gdbserver (test patch included below) and verified
that it generates two ptrace() calls for each FPU register, with
addresses between 0xc0 and 0x1bc.
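
(Worked arithmetic, for reference: on PPC32 sizeof(long) == 4 and PT_FPR0 ==
48, so the FPU register area starts at 48 * 4 = 0xc0, and the upper half of
FPR31 sits at 0xc0 + 31*8 + 4 = 0x1bc, matching the observed address range.)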

8<--- Makefile -
.PHONY: all clean

all: ptrace-fpregs-32 ptrace-fpregs-64

ptrace-fpregs-32: ptrace-fpregs.c
$(CC) -o ptrace-fpregs-32 -Wall -O2 -m32 ptrace-fpregs.c

ptrace-fpregs-64: ptrace-fpregs.c
$(CC) -o ptrace-fpregs-64 -Wall -O2 ptrace-fpregs.c

clean:
rm -f ptrace-fpregs-32 ptrace-fpregs-64
8<--- ptrace-fpregs.c --
#include <stdio.h>
#include <errno.h>

#define PT_FPR0 48

#ifndef __x86_64

#define PT_FPR31 (PT_FPR0 + 2*31)
#define PT_FPSCR (PT_FPR0 + 2*32 + 1)

#else

#define PT_FPSCR (PT_FPR0 + 32)

#endif

int test_access(unsigned long addr)
{
int ret;

do {
unsigned long index, fpidx;

ret = -EIO;

/* convert to index and check */
index = addr / sizeof(long);
if ((addr & (sizeof(long) - 1)) || (index > PT_FPSCR))
break;

if (index < PT_FPR0) {
ret = printf("ptrace_put_reg(%lu)", index);
break;
}

ret = 0;
#ifndef __x86_64
if (index == PT_FPSCR - 1) {
/* corner case for PPC32; do nothing */
printf("corner_case");
break;
}
#endif
if (index == PT_FPSCR) {
printf("fpscr");
break;
}

/*
 * FPR is always 64-bit; on PPC32, userspace does two 32-bit
 * accesses. Add bit2 to allow accessing the upper half on
 * 32-bit; on 64-bit, bit2 is always 0 (we validate it above).
 */
fpidx = (addr - PT_FPR0 * sizeof(long)) / 8;
printf("TS_FPR[%lu] + %lu", fpidx, addr & 4);
break;
} while (0);

return ret;
}

int main(void)
{
unsigned long addr;
int rc;

for (addr = 0; addr < PT_FPSCR * sizeof(long) + 16; addr++) {
printf("0x%04lx: ", addr);
rc = test_access(addr);
if (rc < 0)
printf("!err!");
printf("\t<%d>\n", rc);
}

return 0;
}
8<--- gdb.patch 
--- gdb/gdbserver/linux-low.c.orig  2019-06-10 11:45:53.810882669 -0400
+++ gdb/gdbserver/linux-low.c   2019-06-10 11:49:32.272929766 -0400
@@ -4262,6 +4262,8 @@ store_register (struct regcache *regcach
   pid = lwpid_of (get_thread_lwp (current_inferior));
   for (i = 0; i < size; i += sizeof (PTRACE_XFER_TYPE))
 {
+  printf("writing register #%d offset %d at address %#x\n",
+ regno, i, (unsigned int)regaddr);
   errno = 0;
   ptrace (PTRACE_POKEUSER, pid,
/* Coerce to a uintptr_t first to avoid potential gcc warning
8<--

Radu Rendec (1):
  PPC32: fix ptrace() access to FPU registers

 arch/powerpc/kernel/ptrace.c | 85 ++--
 1 file changed, 52 insertions(+), 33 deletions(-)

-- 
2.20.1



[PATCH 1/1] PPC32: fix ptrace() access to FPU registers

2019-06-10 Thread Radu Rendec
This patch addresses several issues with ptrace() access to FPU
registers through PTRACE_PEEKUSR/PTRACE_POKEUSR.

Standard CPU registers are of course the size of the machine word on
both PPC32/PPC64, but FPU registers are always 64-bit. Because
ptrace() can only transfer one `long` at a time with PTRACE_PEEKUSR and
PTRACE_POKEUSR, on PPC32 userspace must do two separate ptrace() calls
to access a whole FPU register.
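
As an illustration (an editorial sketch, not part of this patch; peek_fpr()
is a hypothetical helper), reading one whole FPR from userspace on big-endian
PPC32 looks roughly like this:

#include <stdint.h>
#include <sys/ptrace.h>
#include <sys/types.h>

#define PT_FPR0 48

/*
 * Sketch: fetch the full 64-bit FPR 'n' of a stopped tracee on big-endian
 * PPC32, where sizeof(long) == 4 and the word at the lower offset is the
 * most significant. Error handling is omitted for brevity.
 */
static uint64_t peek_fpr(pid_t pid, int n)
{
	unsigned long base = PT_FPR0 * sizeof(long) + n * 8;
	uint32_t hi, lo;

	hi = ptrace(PTRACE_PEEKUSER, pid, (void *)base, NULL);
	lo = ptrace(PTRACE_PEEKUSER, pid, (void *)(base + 4), NULL);
	return ((uint64_t)hi << 32) | lo;
}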

This patch fixes the code that translates between ptrace() addresses and
indexes into (struct thread_fp_state).fpr, taking into account all cases
for both PPC32/PPC64. In the previous version, on PPC32, the index was
double the correct value, allowing memory to be accessed beyond the
register array. This had the following side effects:
* Access to all FPU registers (except for FPR0) was broken.
* PTRACE_POKEUSR could corrupt memory following the FPU register array.
  That means the remainder of thread_struct, which is by design the last
  field of task_struct. For large enough ptrace() addresses, memory
  access could go even outside task_struct, corrupting the adjacent
  task_struct.

Note that gdb (which is probably the most frequent user of ptrace() with
PTRACE_PEEKUSR/PTRACE_POKEUSR) seems to always read/write all FPU
registers whenever a traced task stops.

Signed-off-by: Radu Rendec 
---
 arch/powerpc/kernel/ptrace.c | 85 ++--
 1 file changed, 52 insertions(+), 33 deletions(-)

diff --git a/arch/powerpc/kernel/ptrace.c b/arch/powerpc/kernel/ptrace.c
index 684b0b315c32..060e5ed0fad9 100644
--- a/arch/powerpc/kernel/ptrace.c
+++ b/arch/powerpc/kernel/ptrace.c
@@ -2991,69 +2991,88 @@ long arch_ptrace(struct task_struct *child, long request,
switch (request) {
/* read the word at location addr in the USER area. */
case PTRACE_PEEKUSR: {
-   unsigned long index, tmp;
+   unsigned long index, fpidx, tmp = 0;
 
ret = -EIO;
/* convert to index and check */
+   index = addr / sizeof(long);
+   if ((addr & (sizeof(long) - 1)) || (index > PT_FPSCR))
+   break;
 #ifdef CONFIG_PPC32
-   index = addr >> 2;
-   if ((addr & 3) || (index > PT_FPSCR)
-   || (child->thread.regs == NULL))
-#else
-   index = addr >> 3;
-   if ((addr & 7) || (index > PT_FPSCR))
-#endif
+   if (child->thread.regs == NULL)
break;
+#endif
 
CHECK_FULL_REGS(child->thread.regs);
if (index < PT_FPR0) {
ret = ptrace_get_reg(child, (int) index, &tmp);
if (ret)
break;
-   } else {
-   unsigned int fpidx = index - PT_FPR0;
+   goto out_peekusr;
+   }
 
-   flush_fp_to_thread(child);
-   if (fpidx < (PT_FPSCR - PT_FPR0))
-   memcpy(&tmp, &child->thread.TS_FPR(fpidx),
-  sizeof(long));
-   else
-   tmp = child->thread.fp_state.fpscr;
+   flush_fp_to_thread(child);
+#ifdef CONFIG_PPC32
+   if (index == PT_FPSCR - 1)
+   /* corner case for PPC32; do nothing */
+   goto out_peekusr;
+#endif
+   if (index == PT_FPSCR) {
+   tmp = child->thread.fp_state.fpscr;
+   goto out_peekusr;
}
+   /*
+* FPR is always 64-bit; on PPC32, userspace does two 32-bit
+* accesses. Add bit2 to allow accessing the upper half on
+* 32-bit; on 64-bit, bit2 is always 0 (we validate it above).
+*/
+   fpidx = (addr - PT_FPR0 * sizeof(long)) / 8;
+   memcpy(&tmp, (void *)&child->thread.TS_FPR(fpidx) + (addr & 4),
+   sizeof(long));
+out_peekusr:
ret = put_user(tmp, datalp);
break;
}
 
/* write the word at location addr in the USER area */
case PTRACE_POKEUSR: {
-   unsigned long index;
+   unsigned long index, fpidx;
 
ret = -EIO;
/* convert to index and check */
+   index = addr / sizeof(long);
+   if ((addr & (sizeof(long) - 1)) || (index > PT_FPSCR))
+   break;
 #ifdef CONFIG_PPC32
-   index = addr >> 2;
-   if ((addr & 3) || (index > PT_FPSCR)
-   || (child->thread.regs == NULL))
-#else
-   index = addr >> 3;
-   if ((addr & 7) || (index > PT_FPSCR))
-#endif
+   if (child->thread.regs == NULL)
break;
+#endif
 
CHECK_FULL_REGS(child->thread.regs);
if (inde

Re: [PATCH 4/4] mm/vmalloc: Hugepage vmalloc mappings

2019-06-10 Thread Nicholas Piggin
Anshuman Khandual's on June 10, 2019 6:53 pm:
> On 06/10/2019 10:08 AM, Nicholas Piggin wrote:
>> For platforms that define HAVE_ARCH_HUGE_VMAP, have vmap allow vmalloc to
>> allocate huge pages and map them.
> 
> IIUC that extends HAVE_ARCH_HUGE_VMAP from ioremap to vmalloc.
> 
>> 
>> This brings dTLB misses for linux kernel tree `git diff` from 45,000 to
>> 8,000 on a Kaby Lake KVM guest with 8MB dentry hash and mitigations=off
>> (performance is in the noise, under 1% difference, page tables are likely
>> to be well cached for this workload). Similar numbers are seen on POWER9.
> 
> Sure will try this on arm64.
> 
>> 
>> Signed-off-by: Nicholas Piggin 
>> ---
>>  include/asm-generic/4level-fixup.h |   1 +
>>  include/asm-generic/5level-fixup.h |   1 +
>>  include/linux/vmalloc.h|   1 +
>>  mm/vmalloc.c   | 132 +++--
>>  4 files changed, 107 insertions(+), 28 deletions(-)
>> 
>> diff --git a/include/asm-generic/4level-fixup.h 
>> b/include/asm-generic/4level-fixup.h
>> index e3667c9a33a5..3cc65a4dd093 100644
>> --- a/include/asm-generic/4level-fixup.h
>> +++ b/include/asm-generic/4level-fixup.h
>> @@ -20,6 +20,7 @@
>>  #define pud_none(pud)   0
>>  #define pud_bad(pud)0
>>  #define pud_present(pud)1
>> +#define pud_large(pud)  0
>>  #define pud_ERROR(pud)  do { } while (0)
>>  #define pud_clear(pud)  pgd_clear(pud)
>>  #define pud_val(pud)pgd_val(pud)
>> diff --git a/include/asm-generic/5level-fixup.h 
>> b/include/asm-generic/5level-fixup.h
>> index bb6cb347018c..c4377db09a4f 100644
>> --- a/include/asm-generic/5level-fixup.h
>> +++ b/include/asm-generic/5level-fixup.h
>> @@ -22,6 +22,7 @@
>>  #define p4d_none(p4d)   0
>>  #define p4d_bad(p4d)0
>>  #define p4d_present(p4d)1
>> +#define p4d_large(p4d)  0
>>  #define p4d_ERROR(p4d)  do { } while (0)
>>  #define p4d_clear(p4d)  pgd_clear(p4d)
>>  #define p4d_val(p4d)pgd_val(p4d)
> 
> Both of these are required from vmalloc_to_page() which as per a later comment
> should be part of a prerequisite patch before this series.

I'm not sure what you mean. This patch is where they get used.

Possibly I could split this and the vmalloc_to_page change out. I'll
consider it.

>> diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
>> index 812bea5866d6..4c92dc608928 100644
>> --- a/include/linux/vmalloc.h
>> +++ b/include/linux/vmalloc.h
>> @@ -42,6 +42,7 @@ struct vm_struct {
>>  unsigned long   size;
>>  unsigned long   flags;
>>  struct page **pages;
>> +unsigned intpage_shift;
> 
> So the entire vm_struct will be mapped with a single page_shift. It cannot
> have mix-and-match mappings with PAGE_SIZE, PMD_SIZE, PUD_SIZE etc. in case
> the allocation fails for larger sizes and falls back, or for whatever other
> reasons.

For now, yes. I have a bit of follow up work to improve that and make
it able to fall back, but it's a bit more churn and not a significant
benefit just yet because there are not a lot of very large vmallocs
(except the early hashes which can be satisfied with large allocs).

> 
>>  unsigned intnr_pages;
>>  phys_addr_t phys_addr;
>>  const void  *caller;
>> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
>> index dd27cfb29b10..0cf8e861caeb 100644
>> --- a/mm/vmalloc.c
>> +++ b/mm/vmalloc.c
>> @@ -36,6 +36,7 @@
>>  #include 
>>  
>>  #include 
>> +#include 
>>  #include 
>>  #include 
>>  
>> @@ -440,6 +441,41 @@ static int vmap_pages_range(unsigned long start, 
>> unsigned long end,
>>  return ret;
>>  }
>>  
>> +#ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
>> +static int vmap_hpages_range(unsigned long start, unsigned long end,
> 
> A small nit (if you agree) s/hpages/huge_pages/

Hmm. It's not actually a good function name because it can do small
pages as well. vmap_pages_size_range or something may be better.

> 
>> +   pgprot_t prot, struct page **pages,
> 
> Re-order (prot <---> pages) just to follow the standard like before.

Will do.

>> +   unsigned int page_shift)
>> +{
>> +unsigned long addr = start;
>> +unsigned int i, nr = (end - start) >> (PAGE_SHIFT + page_shift);
> 
> s/nr/nr_huge_pages ?

Sure.

> Also should not we check for the alignment of the range [start...end] with
> respect to (1UL << [PAGE_SHIFT + page_shift]).

The caller should, if it specifies a large page. We could check and return
-EINVAL for incorrect alignment.
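
Something along these lines at the top of vmap_hpages_range() would do (a
hypothetical sketch, not from the posted patch; IS_ALIGNED is the existing
kernel helper):

	if (!IS_ALIGNED(start, PAGE_SIZE << page_shift) ||
	    !IS_ALIGNED(end, PAGE_SIZE << page_shift))
		return -EINVAL;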

>> +
>> +for (i = 0; i < nr; i++) {
>> +int err;
>> +
>> +err = vmap_range_noflush(addr,
>> +addr + (PAGE_SIZE << page_shift),
>> +__p

Re: [PATCH v2] mm: hwpoison: disable memory error handling on 1GB hugepage

2019-06-10 Thread Naoya Horiguchi
On Wed, May 29, 2019 at 04:31:01PM -0700, Mike Kravetz wrote:
> On 5/28/19 2:49 AM, Wanpeng Li wrote:
> > Cc Paolo,
> > Hi all,
> > On Wed, 14 Feb 2018 at 06:34, Mike Kravetz  wrote:
> >>
> >> On 02/12/2018 06:48 PM, Michael Ellerman wrote:
> >>> Andrew Morton  writes:
> >>>
>  On Thu, 08 Feb 2018 12:30:45 + Punit Agrawal  
>  wrote:
> 
> >>
> >> So I don't think that the above test result means that errors are 
> >> properly
> >> handled, and the proposed patch should help for arm64.
> >
> > Although, the deviation of pud_huge() avoids a kernel crash the code
> > would be easier to maintain and reason about if arm64 helpers are
> > consistent with expectations by core code.
> >
> > I'll look to update the arm64 helpers once this patch gets merged. But
> > it would be helpful if there was a clear expression of semantics for
> > pud_huge() for various cases. Is there any version that can be used as
> > reference?
> 
>  Is that an ack or tested-by?
> 
>  Mike keeps plaintively asking the powerpc developers to take a look,
>  but they remain steadfastly in hiding.
> >>>
> >>> Cc'ing linuxppc-dev is always a good idea :)
> >>>
> >>
> >> Thanks Michael,
> >>
> >> I was mostly concerned about use cases for soft/hard offline of huge pages
> >> larger than PMD_SIZE on powerpc.  I know that powerpc supports PGD_SIZE
> >> huge pages, and soft/hard offline support was specifically added for this.
> >> See, 94310cbcaa3c "mm/madvise: enable (soft|hard) offline of HugeTLB pages
> >> at PGD level"
> >>
> >> This patch will disable that functionality.  So, at a minimum this is a
> >> 'heads up'.  If there are actual use cases that depend on this, then more
> >> work/discussions will need to happen.  From the e-mail thread on PGD_SIZE
> >> support, I can not tell if there is a real use case or this is just a
> >> 'nice to have'.
> > 
> > 1GB hugetlbfs pages are used by DPDK and VMs in cloud deployment, we
> > have encountered gup_pud_range() panics several times in production environments.
> > Is there any plan to reenable and fix arch codes?
> 
> I too am aware of slightly more interest in 1G huge pages.  Suspect that as
> Intel MMU capacity increases to handle more TLB entries there will be more
> and more interest.
> 
> Personally, I am not looking at this issue.  Perhaps Naoya will comment as
> he know most about this code.

Thanks for forwarding this to me, I'm feeling that memory error handling
on 1GB hugepage is demanded as real use case.

> 
> > In addition, 
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/kvm/mmu.c#n3213
> > The memory in guest can be 1GB/2MB/4K, though the host-backed memory
> > are 1GB hugetlbfs pages, after above PUD panic is fixed,
> > try_to_unmap() which is called in MCA recovery path will mark the PUD
> > hwpoison entry. The guest will vmexit and retry endlessly when
> > accessing any memory in the guest which is backed by this 1GB poisoned
> > hugetlbfs page. We have a plan to split this 1GB hugetlbfs page by 2MB
> > hugetlbfs pages/4KB pages, maybe file remap to a virtual address range
> > which is 2MB/4KB page granularity, also split the KVM MMU 1GB SPTE
> > into 2MB/4KB and mark the offensive SPTE w/ a hwpoison flag, a sigbus
> > will be delivered to VM at page fault next time for the offensive
> > SPTE. Is this proposal acceptable?
> 
> I am not sure of the error handling design, but this does sound reasonable.

I agree that that's better.

> That block of code which potentially dissolves a huge page on memory error
> is hard to understand and I'm not sure if that is even the 'normal'
> functionality.  Certainly, we would hate to waste/poison an entire 1G page
> for an error on a small subsection.

Yes, that's not practical, so we need to first establish the code base for
2MB hugetlb splitting and then extend it to 1GB next.

Thanks,
Naoya Horiguchi


[Bug 203839] Kernel 5.2-rc3 fails to boot on a PowerMac G4 3,6: systemd[1]: Failed to bump fs.file-max, ignoring: invalid argument

2019-06-10 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=203839

--- Comment #6 from Erhard F. (erhar...@mailbox.org) ---
Created attachment 283183
  --> https://bugzilla.kernel.org/attachment.cgi?id=283183&action=edit
bisect.log

bisect took me a while due to quite some skips. Cherry-picking
397d2300b08cdee052053e362018cdb6dd65eea2 and
305d60012304684bd59ea1f67703e51662e4906a helped me complete it.

# git bisect good | tee -a /root/bisect02.log
215b823707ce4e8e52b106915f70357fa474c669 is the first bad commit
commit 215b823707ce4e8e52b106915f70357fa474c669
Author: Christophe Leroy 
Date:   Fri Apr 26 16:23:36 2019 +

powerpc/32s: set up an early static hash table for KASAN.

KASAN requires early activation of hash table, before memblock()
functions are available.

This patch implements an early hash_table statically defined in
__initdata.

During early boot, a single page table is used.

For hash32, when doing the final init, one page table is allocated
for each PGD entry because of the _PAGE_HASHPTE flag which can't be
common to several virt pages. This is done after memblock get
available but before switching to the final hash table, otherwise
there are issues with TLB flushing due to the shared entries.

Signed-off-by: Christophe Leroy 
Signed-off-by: Michael Ellerman 

:04 04 abc24eb3c4ad3e4f2b1eb7b52c295c8b95d79a78 c3b6114c26eb8e181abb3f1abc9b6ecc12292f4d M  arch

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

[Bug 203839] Kernel 5.2-rc3 fails to boot on a PowerMac G4 3,6: systemd[1]: Failed to bump fs.file-max, ignoring: invalid argument

2019-06-10 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=203839

Erhard F. (erhar...@mailbox.org) changed:

   What|Removed |Added

 Attachment #283139|0   |1
is obsolete||

--- Comment #7 from Erhard F. (erhar...@mailbox.org) ---
Created attachment 283185
  --> https://bugzilla.kernel.org/attachment.cgi?id=283185&action=edit
kernel .config (5.1.0-rc3+, G4 MDD)

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

Re: crash after NX error

2019-06-10 Thread Haren Myneni
On 06/05/2019 04:06 AM, Michael Ellerman wrote:
> Stewart Smith  writes:
>> On my two socket POWER9 system (powernv) with 842 zwap set up, I
>> recently got a crash with the Ubuntu kernel (I haven't tried with
>> upstream, and this is the first time the system has died like this, so
>> I'm not sure how repeatable it is).
>>
>> [2.891463] zswap: loaded using pool 842-nx/zbud
>> ...
>> [15626.124646] nx_compress_powernv: ERROR: CSB still not valid after 500 
>> us, giving up : 00 00 00 00 
>> [16868.932913] Unable to handle kernel paging request for data at address 
>> 0x6655f67da816cdb8
>> [16868.933726] Faulting instruction address: 0xc0391600
>>
>>
>> cpu 0x68: Vector: 380 (Data Access Out of Range) at [c01c9d98b9a0]
>> pc: c0391600: kmem_cache_alloc+0x2e0/0x340
>> lr: c03915ec: kmem_cache_alloc+0x2cc/0x340
>> sp: c01c9d98bc20
>>msr: 9280b033
>>dar: 6655f67da816cdb8
>>   current = 0xc01ad43cb400
>>   paca= 0xcfac7800   softe: 0irq_happened: 0x01
>> pid   = 8319, comm = make
>> Linux version 4.15.0-50-generic (buildd@bos02-ppc64el-006) (gcc version 
>> 7.3.0 (Ubuntu 7.3.0-16ubuntu3)) #54-Ubuntu SMP Mon May 6 18:55:18 UTC 2019 
>> (Ubuntu 4.15.0-50.54-generic 4.15.18)
>>
>> 68:mon> t
>> [c01c9d98bc20] c03914d4 kmem_cache_alloc+0x1b4/0x340 (unreliable)
>> [c01c9d98bc80] c03b1e14 __khugepaged_enter+0x54/0x220
>> [c01c9d98bcc0] c010f0ec copy_process.isra.5.part.6+0xebc/0x1a10
>> [c01c9d98bda0] c010fe4c _do_fork+0xec/0x510
>> [c01c9d98be30] c000b584 ppc_clone+0x8/0xc
>> --- Exception: c00 (System Call) at 7afe9daf87f4
>> SP (7fffca606880) is in userspace
>>
>> So, it looks like there could be a problem in the error path, plausibly
>> fixed by this patch:
>>
>> commit 656ecc16e8fc2ab44b3d70e3fcc197a7020d0ca5
>> Author: Haren Myneni 
>> Date:   Wed Jun 13 00:32:40 2018 -0700
>>
>> crypto/nx: Initialize 842 high and normal RxFIFO control registers
>> 
>> NX increments readOffset by FIFO size in receive FIFO control register
>> when CRB is read. But the index in RxFIFO has to match with the
>> corresponding entry in FIFO maintained by VAS in kernel. Otherwise NX
>> may be processing incorrect CRBs and can cause CRB timeout.
>> 
>> VAS FIFO offset is 0 when the receive window is opened during
>> initialization. When the module is reloaded or in kexec boot, readOffset
>> in FIFO control register may not match with VAS entry. This patch adds
>> nx_coproc_init OPAL call to reset readOffset and queued entries in FIFO
>> control register for both high and normal FIFOs.
>> 
>> Signed-off-by: Haren Myneni 
>> [mpe: Fixup uninitialized variable warning]
>> Signed-off-by: Michael Ellerman 
>>
>> $ git describe --contains 656ecc16e8fc2ab44b3d70e3fcc197a7020d0ca5
>> v4.19-rc1~24^2~50
>>
>>
>> Which was never backported to any stable release, so probably needs to
>> be for v4.14 through v4.18.
> 
> Yeah the P9 NX support went in in:
>   b0d6c9bab5e4 ("crypto/nx: Add P9 NX support for 842 compression engine")
> 
> Which was: v4.14-rc1~119^2~21, so first released in v4.14.
> 
> 
> I'm actually less interested in that and more interested in the
> subsequent crash. The time stamps are miles apart though, did we just
> leave some corrupted memory after the NX failed and then hit it later?
> Or did we not correctly signal to the upper level APIs that the request
> failed.
> 
> I think we need to do some testing with errors injected into the
> wait_for_csb() path, to ensure that failures there are not causing
> corrupting in zswap. Haren have you done any testing of error injection?

The code path returns the error code from wait_for_csb() properly to the
upper level APIs. In the decompression case, upon failure the request will
fall back to SW 842.

If NX is involved in this crash, the compression request may have appeared
successful with an invalid CRB (mismatched FIFO entries in NX and VAS). Then
SW 842 may have decompressed invalid data, which might cause corruption later
when it is accessed.

I will try to reproduce the issue with the 4.14 kernel.

Thanks
Haren
  
> 
> cheers
> 



Re: [RFC PATCH] powerpc/book3e: KASAN Full support for 64bit

2019-06-10 Thread Daniel Axtens
Christophe Leroy  writes:

> On 06/03/2019 11:50 PM, Daniel Axtens wrote:
>> Christophe Leroy  writes:
>> 
>>> Hi,
>>>
>>> Ok, can you share your .config ?
>> 
>> Sure! This one is with kasan off as the last build I did was testing to
>> see if the code reorgisation was the cause of the issues. (it was not)
>> 
>> 
>> 
>> 
>> This was the kasan-enabled config that failed to boot:
>> 
>> 
>
> Same issue with your .config under QEMU:
>
> A go with gdb shows:
>
> Breakpoint 3, 0xc0027b6c in exc_0x700_common ()
> => 0xc0027b6c :   f8 01 00 70 std 
> r0,112(r1)
> (gdb) bt
> #0  0xc0027b6c in exc_0x700_common ()
> #1  0xc136f80c in .udbg_init_memcons ()
>

Thanks for debugging this!

> Without CONFIG_PPC_EARLY_DEBUG, it boots fine for me. Can you check on 
> your side ?

Yes, that works on my side.

> Deactivating KASAN for arch/powerpc/kernel/udbg.o and 
> arch/powerpc/sysdev/udbg_memcons.o is not enough, we hit a call to 
> strstr() in register_early_udbg_console(), and once we get rid of it (in 
> the same way as in prom_init.c) the next issue is register_console() and 
> I don't know what to do about that one.

Disabling early debug seems like a reasonable restriction to add.

I'll have a look at modules across this and book3s next.

Regards,
Daniel

>
> Christophe
>
>> 
>> 
>> Regards,
>> Daniel
>> 
>>>
>>> Christophe
>>>
>>> Le 31/05/2019 à 03:29, Daniel Axtens a écrit :
 Hi Christophe,

 I tried this on the t4240rdb and it fails to boot if KASAN is
 enabled. It does boot with the patch applied but KASAN disabled, so that
 narrows it down a little bit.

 I need to focus on 3s first so I'll just drop 3e from my patch set for
 now.

 Regards,
 Daniel

> The KASAN shadow area is mapped into vmemmap space:
> 0x8000 0400   to 0x8000 0600  .
> For this vmemmap has to be disabled.
>
> Cc: Daniel Axtens 
> Signed-off-by: Christophe Leroy 
> ---
>arch/powerpc/Kconfig  |   1 +
>arch/powerpc/Kconfig.debug|   3 +-
>arch/powerpc/include/asm/kasan.h  |  11 +++
>arch/powerpc/kernel/Makefile  |   2 +
>arch/powerpc/kernel/head_64.S |   3 +
>arch/powerpc/kernel/setup_64.c|  20 +++---
>arch/powerpc/mm/kasan/Makefile|   1 +
>arch/powerpc/mm/kasan/kasan_init_64.c | 129 
> ++
>8 files changed, 159 insertions(+), 11 deletions(-)
>create mode 100644 arch/powerpc/mm/kasan/kasan_init_64.c
>
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index 1a2fb50126b2..e0b7c45e4dc7 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -174,6 +174,7 @@ config PPC
>   select HAVE_ARCH_AUDITSYSCALL
>   select HAVE_ARCH_JUMP_LABEL
>   select HAVE_ARCH_KASAN  if PPC32
> + select HAVE_ARCH_KASAN  if PPC_BOOK3E_64 && 
> !SPARSEMEM_VMEMMAP
>   select HAVE_ARCH_KGDB
>   select HAVE_ARCH_MMAP_RND_BITS
>   select HAVE_ARCH_MMAP_RND_COMPAT_BITS   if COMPAT
> diff --git a/arch/powerpc/Kconfig.debug b/arch/powerpc/Kconfig.debug
> index 61febbbdd02b..b4140dd6b4e4 100644
> --- a/arch/powerpc/Kconfig.debug
> +++ b/arch/powerpc/Kconfig.debug
> @@ -370,4 +370,5 @@ config PPC_FAST_ENDIAN_SWITCH
>config KASAN_SHADOW_OFFSET
>   hex
>   depends on KASAN
> - default 0xe000
> + default 0xe000 if PPC32
> + default 0x68000400 if PPC64
> diff --git a/arch/powerpc/include/asm/kasan.h 
> b/arch/powerpc/include/asm/kasan.h
> index 296e51c2f066..756b3d58f921 100644
> --- a/arch/powerpc/include/asm/kasan.h
> +++ b/arch/powerpc/include/asm/kasan.h
> @@ -23,10 +23,21 @@
>
>#define KASAN_SHADOW_OFFSETASM_CONST(CONFIG_KASAN_SHADOW_OFFSET)
>
> +#ifdef CONFIG_PPC32
>#define KASAN_SHADOW_END   0UL
>
>#define KASAN_SHADOW_SIZE  (KASAN_SHADOW_END - KASAN_SHADOW_START)
>
> +#else
> +
> +#include 
> +
> +#define KASAN_SHADOW_SIZE(KERN_VIRT_SIZE >> 
> KASAN_SHADOW_SCALE_SHIFT)
> +
> +#define KASAN_SHADOW_END (KASAN_SHADOW_START + KASAN_SHADOW_SIZE)
> +
> +#endif /* CONFIG_PPC32 */
> +
>#ifdef CONFIG_KASAN
>void kasan_early_init(void);
>void kasan_mmu_init(void);
> diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
> index 0ea6c4aa3a20..7f232c06f11d 100644
> --- a/arch/powerpc/kernel/Makefile
> +++ b/arch/powerpc/kernel/Makefile
> @@ -35,6 +35,8 @@ KASAN_SANITIZE_early_32.o := n
>KASAN_SANITIZE_cputable.o := n
>KASAN_SANITIZE_prom_init.o := n
>KASAN_SANITIZE_btext.o := n
> +KASAN_SANITIZE_paca.o := n

Re: [PATCH v2] powerpc: Add force enable of DAWR on P9 option

2019-06-10 Thread Cédric Le Goater
Hello Michael,


> --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> @@ -822,18 +822,21 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S)
>   mtspr   SPRN_IAMR, r5
>   mtspr   SPRN_PSPB, r6
>   mtspr   SPRN_FSCR, r7
> - ld  r5, VCPU_DAWR(r4)
> - ld  r6, VCPU_DAWRX(r4)
> - ld  r7, VCPU_CIABR(r4)
> - ld  r8, VCPU_TAR(r4)
>   /*
>* Handle broken DAWR case by not writing it. This means we
>* can still store the DAWR register for migration.
>*/
> -BEGIN_FTR_SECTION
> + LOAD_REG_ADDR(r5, dawr_force_enable)
> + lbz r5, 0(r5)
> + cmpdi   r5, 0
> + beq 1f
> + ld  r5, VCPU_DAWR(r4)
> + ld  r6, VCPU_DAWRX(r4)
>   mtspr   SPRN_DAWR, r5
>   mtspr   SPRN_DAWRX, r6
> -END_FTR_SECTION_IFSET(CPU_FTR_DAWR)
> +1:
> + ld  r7, VCPU_CIABR(r4)
> + ld  r8, VCPU_TAR(r4)
>   mtspr   SPRN_CIABR, r7
>   mtspr   SPRN_TAR, r8
>   ld  r5, VCPU_IC(r4)
> @@ -2513,11 +2516,11 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
>   blr
>  
>  2:
> -BEGIN_FTR_SECTION
> - /* POWER9 with disabled DAWR */
> + LOAD_REG_ADDR(r11, dawr_force_enable)
> + lbz r11, 0(r11)
> + cmpdi   r11, 0
>   li  r3, H_HARDWARE
> - blr
> -END_FTR_SECTION_IFCLR(CPU_FTR_DAWR)
> + beqlr

Why is this a 'beqlr'? Shouldn't it be a blr?

C.

>   /* Emulate H_SET_DABR/X on P8 for the sake of compat mode guests */
>   rlwimi  r5, r4, 5, DAWRX_DR | DAWRX_DW
>   rlwimi  r5, r4, 2, DAWRX_WT
> 



[PATCH] crypto: vmx - Document CTR mode counter width quirks

2019-06-10 Thread Daniel Axtens
The CTR code comes from OpenSSL, where it does a 32-bit counter.
The kernel has a 128-bit counter. This difference has led to
issues.

Document it.
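
To illustrate the difference (an editorial sketch, not part of the patch):
with a 32-bit counter the carry stops at the 32-bit boundary, while a 128-bit
counter carries across the whole 16-byte IV. Both helpers below treat the IV
as big-endian bytes:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Bump only the low 32 bits of the big-endian IV (OpenSSL-style ctr32). */
static void inc32(uint8_t iv[16])
{
	for (int i = 15; i >= 12; i--)
		if (++iv[i])
			break;
}

/* Bump the IV as one big-endian 128-bit integer (what the kernel expects). */
static void inc128(uint8_t iv[16])
{
	for (int i = 15; i >= 0; i--)
		if (++iv[i])
			break;
}

int main(void)
{
	uint8_t a[16], b[16];

	memset(a, 0xff, 16);	/* all-ones IV: the case where the two differ */
	memset(b, 0xff, 16);
	inc32(a);		/* low word wraps to 0, high 96 bits unchanged */
	inc128(b);		/* the whole IV wraps to 0 */
	printf("ctr32:  %02x..%02x %02x\n", a[0], a[11], a[15]);
	printf("ctr128: %02x..%02x %02x\n", b[0], b[11], b[15]);
	return 0;
}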

Signed-off-by: Daniel Axtens 
---
 drivers/crypto/vmx/aesp8-ppc.pl | 22 --
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/drivers/crypto/vmx/aesp8-ppc.pl b/drivers/crypto/vmx/aesp8-ppc.pl
index 9c6b5c1d6a1a..db874367b602 100644
--- a/drivers/crypto/vmx/aesp8-ppc.pl
+++ b/drivers/crypto/vmx/aesp8-ppc.pl
@@ -1286,6 +1286,24 @@ ___
 
 #
 {{{# CTR procedure[s]  #
+
+### WARNING: Here be dragons! ###
+#
+# This code is written as 'ctr32', based on a 32-bit counter used
+# upstream. The kernel does *not* use a 32-bit counter. The kernel uses
+# a 128-bit counter.
+#
+# This leads to subtle changes from the upstream code: the counter
+# is incremented with vaddu_q_m rather than vaddu_w_m. This occurs in
+# both the bulk (8 blocks at a time) path, and in the individual block
+# path. Be aware of this when doing updates.
+#
+# See:
+# 1d4aa0b4c181 ("crypto: vmx - Fixing AES-CTR counter bug")
+# 009b30ac7444 ("crypto: vmx - CTR: always increment IV as quadword")
+# https://github.com/openssl/openssl/pull/8942
+#
+#
 my ($inp,$out,$len,$key,$ivp,$x10,$rounds,$idx)=map("r$_",(3..10));
 my ($rndkey0,$rndkey1,$inout,$tmp)=map("v$_",(0..3));
 my ($ivec,$inptail,$inpperm,$outhead,$outperm,$outmask,$keyperm,$one)=
@@ -1357,7 +1375,7 @@ Loop_ctr32_enc:
addi$idx,$idx,16
bdnzLoop_ctr32_enc
 
-   vadduqm $ivec,$ivec,$one
+   vadduqm $ivec,$ivec,$one# Kernel change for 128-bit
 vmr$dat,$inptail
 lvx$inptail,0,$inp
 addi   $inp,$inp,16
@@ -1501,7 +1519,7 @@ Load_ctr32_enc_key:
$SHL$len,$len,4
 
vadduqm $out1,$ivec,$one# counter values ...
-   vadduqm $out2,$ivec,$two
+   vadduqm $out2,$ivec,$two# (do all ctr adds as 128-bit)
vxor$out0,$ivec,$rndkey0# ... xored with rndkey[0]
 le?li  $idx,8
vadduqm $out3,$out1,$two
-- 
2.20.1



Re: [PATCH RESEND 1/2] tools/perf: Add arch neutral function to choose event for perf kvm record

2019-06-10 Thread Ravi Bangoria



On 6/10/19 8:46 PM, Arnaldo Carvalho de Melo wrote:
> On Mon, Jun 10, 2019 at 12:15:17PM +0530, Anju T Sudhakar wrote:
>> 'perf kvm record' uses 'cycles' (if the user did not specify any event) as
>> the default event to profile the guest.
>> This will not provide any proper samples from the guest in the case of the
>> powerpc architecture, since in powerpc the PMUs are controlled by
>> the guest rather than the host.
>>
>> Patch adds a function to pick an arch specific event for 'perf kvm record',
>> instead of selecting 'cycles' as a default event for all architectures.
>>
>> For powerpc this function checks for any user specified event, and if there
>> isn't any it returns invalid instead of proceeding with 'cycles' event.
> 
> Michael, Ravi, Maddy, could you please provide an Acked-by, Reviewed-by
> or Tested-by?

Code looks fine to me but cross-build fails for aarch64:

  builtin-kvm.c:1513:12: error: no previous prototype for 'kvm_add_default_arch_event' [-Werror=missing-prototypes]
   int __weak kvm_add_default_arch_event(int *argc __maybe_unused,
  ^~
  cc1: all warnings being treated as errors
  mv: cannot stat './.builtin-kvm.o.tmp': No such file or directory

With the build fix:
Acked-by: Ravi Bangoria 



[PATCH kernel] powerpc/powernv/ioda: Fix race in TCE level allocation

2019-06-10 Thread Alexey Kardashevskiy
pnv_tce() returns a pointer to a TCE entry and originally a TCE table
would be pre-allocated. For the default case of 2GB window the table
needs only a single level and that is fine. However if more levels are
requested, it is possible to get a race when 2 threads want a pointer
to a TCE entry from the same page of TCEs.

This adds a spinlock to handle the race. The alloc==true case is not
possible in real mode, so the spinlock is safe for KVM as well.
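
The shape of the fix is the classic double-checked allocation. A minimal
userspace sketch of the same pattern (not the kernel code; a pthread mutex
standing in for the kernel spinlock):

#include <stdlib.h>
#include <pthread.h>

/*
 * Sketch: test without the lock, take the lock, re-test, then allocate.
 * The unlocked read is benign for the same reason as in the kernel: a
 * slot only ever goes from NULL to a valid pointer, never back.
 */
static void *slot;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *get_slot(int alloc)
{
	if (!slot) {
		if (!alloc)
			return NULL;
		pthread_mutex_lock(&lock);
		if (!slot)			/* re-check under the lock */
			slot = calloc(1, 4096);
		pthread_mutex_unlock(&lock);
	}
	return slot;	/* may still be NULL if the allocation failed */
}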

CC: sta...@vger.kernel.org # v4.19+
Fixes: a68bd1267b72 ("powerpc/powernv/ioda: Allocate indirect TCE levels on demand")
Signed-off-by: Alexey Kardashevskiy 
---

This fixes EEH's from
https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=110810


---
 arch/powerpc/include/asm/iommu.h  |  1 +
 arch/powerpc/platforms/powernv/pci-ioda-tce.c | 21 ---
 2 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
index 2c1845e5e851..1825b4cc0097 100644
--- a/arch/powerpc/include/asm/iommu.h
+++ b/arch/powerpc/include/asm/iommu.h
@@ -111,6 +111,7 @@ struct iommu_table {
struct iommu_table_ops *it_ops;
struct krefit_kref;
int it_nid;
+   spinlock_t it_lock;
 };
 
 #define IOMMU_TABLE_USERSPACE_ENTRY_RO(tbl, entry) \
diff --git a/arch/powerpc/platforms/powernv/pci-ioda-tce.c b/arch/powerpc/platforms/powernv/pci-ioda-tce.c
index e28f03e1eb5e..9a19d61e2b12 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda-tce.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda-tce.c
@@ -29,6 +29,7 @@ void pnv_pci_setup_iommu_table(struct iommu_table *tbl,
tbl->it_size = tce_size >> 3;
tbl->it_busno = 0;
tbl->it_type = TCE_PCI;
+   spin_lock_init(&tbl->it_lock);
 }
 
 static __be64 *pnv_alloc_tce_level(int nid, unsigned int shift)
@@ -60,18 +61,22 @@ static __be64 *pnv_tce(struct iommu_table *tbl, bool user, long idx, bool alloc)
unsigned long tce;
 
if (tmp[n] == 0) {
-   __be64 *tmp2;
-
if (!alloc)
return NULL;
 
-   tmp2 = pnv_alloc_tce_level(tbl->it_nid,
-   ilog2(tbl->it_level_size) + 3);
-   if (!tmp2)
-   return NULL;
+   spin_lock(&tbl->it_lock);
+   if (tmp[n] == 0) {
+   __be64 *tmp2;
 
-   tmp[n] = cpu_to_be64(__pa(tmp2) |
-   TCE_PCI_READ | TCE_PCI_WRITE);
+   tmp2 = pnv_alloc_tce_level(tbl->it_nid,
+   ilog2(tbl->it_level_size) + 3);
+   if (tmp2)
+   tmp[n] = cpu_to_be64(__pa(tmp2) |
+   TCE_PCI_READ | TCE_PCI_WRITE);
+   }
+   spin_unlock(&tbl->it_lock);
+   if (tmp[n] == 0)
+   return NULL;
}
tce = be64_to_cpu(tmp[n]);
 
-- 
2.17.1



[PATCH RESEND] Powerpc/Watchpoint: Restore nvgprs while returning from exception

2019-06-10 Thread Ravi Bangoria
Powerpc hardware triggers a watchpoint before executing the instruction. To
get trigger-after-execute behavior, the kernel emulates the instruction.
If the instruction is a 'load something into a non-volatile register', the
exception handler should restore the emulated register state while
returning, otherwise there will be register state corruption.
For example, adding a watchpoint on a list can corrupt the list:

  # cat /proc/kallsyms | grep kthread_create_list
  c121c8b8 d kthread_create_list

Add watchpoint on kthread_create_list->prev:

  # perf record -e mem:0xc121c8c0

Run some workload such that a new kthread gets invoked. For example, I just
logged out from the console:

  list_add corruption. next->prev should be prev (c1214e00), \
but was c121c8b8. (next=c121c8b8).
  WARNING: CPU: 59 PID: 309 at lib/list_debug.c:25 __list_add_valid+0xb4/0xc0
  CPU: 59 PID: 309 Comm: kworker/59:0 Kdump: loaded Not tainted 5.1.0-rc7+ #69
  ...
  NIP __list_add_valid+0xb4/0xc0
  LR __list_add_valid+0xb0/0xc0
  Call Trace:
  __list_add_valid+0xb0/0xc0 (unreliable)
  __kthread_create_on_node+0xe0/0x260
  kthread_create_on_node+0x34/0x50
  create_worker+0xe8/0x260
  worker_thread+0x444/0x560
  kthread+0x160/0x1a0
  ret_from_kernel_thread+0x5c/0x70

List corruption happened because the code uses a 'load into non-volatile
register' instruction:

Snippet from __kthread_create_on_node:

  c0136be8: addis   r29,r2,-19
  c0136bec: ld  r29,31424(r29)
if (!__list_add_valid(new, prev, next))
  c0136bf0: mr  r3,r30
  c0136bf4: mr  r5,r28
  c0136bf8: mr  r4,r29
  c0136bfc: bl  c059a2f8 <__list_add_valid+0x8>

Register state from WARN_ON():

  GPR00: c059a3a0 c07ff23afb50 c1344e00 0075
  GPR04:   001852af8bc1 
  GPR08: 0001 0007 0006 04aa
  GPR12:  c07eb080 c0137038 c05ff62aaa00
  GPR16:   c07fffbe7600 c07fffbe7370
  GPR20: c07fffbe7320 c07fffbe7300 c1373a00 
  GPR24: fef7 c012e320 c07ff23afcb0 c0cb8628
  GPR28: c121c8b8 c1214e00 c07fef5b17e8 c07fef5b17c0

Watchpoint hit at 0xc0136bec.

  addis   r29,r2,-19
   => r29 = 0xc1344e00 + (-19 << 16)
   => r29 = 0xc1214e00

  ld  r29,31424(r29)
   => r29 = *(0xc1214e00 + 31424)
   => r29 = *(0xc121c8c0)

0xc121c8c0 is where we placed a watchpoint and thus this
instruction was emulated by emulate_step. But because handle_dabr_fault
did not restore emulated register state, r29 still contains stale
value in above register state.

Fixes: 5aae8a5370802 ("powerpc, hw_breakpoints: Implement hw_breakpoints for 64-bit server processors")
Signed-off-by: Ravi Bangoria 
Cc: sta...@vger.kernel.org # 2.6.36+
Reviewed-by: Naveen N. Rao 
---
 arch/powerpc/kernel/exceptions-64s.S | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index 6b86055e5251..0e649d980ec3 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -1761,7 +1761,7 @@ handle_dabr_fault:
ld  r5,_DSISR(r1)
addir3,r1,STACK_FRAME_OVERHEAD
bl  do_break
-12:b   ret_from_except_lite
+12:b   ret_from_except
 
 
 #ifdef CONFIG_PPC_BOOK3S_64
-- 
2.20.1



Re: [PATCH kernel] powerpc/powernv/ioda: Fix race in TCE level allocation

2019-06-10 Thread Alexey Kardashevskiy
Please ignore this (causes lockdep warnings), v2 is coming.


On 11/06/2019 12:31, Alexey Kardashevskiy wrote:
> pnv_tce() returns a pointer to a TCE entry and originally a TCE table
> would be pre-allocated. For the default case of 2GB window the table
> needs only a single level and that is fine. However if more levels are
> requested, it is possible to get a race when 2 threads want a pointer
> to a TCE entry from the same page of TCEs.
> 
> This adds a spinlock to handle the race. The alloc==true case is not
> possible in the real mode so spinlock is safe for KVM as well.
> 
> CC: sta...@vger.kernel.org # v4.19+
> Fixes: a68bd1267b72 ("powerpc/powernv/ioda: Allocate indirect TCE levels on 
> demand")
> Signed-off-by: Alexey Kardashevskiy 
> ---
> 
> This fixes EEH's from
> https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=110810
> 
> 
> ---
>  arch/powerpc/include/asm/iommu.h  |  1 +
>  arch/powerpc/platforms/powernv/pci-ioda-tce.c | 21 ---
>  2 files changed, 14 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/iommu.h 
> b/arch/powerpc/include/asm/iommu.h
> index 2c1845e5e851..1825b4cc0097 100644
> --- a/arch/powerpc/include/asm/iommu.h
> +++ b/arch/powerpc/include/asm/iommu.h
> @@ -111,6 +111,7 @@ struct iommu_table {
>   struct iommu_table_ops *it_ops;
>   struct krefit_kref;
>   int it_nid;
> + spinlock_t it_lock;
>  };
>  
>  #define IOMMU_TABLE_USERSPACE_ENTRY_RO(tbl, entry) \
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda-tce.c 
> b/arch/powerpc/platforms/powernv/pci-ioda-tce.c
> index e28f03e1eb5e..9a19d61e2b12 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda-tce.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda-tce.c
> @@ -29,6 +29,7 @@ void pnv_pci_setup_iommu_table(struct iommu_table *tbl,
>   tbl->it_size = tce_size >> 3;
>   tbl->it_busno = 0;
>   tbl->it_type = TCE_PCI;
> + spin_lock_init(&tbl->it_lock);
>  }
>  
>  static __be64 *pnv_alloc_tce_level(int nid, unsigned int shift)
> @@ -60,18 +61,22 @@ static __be64 *pnv_tce(struct iommu_table *tbl, bool 
> user, long idx, bool alloc)
>   unsigned long tce;
>  
>   if (tmp[n] == 0) {
> - __be64 *tmp2;
> -
>   if (!alloc)
>   return NULL;
>  
> - tmp2 = pnv_alloc_tce_level(tbl->it_nid,
> - ilog2(tbl->it_level_size) + 3);
> - if (!tmp2)
> - return NULL;
> + spin_lock(&tbl->it_lock);
> + if (tmp[n] == 0) {
> + __be64 *tmp2;
>  
> - tmp[n] = cpu_to_be64(__pa(tmp2) |
> - TCE_PCI_READ | TCE_PCI_WRITE);
> + tmp2 = pnv_alloc_tce_level(tbl->it_nid,
> + ilog2(tbl->it_level_size) + 3);
> + if (tmp2)
> + tmp[n] = cpu_to_be64(__pa(tmp2) |
> + TCE_PCI_READ | TCE_PCI_WRITE);
> + }
> + spin_unlock(&tbl->it_lock);
> + if (tmp[n] == 0)
> + return NULL;
>   }
>   tce = be64_to_cpu(tmp[n]);
>  
> 

-- 
Alexey


Re: [RFC V3] mm: Generalize and rename notify_page_fault() as kprobe_page_fault()

2019-06-10 Thread Christophe Leroy




On 10/06/2019 at 04:39, Anshuman Khandual wrote:



On 06/07/2019 09:01 PM, Christophe Leroy wrote:



On 07/06/2019 at 12:34, Anshuman Khandual wrote:

Very similar definitions for notify_page_fault() are being used by multiple
architectures duplicating much of the same code. This attempts to unify all
of them into a generic implementation, rename it as kprobe_page_fault() and
then move it to a common header.

kprobes_built_in() can detect CONFIG_KPROBES, hence new kprobe_page_fault()
need not be wrapped again within CONFIG_KPROBES. The trap number argument can
now be up to an 'unsigned int', accommodating all possible platforms.

kprobe_page_fault() goes the x86 way while dealing with preemption context.
As explained in the following commits, the invoking context itself must
be non-preemptible for kprobes processing context irrespective of whether
kprobe_running() or perhaps smp_processor_id() is safe or not. It does not
make much sense to continue when original context is preemptible. Instead
just bail out earlier.

commit a980c0ef9f6d
("x86/kprobes: Refactor kprobes_fault() like kprobe_exceptions_notify()")

commit b506a9d08bae ("x86: code clarification patch to Kprobes arch code")

Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-i...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-s...@vger.kernel.org
Cc: linux...@vger.kernel.org
Cc: sparcli...@vger.kernel.org
Cc: x...@kernel.org
Cc: Andrew Morton 
Cc: Michal Hocko 
Cc: Matthew Wilcox 
Cc: Mark Rutland 
Cc: Christophe Leroy 
Cc: Stephen Rothwell 
Cc: Andrey Konovalov 
Cc: Michael Ellerman 
Cc: Paul Mackerras 
Cc: Russell King 
Cc: Catalin Marinas 
Cc: Will Deacon 
Cc: Tony Luck 
Cc: Fenghua Yu 
Cc: Martin Schwidefsky 
Cc: Heiko Carstens 
Cc: Yoshinori Sato 
Cc: "David S. Miller" 
Cc: Thomas Gleixner 
Cc: Peter Zijlstra 
Cc: Ingo Molnar 
Cc: Andy Lutomirski 
Cc: Dave Hansen 

Signed-off-by: Anshuman Khandual 
---
Testing:

- Build and boot tested on arm64 and x86
- Build tested on some other archs (arm, sparc64, alpha, powerpc etc)

Changes in RFC V3:

- Updated the commit message with an explanation for the new preemption behaviour
- Moved notify_page_fault() to kprobes.h with 'static nokprobe_inline' per 
Matthew
- Changed notify_page_fault() return type from int to bool per Michael Ellerman
- Renamed notify_page_fault() as kprobe_page_fault() per Peterz

Changes in RFC V2: (https://patchwork.kernel.org/patch/10974221/)

- Changed generic notify_page_fault() per Mathew Wilcox
- Changed x86 to use new generic notify_page_fault()
- s/must not/need not/ in commit message per Matthew Wilcox

Changes in RFC V1: (https://patchwork.kernel.org/patch/10968273/)

   arch/arm/mm/fault.c  | 24 +---
   arch/arm64/mm/fault.c    | 24 +---
   arch/ia64/mm/fault.c | 24 +---
   arch/powerpc/mm/fault.c  | 23 ++-
   arch/s390/mm/fault.c | 16 +---
   arch/sh/mm/fault.c   | 18 ++
   arch/sparc/mm/fault_64.c | 16 +---
   arch/x86/mm/fault.c  | 21 ++---
   include/linux/kprobes.h  | 16 
   9 files changed, 27 insertions(+), 155 deletions(-)



[...]


diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
index 443d980..064dd15 100644
--- a/include/linux/kprobes.h
+++ b/include/linux/kprobes.h
@@ -458,4 +458,20 @@ static inline bool is_kprobe_optinsn_slot(unsigned long 
addr)
   }
   #endif
   +static nokprobe_inline bool kprobe_page_fault(struct pt_regs *regs,
+  unsigned int trap)
+{
+    int ret = 0;


ret is pointless.


+
+    /*
+ * To be potentially processing a kprobe fault and to be allowed
+ * to call kprobe_running(), we have to be non-preemptible.
+ */
+    if (kprobes_built_in() && !preemptible() && !user_mode(regs)) {
+    if (kprobe_running() && kprobe_fault_handler(regs, trap))


don't need an 'if A if B', can do 'if A && B'


Which will make it a very lengthy condition check.


Yes. But is that a problem at all?

For me the following would be easier to read.

if (kprobes_built_in() && !preemptible() && !user_mode(regs) &&
kprobe_running() && kprobe_fault_handler(regs, trap))
ret = 1;

Christophe






+    ret = 1;


can do 'return true;' directly here


+    }
+    return ret;


And 'return false' here.


Makes sense, will drop ret.



[PATCH kernel v2] powerpc/powernv/ioda: Fix race in TCE level allocation

2019-06-10 Thread Alexey Kardashevskiy
pnv_tce() returns a pointer to a TCE entry and originally a TCE table
would be pre-allocated. For the default case of 2GB window the table
needs only a single level and that is fine. However if more levels are
requested, it is possible to get a race when 2 threads want a pointer
to a TCE entry from the same page of TCEs.

This adds cmpxchg to handle the race. Note that once a TCE is non-zero,
it cannot become zero again.
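
The lockless shape of v2 is allocate first, publish with compare-and-swap,
and free the loser's copy. A minimal userspace sketch of the same pattern
(not the kernel code; GCC builtins standing in for the kernel's READ_ONCE
and cmpxchg):

#include <stdlib.h>

static void *get_level(void **slot)
{
	void *old, *new;

	old = __atomic_load_n(slot, __ATOMIC_RELAXED);	/* like READ_ONCE */
	if (old)
		return old;

	new = calloc(1, 4096);
	if (!new)
		return NULL;

	old = __sync_val_compare_and_swap(slot, NULL, new);
	if (old) {		/* lost the race: someone else published first */
		free(new);
		return old;
	}
	return new;
}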

CC: sta...@vger.kernel.org # v4.19+
Fixes: a68bd1267b72 ("powerpc/powernv/ioda: Allocate indirect TCE levels on demand")
Signed-off-by: Alexey Kardashevskiy 
---

The race occurs about 30 times in the first 3 minutes of copying files
via rsync and that's about it.

This fixes EEH's from
https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=110810

---
Changes:
v2:
* replaced spin_lock with cmpxchg+readonce
---
 arch/powerpc/platforms/powernv/pci-ioda-tce.c | 18 +-
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda-tce.c b/arch/powerpc/platforms/powernv/pci-ioda-tce.c
index e28f03e1eb5e..8d6569590161 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda-tce.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda-tce.c
@@ -48,6 +48,9 @@ static __be64 *pnv_alloc_tce_level(int nid, unsigned int shift)
return addr;
 }
 
+static void pnv_pci_ioda2_table_do_free_pages(__be64 *addr,
+   unsigned long size, unsigned int levels);
+
 static __be64 *pnv_tce(struct iommu_table *tbl, bool user, long idx, bool alloc)
 {
__be64 *tmp = user ? tbl->it_userspace : (__be64 *) tbl->it_base;
@@ -57,9 +60,9 @@ static __be64 *pnv_tce(struct iommu_table *tbl, bool user, long idx, bool alloc)
 
while (level) {
int n = (idx & mask) >> (level * shift);
-   unsigned long tce;
+   unsigned long oldtce, tce = be64_to_cpu(READ_ONCE(tmp[n]));
 
-   if (tmp[n] == 0) {
+   if (!tce) {
__be64 *tmp2;
 
if (!alloc)
@@ -70,10 +73,15 @@ static __be64 *pnv_tce(struct iommu_table *tbl, bool user, long idx, bool alloc)
if (!tmp2)
return NULL;
 
-   tmp[n] = cpu_to_be64(__pa(tmp2) |
-   TCE_PCI_READ | TCE_PCI_WRITE);
+   tce = __pa(tmp2) | TCE_PCI_READ | TCE_PCI_WRITE;
+   oldtce = be64_to_cpu(cmpxchg(&tmp[n], 0,
+   cpu_to_be64(tce)));
+   if (oldtce) {
+   pnv_pci_ioda2_table_do_free_pages(tmp2,
+   ilog2(tbl->it_level_size) + 3, 1);
+   tce = oldtce;
+   }
}
-   tce = be64_to_cpu(tmp[n]);
 
tmp = __va(tce & ~(TCE_PCI_READ | TCE_PCI_WRITE));
idx &= ~mask;
-- 
2.17.1



Re: [RFC V3] mm: Generalize and rename notify_page_fault() as kprobe_page_fault()

2019-06-10 Thread Anshuman Khandual



On 06/10/2019 08:57 PM, Leonardo Bras wrote:
> On Mon, 2019-06-10 at 08:09 +0530, Anshuman Khandual wrote:
 +/*
 + * To be potentially processing a kprobe fault and to be allowed
 + * to call kprobe_running(), we have to be non-preemptible.
 + */
 +if (kprobes_built_in() && !preemptible() && !user_mode(regs)) {
 +if (kprobe_running() && kprobe_fault_handler(regs, trap))
>>>
>>> don't need an 'if A if B', can do 'if A && B'
>>
>> Which will make it a very lengthy condition check.
> 
> Well, is there any problem line-breaking the if condition?
> 
> if (A && B && C &&
> D && E )
> 
> Also, if it's used only to decide the return value, maybe it would be fine
> to do something like this:
> 
> return (A && B && C &&
> D && E ); 

Got it. But as Dave and Matthew had pointed out earlier, the current x86
implementation has better readability. Hence I will probably stick with it.


Re: [RFC V3] mm: Generalize and rename notify_page_fault() as kprobe_page_fault()

2019-06-10 Thread Anshuman Khandual



On 06/11/2019 10:16 AM, Christophe Leroy wrote:
> 
> 
> On 10/06/2019 at 04:39, Anshuman Khandual wrote:
>>
>>
>> On 06/07/2019 09:01 PM, Christophe Leroy wrote:
>>>
>>>
>>> On 07/06/2019 at 12:34, Anshuman Khandual wrote:
 Very similar definitions for notify_page_fault() are being used by multiple
 architectures duplicating much of the same code. This attempts to unify all
 of them into a generic implementation, rename it as kprobe_page_fault() and
 then move it to a common header.

 kprobes_built_in() can detect CONFIG_KPROBES, hence new kprobe_page_fault()
 need not be wrapped again within CONFIG_KPROBES. Trap number argument can
 now contain upto an 'unsigned int' accommodating all possible platforms.

 kprobe_page_fault() goes the x86 way while dealing with preemption context.
 As explained in these following commits the invoking context in itself must
 be non-preemptible for kprobes processing context irrespective of whether
 kprobe_running() or perhaps smp_processor_id() is safe or not. It does not
 make much sense to continue when original context is preemptible. Instead
 just bail out earlier.

 commit a980c0ef9f6d
 ("x86/kprobes: Refactor kprobes_fault() like kprobe_exceptions_notify()")

 commit b506a9d08bae ("x86: code clarification patch to Kprobes arch code")

 Cc: linux-arm-ker...@lists.infradead.org
 Cc: linux-i...@vger.kernel.org
 Cc: linuxppc-dev@lists.ozlabs.org
 Cc: linux-s...@vger.kernel.org
 Cc: linux...@vger.kernel.org
 Cc: sparcli...@vger.kernel.org
 Cc: x...@kernel.org
 Cc: Andrew Morton 
 Cc: Michal Hocko 
 Cc: Matthew Wilcox 
 Cc: Mark Rutland 
 Cc: Christophe Leroy 
 Cc: Stephen Rothwell 
 Cc: Andrey Konovalov 
 Cc: Michael Ellerman 
 Cc: Paul Mackerras 
 Cc: Russell King 
 Cc: Catalin Marinas 
 Cc: Will Deacon 
 Cc: Tony Luck 
 Cc: Fenghua Yu 
 Cc: Martin Schwidefsky 
 Cc: Heiko Carstens 
 Cc: Yoshinori Sato 
 Cc: "David S. Miller" 
 Cc: Thomas Gleixner 
 Cc: Peter Zijlstra 
 Cc: Ingo Molnar 
 Cc: Andy Lutomirski 
 Cc: Dave Hansen 

 Signed-off-by: Anshuman Khandual 
 ---
 Testing:

 - Build and boot tested on arm64 and x86
 - Build tested on some other archs (arm, sparc64, alpha, powerpc etc)

 Changes in RFC V3:

- Updated the commit message with an explanation for the new preemption behaviour
 - Moved notify_page_fault() to kprobes.h with 'static nokprobe_inline' per 
 Matthew
 - Changed notify_page_fault() return type from int to bool per Michael 
 Ellerman
 - Renamed notify_page_fault() as kprobe_page_fault() per Peterz

 Changes in RFC V2: (https://patchwork.kernel.org/patch/10974221/)

- Changed generic notify_page_fault() per Matthew Wilcox
 - Changed x86 to use new generic notify_page_fault()
 - s/must not/need not/ in commit message per Matthew Wilcox

 Changes in RFC V1: (https://patchwork.kernel.org/patch/10968273/)

    arch/arm/mm/fault.c  | 24 +---
    arch/arm64/mm/fault.c    | 24 +---
    arch/ia64/mm/fault.c | 24 +---
    arch/powerpc/mm/fault.c  | 23 ++-
    arch/s390/mm/fault.c | 16 +---
    arch/sh/mm/fault.c   | 18 ++
    arch/sparc/mm/fault_64.c | 16 +---
    arch/x86/mm/fault.c  | 21 ++---
    include/linux/kprobes.h  | 16 
    9 files changed, 27 insertions(+), 155 deletions(-)

>>>
>>> [...]
>>>
 diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
 index 443d980..064dd15 100644
 --- a/include/linux/kprobes.h
 +++ b/include/linux/kprobes.h
 @@ -458,4 +458,20 @@ static inline bool is_kprobe_optinsn_slot(unsigned 
 long addr)
    }
    #endif
    +static nokprobe_inline bool kprobe_page_fault(struct pt_regs *regs,
 +  unsigned int trap)
 +{
 +    int ret = 0;
>>>
>>> ret is pointless.
>>>
 +
 +    /*
 + * To be potentially processing a kprobe fault and to be allowed
 + * to call kprobe_running(), we have to be non-preemptible.
 + */
 +    if (kprobes_built_in() && !preemptible() && !user_mode(regs)) {
 +    if (kprobe_running() && kprobe_fault_handler(regs, trap))
>>>
>>> don't need an 'if A if B', can do 'if A && B'
>>
>> Which will make it a very lengthy condition check.
> 
> Yes. But is that a problem at all?

Probably not.

> 
> For me the following would be easier to read.
> 
> if (kprobes_built_in() && !preemptible() && !user_mode(regs) &&
>     kprobe_running() && kprobe_fault_handler(regs, trap))
> ret = 1;

As mentioned before, I will stick with the current x86 implementation.
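
For reference, the x86-style shape being defended looks roughly like this
(a sketch assembled from the quoted RFC, not a final patch):

static nokprobe_inline bool kprobe_page_fault(struct pt_regs *regs,
					      unsigned int trap)
{
	/*
	 * To be potentially processing a kprobe fault and to be allowed
	 * to call kprobe_running(), we have to be non-preemptible.
	 */
	if (!kprobes_built_in() || preemptible() || user_mode(regs))
		return false;

	return kprobe_running() && kprobe_fault_handler(regs, trap);
}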


Re: [PATCH v3 3/3] powerpc: Add support to initialize ima policy rules

2019-06-10 Thread Satheesh Rajendran
On Mon, Jun 10, 2019 at 04:33:57PM -0400, Nayna Jain wrote:
> PowerNV secure boot relies on the kernel IMA security subsystem to
> perform the OS kernel image signature verification. Since each secure
> boot mode has different IMA policy requirements, dynamic definition of
> the policy rules based on the runtime secure boot mode of the system is
> required. On systems that support secure boot, but have it disabled,
> only measurement policy rules of the kernel image and modules are
> defined.
> 
> This patch defines the arch-specific implementation to retrieve the
> secure boot mode of the system and accordingly configures the IMA policy
> rules.
> 
> This patch provides arch-specific IMA policies if PPC_SECURE_BOOT
> config is enabled.
> 
> Signed-off-by: Nayna Jain 
> ---
>  arch/powerpc/Kconfig   | 14 +
>  arch/powerpc/kernel/Makefile   |  1 +
>  arch/powerpc/kernel/ima_arch.c | 54 ++
>  include/linux/ima.h|  3 +-
>  4 files changed, 71 insertions(+), 1 deletion(-)
>  create mode 100644 arch/powerpc/kernel/ima_arch.c

Hi,

This series failed to build against the linuxppc/merge tree with
`ppc64le_defconfig`:

arch/powerpc/platforms/powernv/secboot.c:14:6: error: redefinition of 'get_powerpc_sb_mode'
   14 | bool get_powerpc_sb_mode(void)
      |      ^~~~~~~~~~~~~~~~~~~
In file included from arch/powerpc/platforms/powernv/secboot.c:11:
./arch/powerpc/include/asm/secboot.h:15:20: note: previous definition of 'get_powerpc_sb_mode' was here
   15 | static inline bool get_powerpc_sb_mode(void)
      |                    ^~~~~~~~~~~~~~~~~~~
make[3]: *** [scripts/Makefile.build:278: arch/powerpc/platforms/powernv/secboot.o] Error 1
make[3]: *** Waiting for unfinished jobs....
make[2]: *** [scripts/Makefile.build:489: arch/powerpc/platforms/powernv] Error 2
make[1]: *** [scripts/Makefile.build:489: arch/powerpc/platforms] Error 2
make: *** [Makefile:1071: arch/powerpc] Error 2
make: *** Waiting for unfinished jobs....
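
The usual fix for this kind of redefinition (an assumption on my part, not
taken from the series) is to compile the header fallback only when the real
implementation is not built, e.g.:

/* Sketch of a guard in secboot.h; the config symbol placement is assumed */
#ifdef CONFIG_OPAL_SECVAR
bool get_powerpc_sb_mode(void);
#else
static inline bool get_powerpc_sb_mode(void)
{
	return false;
}
#endif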

Regards,
-Satheesh

> 
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index 8c1c636308c8..9de77bb14f54 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -902,6 +902,20 @@ config PPC_MEM_KEYS
> 
> If unsure, say y.
> 
> +config PPC_SECURE_BOOT
> + prompt "Enable PowerPC Secure Boot"
> + bool
> + default n
> + depends on PPC64
> + depends on OPAL_SECVAR
> + depends on IMA
> + depends on IMA_ARCH_POLICY
> + help
> +   Linux on POWER with firmware secure boot enabled needs to define
> +   security policies to extend secure boot to the OS. This config
> +   allows the user to enable OS Secure Boot on PowerPC systems that
> +   have firmware secure boot support.
> +
>  endmenu
> 
>  config ISA_DMA_API
> diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
> index 0ea6c4aa3a20..75c929b41341 100644
> --- a/arch/powerpc/kernel/Makefile
> +++ b/arch/powerpc/kernel/Makefile
> @@ -131,6 +131,7 @@ ifdef CONFIG_IMA
>  obj-y+= ima_kexec.o
>  endif
>  endif
> +obj-$(CONFIG_PPC_SECURE_BOOT)+= ima_arch.o
> 
>  obj-$(CONFIG_AUDIT)  += audit.o
>  obj64-$(CONFIG_AUDIT)+= compat_audit.o
> diff --git a/arch/powerpc/kernel/ima_arch.c b/arch/powerpc/kernel/ima_arch.c
> new file mode 100644
> index ..1767bf6e6550
> --- /dev/null
> +++ b/arch/powerpc/kernel/ima_arch.c
> @@ -0,0 +1,54 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (C) 2019 IBM Corporation
> + * Author: Nayna Jain 
> + *
> + * ima_arch.c
> + *  - initialize ima policies for PowerPC Secure Boot
> + */
> +
> +#include 
> +#include 
> +
> +bool arch_ima_get_secureboot(void)
> +{
> + bool sb_mode;
> +
> + sb_mode = get_powerpc_sb_mode();
> + if (sb_mode)
> + return true;
> + else
> + return false;
> +}
> +
> +/*
> + * File signature verification is not needed, include only measurements
> + */
> +static const char *const default_arch_rules[] = {
> + "measure func=KEXEC_KERNEL_CHECK template=ima-modsig",
> + "measure func=MODULE_CHECK template=ima-modsig",
> + NULL
> +};
> +
> +/* Both file signature verification and measurements are needed */
> +static const char *const sb_arch_rules[] = {
> + "measure func=KEXEC_KERNEL_CHECK template=ima-modsig",
> + "measure func=MODULE_CHECK template=ima-modsig",
> + "appraise func=KEXEC_KERNEL_CHECK appraise_type=imasig|modsig 
> template=ima-modsig",
> +#if !IS_ENABLED(CONFIG_MODULE_SIG)
> + "appraise func=MODULE_CHECK appraise_type=imasig|modsig 
> template=ima-modsig",
> +#endif
> + NULL
> +};
> +
> +/*
> + * On PowerPC, file measurements are to be added to the IMA measurement list
> + * irrespective of the secure boot state of the system. Signature 
> verification
> + * is conditionally enabled based on the secure boot state.
> + */
> +const char *const *arch_get_ima_policy(void)
> +{
> + 

Re: [PATCH 1/4] mm: Move ioremap page table mapping function to mm/

2019-06-10 Thread Christophe Leroy




On 10/06/2019 at 06:38, Nicholas Piggin wrote:

ioremap_page_range is a generic function to create a kernel virtual
mapping, move it to mm/vmalloc.c and rename it vmap_range.

For clarity with this move, also:
- Rename vunmap_page_range (vmap_range's inverse) to vunmap_range.
- Rename vmap_page_range (which takes a page array) to vmap_pages.


Maybe it would be easier to follow the change if the name change were
done in a separate patch from the move.




Signed-off-by: Nicholas Piggin 
---

Fixed up the arm64 compile errors, fixed a few bugs, and tidied
things up a bit more.

Have tested powerpc and x86 but not arm64, would appreciate a review
and test of the arm64 patch if possible.

  include/linux/vmalloc.h |   3 +
  lib/ioremap.c   | 173 +++---
  mm/vmalloc.c| 228 
  3 files changed, 229 insertions(+), 175 deletions(-)

diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 51e131245379..812bea5866d6 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -147,6 +147,9 @@ extern struct vm_struct *find_vm_area(const void *addr);
  extern int map_vm_area(struct vm_struct *area, pgprot_t prot,
struct page **pages);
  #ifdef CONFIG_MMU
+extern int vmap_range(unsigned long addr,
+  unsigned long end, phys_addr_t phys_addr, pgprot_t prot,
+  unsigned int max_page_shift);


Drop extern keyword here.

As checkpatch tells you, 'CHECK:AVOID_EXTERNS: extern prototypes should 
be avoided in .h files'
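
i.e. the declaration would simply lose the keyword:

int vmap_range(unsigned long addr,
	       unsigned long end, phys_addr_t phys_addr, pgprot_t prot,
	       unsigned int max_page_shift);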


Christophe


  extern int map_kernel_range_noflush(unsigned long start, unsigned long size,
pgprot_t prot, struct page **pages);
  extern void unmap_kernel_range_noflush(unsigned long addr, unsigned long 
size);
diff --git a/lib/ioremap.c b/lib/ioremap.c
index 063213685563..e13946da8ec3 100644
--- a/lib/ioremap.c
+++ b/lib/ioremap.c
@@ -58,165 +58,24 @@ static inline int ioremap_pud_enabled(void) { return 0; }
  static inline int ioremap_pmd_enabled(void) { return 0; }
  #endif/* CONFIG_HAVE_ARCH_HUGE_VMAP */
  
-static int ioremap_pte_range(pmd_t *pmd, unsigned long addr,

-   unsigned long end, phys_addr_t phys_addr, pgprot_t prot)
-{
-   pte_t *pte;
-   u64 pfn;
-
-   pfn = phys_addr >> PAGE_SHIFT;
-   pte = pte_alloc_kernel(pmd, addr);
-   if (!pte)
-   return -ENOMEM;
-   do {
-   BUG_ON(!pte_none(*pte));
-   set_pte_at(&init_mm, addr, pte, pfn_pte(pfn, prot));
-   pfn++;
-   } while (pte++, addr += PAGE_SIZE, addr != end);
-   return 0;
-}
-
-static int ioremap_try_huge_pmd(pmd_t *pmd, unsigned long addr,
-   unsigned long end, phys_addr_t phys_addr,
-   pgprot_t prot)
-{
-   if (!ioremap_pmd_enabled())
-   return 0;
-
-   if ((end - addr) != PMD_SIZE)
-   return 0;
-
-   if (!IS_ALIGNED(phys_addr, PMD_SIZE))
-   return 0;
-
-   if (pmd_present(*pmd) && !pmd_free_pte_page(pmd, addr))
-   return 0;
-
-   return pmd_set_huge(pmd, phys_addr, prot);
-}
-
-static inline int ioremap_pmd_range(pud_t *pud, unsigned long addr,
-   unsigned long end, phys_addr_t phys_addr, pgprot_t prot)
-{
-   pmd_t *pmd;
-   unsigned long next;
-
-   pmd = pmd_alloc(&init_mm, pud, addr);
-   if (!pmd)
-   return -ENOMEM;
-   do {
-   next = pmd_addr_end(addr, end);
-
-   if (ioremap_try_huge_pmd(pmd, addr, next, phys_addr, prot))
-   continue;
-
-   if (ioremap_pte_range(pmd, addr, next, phys_addr, prot))
-   return -ENOMEM;
-   } while (pmd++, phys_addr += (next - addr), addr = next, addr != end);
-   return 0;
-}
-
-static int ioremap_try_huge_pud(pud_t *pud, unsigned long addr,
-   unsigned long end, phys_addr_t phys_addr,
-   pgprot_t prot)
-{
-   if (!ioremap_pud_enabled())
-   return 0;
-
-   if ((end - addr) != PUD_SIZE)
-   return 0;
-
-   if (!IS_ALIGNED(phys_addr, PUD_SIZE))
-   return 0;
-
-   if (pud_present(*pud) && !pud_free_pmd_page(pud, addr))
-   return 0;
-
-   return pud_set_huge(pud, phys_addr, prot);
-}
-
-static inline int ioremap_pud_range(p4d_t *p4d, unsigned long addr,
-   unsigned long end, phys_addr_t phys_addr, pgprot_t prot)
-{
-   pud_t *pud;
-   unsigned long next;
-
-   pud = pud_alloc(&init_mm, p4d, addr);
-   if (!pud)
-   return -ENOMEM;
-   do {
-   next = pud_addr_end(addr, end);
-
-   if (ioremap_try_huge_pud(pud, addr, next, phys_addr, prot))
-   continue;
-
-   if (ioremap_pmd_range(pud, addr, next, phys_addr, prot))
-

Re: [PATCH 4/4] mm/vmalloc: Hugepage vmalloc mappings

2019-06-10 Thread Christophe Leroy




On 10/06/2019 at 06:38, Nicholas Piggin wrote:

For platforms that define HAVE_ARCH_HUGE_VMAP, have vmap allow vmalloc to
allocate huge pages and map them


Will this be compatible with Russell's series 
https://patchwork.ozlabs.org/patch/1099857/ for the implementation of 
STRICT_MODULE_RWX?

I see that apply_to_page_range() has things like BUG_ON(pud_huge(*pud));

Might also be an issue for arm64 as I think Russell's implementation 
comes from there.




This brings dTLB misses for linux kernel tree `git diff` from 45,000 to
8,000 on a Kaby Lake KVM guest with 8MB dentry hash and mitigations=off
(performance is in the noise, under 1% difference, page tables are likely
to be well cached for this workload). Similar numbers are seen on POWER9.

Signed-off-by: Nicholas Piggin 
---
  include/asm-generic/4level-fixup.h |   1 +
  include/asm-generic/5level-fixup.h |   1 +
  include/linux/vmalloc.h|   1 +
  mm/vmalloc.c   | 132 +++--
  4 files changed, 107 insertions(+), 28 deletions(-)

diff --git a/include/asm-generic/4level-fixup.h 
b/include/asm-generic/4level-fixup.h
index e3667c9a33a5..3cc65a4dd093 100644
--- a/include/asm-generic/4level-fixup.h
+++ b/include/asm-generic/4level-fixup.h
@@ -20,6 +20,7 @@
  #define pud_none(pud) 0
  #define pud_bad(pud)  0
  #define pud_present(pud)  1
+#define pud_large(pud) 0
  #define pud_ERROR(pud)do { } while (0)
  #define pud_clear(pud)pgd_clear(pud)
  #define pud_val(pud)  pgd_val(pud)
diff --git a/include/asm-generic/5level-fixup.h 
b/include/asm-generic/5level-fixup.h
index bb6cb347018c..c4377db09a4f 100644
--- a/include/asm-generic/5level-fixup.h
+++ b/include/asm-generic/5level-fixup.h
@@ -22,6 +22,7 @@
  #define p4d_none(p4d) 0
  #define p4d_bad(p4d)  0
  #define p4d_present(p4d)  1
+#define p4d_large(p4d) 0
  #define p4d_ERROR(p4d)do { } while (0)
  #define p4d_clear(p4d)pgd_clear(p4d)
  #define p4d_val(p4d)  pgd_val(p4d)
diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 812bea5866d6..4c92dc608928 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -42,6 +42,7 @@ struct vm_struct {
unsigned long   size;
unsigned long   flags;
struct page **pages;
+   unsigned intpage_shift;
unsigned intnr_pages;
phys_addr_t phys_addr;
const void  *caller;
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index dd27cfb29b10..0cf8e861caeb 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -36,6 +36,7 @@
  #include 
  
  #include 

+#include 
  #include 
  #include 
  
@@ -440,6 +441,41 @@ static int vmap_pages_range(unsigned long start, unsigned long end,

return ret;
  }
  
+#ifdef CONFIG_HAVE_ARCH_HUGE_VMAP

+static int vmap_hpages_range(unsigned long start, unsigned long end,
+  pgprot_t prot, struct page **pages,
+  unsigned int page_shift)
+{
+   unsigned long addr = start;
+   unsigned int i, nr = (end - start) >> (PAGE_SHIFT + page_shift);
+
+   for (i = 0; i < nr; i++) {
+   int err;
+
+   err = vmap_range_noflush(addr,
+   addr + (PAGE_SIZE << page_shift),
+   __pa(page_address(pages[i])), prot,
+   PAGE_SHIFT + page_shift);
+   if (err)
+   return err;
+
+   addr += PAGE_SIZE << page_shift;
+   }
+   flush_cache_vmap(start, end);
+
+   return nr;
+}
+#else
+static int vmap_hpages_range(unsigned long start, unsigned long end,
+  pgprot_t prot, struct page **pages,
+  unsigned int page_shift)
+{
+   BUG_ON(page_shift != PAGE_SIZE);


Do we really need a BUG_ON() there? What happens if this condition is
true?



+   return vmap_pages_range(start, end, prot, pages);
+}
+#endif
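
One possible replacement for the quoted fallback (a sketch only; note that
page_shift in this series is the extra shift on top of PAGE_SHIFT — see
"PAGE_SIZE << page_shift" above — so the no-HUGE_VMAP path should reject any
non-zero shift rather than compare a shift against PAGE_SIZE):

static int vmap_hpages_range(unsigned long start, unsigned long end,
			     pgprot_t prot, struct page **pages,
			     unsigned int page_shift)
{
	/* Warn and fail instead of killing the boot with a BUG_ON(). */
	if (WARN_ON_ONCE(page_shift))
		return -EINVAL;
	return vmap_pages_range(start, end, prot, pages);
}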
+
+
  int is_vmalloc_or_module_addr(const void *x)
  {
/*
@@ -462,7 +498,7 @@ struct page *vmalloc_to_page(const void *vmalloc_addr)
  {
unsigned long addr = (unsigned long) vmalloc_addr;
struct page *page = NULL;
-   pgd_t *pgd = pgd_offset_k(addr);
+   pgd_t *pgd;
p4d_t *p4d;
pud_t *pud;
pmd_t *pmd;
@@ -474,27 +510,38 @@ struct page *vmalloc_to_page(const void *vmalloc_addr)
 */
VIRTUAL_BUG_ON(!is_vmalloc_or_module_addr(vmalloc_addr));
  
+	pgd = pgd_offset_k(addr);

if (pgd_none(*pgd))
return NULL;
+
p4d = p4d_offset(pgd, addr);
if (p4d_none(*p4d))
return NULL;
-   pud = pud_offset(p4d, ad

Re: [PATCH v2 3/6] powerpc/eeh: Improve debug messages around device addition

2019-06-10 Thread Alexey Kardashevskiy



On 07/05/2019 14:30, Sam Bobroff wrote:
> Also remove useless comment.
> 
> Signed-off-by: Sam Bobroff 
> Reviewed-by: Alexey Kardashevskiy 
> ---
>  arch/powerpc/kernel/eeh.c|  2 +-
>  arch/powerpc/platforms/powernv/eeh-powernv.c | 14 
>  arch/powerpc/platforms/pseries/eeh_pseries.c | 23 +++-
>  3 files changed, 28 insertions(+), 11 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
> index 8d3c36a1f194..b14d89547895 100644
> --- a/arch/powerpc/kernel/eeh.c
> +++ b/arch/powerpc/kernel/eeh.c
> @@ -1291,7 +1291,7 @@ void eeh_add_device_late(struct pci_dev *dev)
>   pdn = pci_get_pdn_by_devfn(dev->bus, dev->devfn);
>   edev = pdn_to_eeh_dev(pdn);
>   if (edev->pdev == dev) {
> - pr_debug("EEH: Already referenced !\n");
> + pr_debug("EEH: Device %s already referenced!\n", pci_name(dev));
>   return;
>   }
>  
> diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c 
> b/arch/powerpc/platforms/powernv/eeh-powernv.c
> index 6fc1a463b796..0e374cdba961 100644
> --- a/arch/powerpc/platforms/powernv/eeh-powernv.c
> +++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
> @@ -50,10 +50,7 @@ void pnv_pcibios_bus_add_device(struct pci_dev *pdev)
>   if (!pdev->is_virtfn)
>   return;
>  
> - /*
> -  * The following operations will fail if VF's sysfs files
> -  * aren't created or its resources aren't finalized.
> -  */
> + pr_debug("%s: EEH: Setting up device %s.\n", __func__, pci_name(pdev));


dev_dbg() seems more appropriate.
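
e.g., roughly:

	/* sketch: ties the message to the struct device for dynamic debug */
	dev_dbg(&pdev->dev, "%s: EEH: Setting up device\n", __func__);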


>   eeh_add_device_early(pdn);
>   eeh_add_device_late(pdev);
>   eeh_sysfs_add_device(pdev);
> @@ -397,6 +394,10 @@ static void *pnv_eeh_probe(struct pci_dn *pdn, void 
> *data)
>   int ret;
>   int config_addr = (pdn->busno << 8) | (pdn->devfn);
>  
> + pr_debug("%s: probing %04x:%02x:%02x.%01x\n",
> + __func__, hose->global_number, pdn->busno,
> + PCI_SLOT(pdn->devfn), PCI_FUNC(pdn->devfn));
> +
>   /*
>* When probing the root bridge, which doesn't have any
>* subordinate PCI devices. We don't have OF node for
> @@ -491,6 +492,11 @@ static void *pnv_eeh_probe(struct pci_dn *pdn, void 
> *data)
>   /* Save memory bars */
>   eeh_save_bars(edev);
>  
> + pr_debug("%s: EEH enabled on %02x:%02x.%01x PHB#%x-PE#%x\n",
> + __func__, pdn->busno, PCI_SLOT(pdn->devfn),
> + PCI_FUNC(pdn->devfn), edev->pe->phb->global_number,
> + edev->pe->addr);
> +
>   return NULL;
>  }
>  
> diff --git a/arch/powerpc/platforms/pseries/eeh_pseries.c 
> b/arch/powerpc/platforms/pseries/eeh_pseries.c
> index 7aa50258dd42..ae06878fbdea 100644
> --- a/arch/powerpc/platforms/pseries/eeh_pseries.c
> +++ b/arch/powerpc/platforms/pseries/eeh_pseries.c
> @@ -65,6 +65,8 @@ void pseries_pcibios_bus_add_device(struct pci_dev *pdev)
>   if (!pdev->is_virtfn)
>   return;
>  
> + pr_debug("%s: EEH: Setting up device %s.\n", __func__, pci_name(pdev));
> +
>   pdn->device_id  =  pdev->device;
>   pdn->vendor_id  =  pdev->vendor;
>   pdn->class_code =  pdev->class;
> @@ -251,6 +253,10 @@ static void *pseries_eeh_probe(struct pci_dn *pdn, void 
> *data)
>   int enable = 0;
>   int ret;
>  
> + pr_debug("%s: probing %04x:%02x:%02x.%01x\n",
> + __func__, pdn->phb->global_number, pdn->busno,
> + PCI_SLOT(pdn->devfn), PCI_FUNC(pdn->devfn));
> +
>   /* Retrieve OF node and eeh device */
>   edev = pdn_to_eeh_dev(pdn);
>   if (!edev || edev->pe)
> @@ -294,7 +300,12 @@ static void *pseries_eeh_probe(struct pci_dn *pdn, void 
> *data)
>  
>   /* Enable EEH on the device */
>   ret = eeh_ops->set_option(&pe, EEH_OPT_ENABLE);
> - if (!ret) {
> + if (ret) {
> + pr_debug("%s: EEH failed to enable on %02x:%02x.%01x 
> PHB#%x-PE#%x (code %d)\n",
> + __func__, pdn->busno, PCI_SLOT(pdn->devfn),
> + PCI_FUNC(pdn->devfn), pe.phb->global_number,
> + pe.addr, ret);
> + } else {


edev!=NULL here so you could do dev_dbg(&edev->pdev->dev,...) and skip
PCI_SLOT/PCI_FUNC. Or is (edev!=NULL && edev->pdev==NULL) possible (it
could be, just asking)?


>   /* Retrieve PE address */
>   edev->pe_config_addr = eeh_ops->get_pe_addr(&pe);
>   pe.addr = edev->pe_config_addr;
> @@ -310,11 +321,6 @@ static void *pseries_eeh_probe(struct pci_dn *pdn, void 
> *data)
>   if (enable) {
>   eeh_add_flag(EEH_ENABLED);
>   eeh_add_to_parent_pe(edev);
> -
> - pr_debug("%s: EEH enabled on %02x:%02x.%01x 
> PHB#%x-PE#%x\n",
> - __func__, pdn->busno, PCI_SLOT(pdn->devfn),
> - PCI_FUNC(pdn->devfn), pe.phb->global_number,
> - pe.addr);
> 

Re: [BISECTED REGRESSION] b43legacy broken on G4 PowerBook

2019-06-10 Thread Benjamin Herrenschmidt
On Mon, 2019-06-10 at 13:44 -0500, Larry Finger wrote:
> On 6/7/19 11:21 PM, Benjamin Herrenschmidt wrote:
> > 
> > > Please try the attached patch. I'm not really pleased with it and I will
> > > continue to determine why the fallback to a 30-bit mask fails, but at 
> > > least this
> > > one works for me.
> > 
> > Your patch only makes sense if the device is indeed capable of
> > addressing 31-bits.
> > 
> > So either the driver is buggy and asks for a too small mask in which
> > case your patch is ok, or it's not and you're just going to cause all
> > sort of interesting random problems including possible memory
> > corruption.
> 
> Of course the driver may be buggy, but it asks for the correct mask.
> 
> This particular device is not capable of handling 32-bit DMA. The driver 
> detects 
> the 32-bit failure and falls back to 30 bits. It works on x86, and did on 
> PPC32 
> until 5.1. As Christoph said, it should always be possible to use fewer bits 
> than the maximum.

No, I don't think it *worked* on ppc32 before Christoph's patch. I think
it "mostly sort-of worked" :-)

The reason I'm saying that is if your system has more than 1GB of RAM,
then you'll have chunks of memory that the device simply cannot
address.

Before Christoph patches, we had no ZONE_DMA or ZONE_DMA32 covering the
30-bit limited space, so any memory allocation could in theory land
above 30-bits, causing all sort of horrible things to happen with that
driver.

The reason I think it sort-of-mostly-worked is that to get more than
1GB of RAM, those machines use CONFIG_HIGHMEM. And *most* network
buffers aren't allocated in Highmem so you got lucky.

That said, there is such a thing as no-copy send on the network, so I
wouldn't be surprised if some things would still have failed, just not
frequently enough for you to notice.

> Similar devices that are new enough to use b43 rather than b43legacy work 
> with 
> new kernels; however, they have and use 32-bit DMA.

Cheers,
Ben.
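
For context, the fallback the driver performs is roughly this shape
(a sketch, not the actual b43legacy code):

/* sketch of the 32-bit-then-30-bit mask fallback under discussion */
static int wl_set_dma_mask(struct device *dev)
{
	int err;

	err = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32));
	if (err)	/* device cannot address 32 bits... */
		err = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(30));
	return err;	/* ...so fall back to 30 bits */
}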




Re: [BISECTED REGRESSION] b43legacy broken on G4 PowerBook

2019-06-10 Thread Christoph Hellwig
On Mon, Jun 10, 2019 at 11:09:47AM -0500, Larry Finger wrote:
>>>  return -EIO;
>>>
>>> For b43legacy, dev->dma_mask is 0xc2656848.
>>>  dma_supported(dev, mask) is 0xc08b, mask is 0x3fff, and
>>> the routine returns -EIO.
>>>
>>> For b43,   dev->dma_mask is 0xc26568480001,
>>>  dma_supported(dev, mask) is 0xc08b, mask is 0x, and
>>> the routine returns 0.
>>
>> I don't fully understand what values the above map to.  Can you send
>> me your actual debugging patch as well?
>
> I do not understand why the if statement returns true as neither of the 
> values is zero. After seeing the x86 output shown below, I also do not 
> understand all the trailing zeros.
>
> My entire patch is attached. That output came from this section:

What might be confusing in your output is that dev->dma_mask is a pointer,
and we are setting it in dma_set_mask.  Before that we only check
whether the pointer is set, and later we override it.  Of course this doesn't
actually explain the failure.  But what is even more strange to me
is that you get a return value from dma_supported() that isn't 0 or 1,
as that function is supposed to return a boolean, and I really can't see
how mask >= __phys_to_dma(dev, min_mask) would return anything but 0
or 1.  Does the output change if you use the correct printk specifiers?

i.e. with a debug patch like this:


diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 2c2772e9702a..9e5b30b12b10 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -378,6 +378,7 @@ EXPORT_SYMBOL(dma_direct_map_resource);
 int dma_direct_supported(struct device *dev, u64 mask)
 {
u64 min_mask;
+   bool ret;
 
if (IS_ENABLED(CONFIG_ZONE_DMA))
min_mask = DMA_BIT_MASK(ARCH_ZONE_DMA_BITS);
@@ -391,7 +392,12 @@ int dma_direct_supported(struct device *dev, u64 mask)
 * use __phys_to_dma() here so that the SME encryption mask isn't
 * part of the check.
 */
-   return mask >= __phys_to_dma(dev, min_mask);
+   ret = (mask >= __phys_to_dma(dev, min_mask));
+   if (!ret)
+   dev_info(dev,
+   "%s: failed (mask = 0x%llx, min_mask = 0x%llx/0x%llx, 
dma bits = %d\n",
+   __func__, mask, min_mask, __phys_to_dma(dev, min_mask), 
ARCH_ZONE_DMA_BITS);
+   return ret;
 }
 
 size_t dma_direct_max_mapping_size(struct device *dev)
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index f7afdadb6770..6c57ccdee2ae 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -317,8 +317,14 @@ void arch_dma_set_mask(struct device *dev, u64 mask);
 
 int dma_set_mask(struct device *dev, u64 mask)
 {
-   if (!dev->dma_mask || !dma_supported(dev, mask))
+   if (!dev->dma_mask) {
+   dev_info(dev, "no DMA mask set!\n");
return -EIO;
+   }
+   if (!dma_supported(dev, mask)) {
+   printk("DMA not supported\n");
+   return -EIO;
+   }
 
arch_dma_set_mask(dev, mask);
dma_check_mask(dev, mask);


Re: [BISECTED REGRESSION] b43legacy broken on G4 PowerBook

2019-06-10 Thread Christoph Hellwig
On Tue, Jun 11, 2019 at 03:56:33PM +1000, Benjamin Herrenschmidt wrote:
> The reason I think it sort-of-mostly-worked is that to get more than
> 1GB of RAM, those machines use CONFIG_HIGHMEM. And *most* network
> buffers aren't allocated in Highmem so you got lucky.
> 
> That said, there is such as thing as no-copy send on network, so I
> wouldn't be surprised if some things would still have failed, just not
> frequent enough for you to notice.

Unless NETIF_F_HIGHDMA is set on a netdev, the core networking code
will bounce buffer highmem pages for the driver under all circumstances.


Re: [PATCH 4/4] mm/vmalloc: Hugepage vmalloc mappings

2019-06-10 Thread Anshuman Khandual



On 06/10/2019 08:14 PM, Nicholas Piggin wrote:
> Mark Rutland's on June 11, 2019 12:10 am:
>> Hi,
>>
>> On Mon, Jun 10, 2019 at 02:38:38PM +1000, Nicholas Piggin wrote:
>>> For platforms that define HAVE_ARCH_HUGE_VMAP, have vmap allow vmalloc to
>>> allocate huge pages and map them
>>>
>>> This brings dTLB misses for linux kernel tree `git diff` from 45,000 to
>>> 8,000 on a Kaby Lake KVM guest with 8MB dentry hash and mitigations=off
>>> (performance is in the noise, under 1% difference, page tables are likely
>>> to be well cached for this workload). Similar numbers are seen on POWER9.
>>
>> Do you happen to know which vmalloc mappings these get used for in the
>> above case? Where do we see vmalloc mappings that large?
> 
> Large module vmalloc could be subject to huge mappings.
> 
>> I'm worried as to how this would interact with the set_memory_*()
>> functions, as on arm64 those can only operate on page-granular mappings.
>> Those may need fixing up to handle huge mappings; certainly if the above
>> is all for modules.
> 
> Good point, that looks like it would break on arm64 at least. I'll
> work on it. We may have to make this opt in beyond HUGE_VMAP.

This is another reason we might need an arch opt-in like the one
I mentioned before.
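
One conceivable shape for such an opt-in (the config symbol is purely
hypothetical, for illustration only):

/* hypothetical gate; CONFIG_ARCH_WANTS_HUGE_VMALLOC is an invented name */
static inline bool vmalloc_hugepage_permitted(void)
{
	return IS_ENABLED(CONFIG_ARCH_WANTS_HUGE_VMALLOC);
}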


Re: [PATCH 1/3] powerpc/64: __ioremap_at clean up in the error case

2019-06-10 Thread Christophe Leroy




On 10/06/2019 at 05:08, Nicholas Piggin wrote:

__ioremap_at error handling is wonky, it requires caller to clean up
after it. Implement a helper that does the map and error cleanup and
remove the requirement from the caller.

Signed-off-by: Nicholas Piggin 
---

This series is a different approach to the problem, using the generic
ioremap_page_range directly which reduces added code, and moves
the radix specific code into radix files. Thanks to Christophe for
pointing out various problems with the previous patch.

  arch/powerpc/mm/pgtable_64.c | 27 ---
  1 file changed, 20 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/mm/pgtable_64.c b/arch/powerpc/mm/pgtable_64.c
index d2d976ff8a0e..6bd3660388aa 100644
--- a/arch/powerpc/mm/pgtable_64.c
+++ b/arch/powerpc/mm/pgtable_64.c
@@ -108,14 +108,30 @@ unsigned long ioremap_bot;
  unsigned long ioremap_bot = IOREMAP_BASE;
  #endif
  
+static int ioremap_range(unsigned long ea, phys_addr_t pa, unsigned long size, pgprot_t prot, int nid)

+{
+   unsigned long i;
+
+   for (i = 0; i < size; i += PAGE_SIZE) {
+   int err = map_kernel_page(ea + i, pa + i, prot);


Missing a blank line


+   if (err) {


I'd have done the following to reduce the indentation depth:

if (!err)
	continue;


+   if (slab_is_available())
+   unmap_kernel_range(ea, size);


Shouldn't it be unmap_kernel_range(ea, i)?

Christophe


+   else
+   WARN_ON_ONCE(1); /* Should clean up */
+   return err;
+   }
+   }
+
+   return 0;
+}
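
Folding the two comments above together, the helper might read as follows
(a sketch; it assumes unmap_kernel_range(ea, i) is indeed the right partial
cleanup):

static int ioremap_range(unsigned long ea, phys_addr_t pa,
			 unsigned long size, pgprot_t prot, int nid)
{
	unsigned long i;

	for (i = 0; i < size; i += PAGE_SIZE) {
		int err = map_kernel_page(ea + i, pa + i, prot);

		if (!err)
			continue;

		/* unmap only the part that was actually mapped */
		if (slab_is_available())
			unmap_kernel_range(ea, i);
		else
			WARN_ON_ONCE(1); /* Should clean up */
		return err;
	}

	return 0;
}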
+
  /**
   * __ioremap_at - Low level function to establish the page tables
   *for an IO mapping
   */
  void __iomem *__ioremap_at(phys_addr_t pa, void *ea, unsigned long size, 
pgprot_t prot)
  {
-   unsigned long i;
-
/* We don't support the 4K PFN hack with ioremap */
if (pgprot_val(prot) & H_PAGE_4K_PFN)
return NULL;
@@ -129,9 +145,8 @@ void __iomem *__ioremap_at(phys_addr_t pa, void *ea, 
unsigned long size, pgprot_
WARN_ON(((unsigned long)ea) & ~PAGE_MASK);
WARN_ON(size & ~PAGE_MASK);
  
-	for (i = 0; i < size; i += PAGE_SIZE)

-   if (map_kernel_page((unsigned long)ea + i, pa + i, prot))
-   return NULL;
+   if (ioremap_range((unsigned long)ea, pa, size, prot, NUMA_NO_NODE))
+   return NULL;
  
  	return (void __iomem *)ea;

  }
@@ -182,8 +197,6 @@ void __iomem * __ioremap_caller(phys_addr_t addr, unsigned 
long size,
  
  		area->phys_addr = paligned;

ret = __ioremap_at(paligned, area->addr, size, prot);
-   if (!ret)
-   vunmap(area->addr);
} else {
ret = __ioremap_at(paligned, (void *)ioremap_bot, size, prot);
if (ret)



[PATCH v11 00/13] Appended signatures support for IMA appraisal

2019-06-10 Thread Thiago Jung Bauermann
Hello,

Nothing big in this version. Noteworthy changes are:

1. Fixes for two bugs in ima_appraise_measurement() which were spotted and
resolved by Mimi Zohar. The changelog points them out.

2. One bugfix in process_measurement() which would cause all files
appraised with modsig to be measured as well, even if the policy didn't
request it.

3. Adapted to work with per policy rule template formats.

Plus small cosmetic changes in some places. The changelog has the details.

This has been tested with signed modules and with signed kernels loaded via
kexec_file_load().

Many thanks to Mimi Zohar for her help with the development of this patch
series.

The patches apply on today's linux-integrity/next-queued-testing.

Original cover letter:

On the OpenPOWER platform, secure boot and trusted boot are being
implemented using IMA for taking measurements and verifying signatures.
Since the kernel image on Power servers is an ELF binary, kernels are
signed using the scripts/sign-file tool and thus use the same signature
format as signed kernel modules.

This patch series adds support in IMA for verifying those signatures.
It adds flexibility to OpenPOWER secure boot, because it allows it to boot
kernels with the signature appended to them as well as kernels where the
signature is stored in the IMA extended attribute.

Changes since v10:

- Patch "MODSIGN: Export module signature definitions"
  - Moved config MODULE_SIG_FORMAT definition before its use. Suggested by
Mimi Zohar.
  - Added missing kerneldoc for @name parameter. Suggested by Mimi Zohar.

- Patch "ima: Implement support for module-style appended signatures"
  - Bugfix: don't check status variable when deciding whether to verify
modsig in ima_appraise_measurement(). Suggested by Mimi Zohar.
  - Bugfix: verify the modsig in ima_appraise_measurement() if the xattr
contains a digest. Suggested by Mimi Zohar.

- Patch "ima: Define ima-modsig template"
  - Renamed ima_modsig_serialize() to ima_get_raw_modsig().
  - Renamed check_current_template_modsig() to check_template_modsig().
  - Fixed outdated comment in ima_eventmodsig_init(). Suggested by Mimi
Zohar.
  - Check either the global or the per-rule template when an appraisal rule
allows modsig. Suggested by Mimi Zohar.

- Patch "ima: Store the measurement again when appraising a modsig"
  - Bugfix: Only re-measure file containing modsig if it was measured
before.
  - Check for modsig-related fields in the template_desc obtained in
process_measurement() which can be a per-rule template. Suggested by Mimi
Zohar.

- Patch "ima: Allow template= option for appraise rules as well"
  - New patch. Suggested by Mimi Zohar.

Changes since v9:

- Patch "MODSIGN: Export module signature definitions"
  - Moved mod_check_sig() to a new file so that CONFIG_IMA_APPRAISE_MODSIG
doesn't have to depend on CONFIG_MODULES.
  - Changed scripts/Makefile to build sign-file if CONFIG_MODULE_SIG_FORMAT
is set.
  - Removed Mimi's Reviewed-by because of the changes in this version.

- Patch "PKCS#7: Refactor verify_pkcs7_signature()"
  - Don't add function pkcs7_get_message_sig() anymore, since it's not
needed in the current version.

- Patch "PKCS#7: Introduce pkcs7_get_digest()"
  - Changed 'len' argument from 'u8 *' to 'u32 *'.
  - Added 'hash_algo' argument to obtain the algo used for the digest.
  - Don't check whether 'buf', 'len' and 'hash_algo' output arguments are NULL,
since the function's only caller always sets them.
  - Removed Mimi's Reviewed-by because of the changes in this version.

- Patch "integrity: Introduce asymmetric_sig_has_known_key()"
  - Dropped.

- Patch "integrity: Introduce integrity_keyring_from_id"
  - Squashed into "ima: Implement support for module-style appended signatures"
  - Changed integrity_keyring_from_id() to a static function (suggested by Mimi
Zohar).

- Patch "ima: Introduce is_signed()"
  - Dropped.

- Patch "ima: Export func_tokens"
  - Squashed into "ima: Implement support for module-style appended signatures"

- Patch "ima: Use designated initializers for struct ima_event_data"
  - New patch.

- Patch "ima: Factor xattr_verify() out of ima_appraise_measurement()"
  - New patch.

- Patch "ima: Implement support for module-style appended signatures"
  - Renamed 'struct modsig_hdr' to 'struct modsig'.
  - Added integrity_modsig_verify() to integrity/digsig.c so that it's not
necessary to export integrity_keyring_from_id() (Suggested by Mimi Zohar).
  - Don't add functions ima_xattr_sig_known_key() and
modsig_has_known_key() since they're not necessary anymore.
  - Added modsig argument to ima_appraise_measurement().
  - Verify modsig in a separate function called by ima_appraise_measurement().
  - Renamed ima_read_collect_modsig() to ima_read_modsig(), with a separate
collect function added in patch "ima: Collect modsig" (suggested by Mimi
Zohar).
  - In ima_read_modsig(), moved code saving of raw PKCS7 data to 'stru

[PATCH v11 01/13] MODSIGN: Export module signature definitions

2019-06-10 Thread Thiago Jung Bauermann
IMA will use the module_signature format for append signatures, so export
the relevant definitions and factor out the code which verifies that the
appended signature trailer is valid.

Also, create a CONFIG_MODULE_SIG_FORMAT option so that IMA can select it
and be able to use mod_check_sig() without having to depend on either
CONFIG_MODULE_SIG or CONFIG_MODULES.

Signed-off-by: Thiago Jung Bauermann 
Reviewed-by: Mimi Zohar 
Cc: Jessica Yu 
---
 include/linux/module.h   |  3 --
 include/linux/module_signature.h | 44 +
 init/Kconfig |  6 +++-
 kernel/Makefile  |  1 +
 kernel/module.c  |  1 +
 kernel/module_signature.c| 46 ++
 kernel/module_signing.c  | 56 +---
 scripts/Makefile |  2 +-
 8 files changed, 106 insertions(+), 53 deletions(-)

diff --git a/include/linux/module.h b/include/linux/module.h
index 188998d3dca9..aa56f531cf1e 100644
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -25,9 +25,6 @@
 #include 
 #include 
 
-/* In stripped ARM and x86-64 modules, ~ is surprisingly rare. */
-#define MODULE_SIG_STRING "~Module signature appended~\n"
-
 /* Not Yet Implemented */
 #define MODULE_SUPPORTED_DEVICE(name)
 
diff --git a/include/linux/module_signature.h b/include/linux/module_signature.h
new file mode 100644
index ..523617fc5b6a
--- /dev/null
+++ b/include/linux/module_signature.h
@@ -0,0 +1,44 @@
+/* SPDX-License-Identifier: GPL-2.0+ */
+/*
+ * Module signature handling.
+ *
+ * Copyright (C) 2012 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowe...@redhat.com)
+ */
+
+#ifndef _LINUX_MODULE_SIGNATURE_H
+#define _LINUX_MODULE_SIGNATURE_H
+
+/* In stripped ARM and x86-64 modules, ~ is surprisingly rare. */
+#define MODULE_SIG_STRING "~Module signature appended~\n"
+
+enum pkey_id_type {
+   PKEY_ID_PGP,/* OpenPGP generated key ID */
+   PKEY_ID_X509,   /* X.509 arbitrary subjectKeyIdentifier */
+   PKEY_ID_PKCS7,  /* Signature in PKCS#7 message */
+};
+
+/*
+ * Module signature information block.
+ *
+ * The constituents of the signature section are, in order:
+ *
+ * - Signer's name
+ * - Key identifier
+ * - Signature data
+ * - Information block
+ */
+struct module_signature {
+   u8  algo;   /* Public-key crypto algorithm [0] */
+   u8  hash;   /* Digest algorithm [0] */
+   u8  id_type;/* Key identifier type [PKEY_ID_PKCS7] */
+   u8  signer_len; /* Length of signer's name [0] */
+   u8  key_id_len; /* Length of key identifier [0] */
+   u8  __pad[3];
+   __be32  sig_len;/* Length of signature data */
+};
+
+int mod_check_sig(const struct module_signature *ms, size_t file_len,
+ const char *name);
+
+#endif /* _LINUX_MODULE_SIGNATURE_H */
diff --git a/init/Kconfig b/init/Kconfig
index 8b9ffe236e4f..c2286a3c74c5 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1852,6 +1852,10 @@ config BASE_SMALL
default 0 if BASE_FULL
default 1 if !BASE_FULL
 
+config MODULE_SIG_FORMAT
+   def_bool n
+   select SYSTEM_DATA_VERIFICATION
+
 menuconfig MODULES
bool "Enable loadable module support"
option modules
@@ -1929,7 +1933,7 @@ config MODULE_SRCVERSION_ALL
 config MODULE_SIG
bool "Module signature verification"
depends on MODULES
-   select SYSTEM_DATA_VERIFICATION
+   select MODULE_SIG_FORMAT
help
  Check modules for valid signatures upon load: the signature
  is simply appended to the module. For more information see
diff --git a/kernel/Makefile b/kernel/Makefile
index 33824f0385b3..f29ae2997a43 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -58,6 +58,7 @@ endif
 obj-$(CONFIG_UID16) += uid16.o
 obj-$(CONFIG_MODULES) += module.o
 obj-$(CONFIG_MODULE_SIG) += module_signing.o
+obj-$(CONFIG_MODULE_SIG_FORMAT) += module_signature.o
 obj-$(CONFIG_KALLSYMS) += kallsyms.o
 obj-$(CONFIG_BSD_PROCESS_ACCT) += acct.o
 obj-$(CONFIG_CRASH_CORE) += crash_core.o
diff --git a/kernel/module.c b/kernel/module.c
index 6e6712b3aaf5..2712f4d217f5 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
diff --git a/kernel/module_signature.c b/kernel/module_signature.c
new file mode 100644
index ..4224a1086b7d
--- /dev/null
+++ b/kernel/module_signature.c
@@ -0,0 +1,46 @@
+// SPDX-License-Identifier: GPL-2.0+
+/*
+ * Module signature checker
+ *
+ * Copyright (C) 2012 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowe...@redhat.com)
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+/**
+ * mod_check_sig - check that the given signature is sane
+ *
+ * @ms:Signature to check.
+ * @file_len:  Size of the file to which @ms is appen

[PATCH v11 02/13] PKCS#7: Refactor verify_pkcs7_signature()

2019-06-10 Thread Thiago Jung Bauermann
IMA will need to verify a PKCS#7 signature which has already been parsed.
For this reason, factor out the code which does that from
verify_pkcs7_signature() into a new function which takes a struct
pkcs7_message instead of a data buffer.

Signed-off-by: Thiago Jung Bauermann 
Reviewed-by: Mimi Zohar 
Cc: David Howells 
Cc: David Woodhouse 
Cc: Herbert Xu 
Cc: "David S. Miller" 
---
 certs/system_keyring.c   | 61 ++--
 include/linux/verification.h | 10 ++
 2 files changed, 55 insertions(+), 16 deletions(-)

diff --git a/certs/system_keyring.c b/certs/system_keyring.c
index c05c29ae4d5d..4ba82e52e4b4 100644
--- a/certs/system_keyring.c
+++ b/certs/system_keyring.c
@@ -194,33 +194,27 @@ late_initcall(load_system_certificate_list);
 #ifdef CONFIG_SYSTEM_DATA_VERIFICATION
 
 /**
- * verify_pkcs7_signature - Verify a PKCS#7-based signature on system data.
+ * verify_pkcs7_message_sig - Verify a PKCS#7-based signature on system data.
  * @data: The data to be verified (NULL if expecting internal data).
  * @len: Size of @data.
- * @raw_pkcs7: The PKCS#7 message that is the signature.
- * @pkcs7_len: The size of @raw_pkcs7.
+ * @pkcs7: The PKCS#7 message that is the signature.
  * @trusted_keys: Trusted keys to use (NULL for builtin trusted keys only,
  * (void *)1UL for all trusted keys).
  * @usage: The use to which the key is being put.
  * @view_content: Callback to gain access to content.
  * @ctx: Context for callback.
  */
-int verify_pkcs7_signature(const void *data, size_t len,
-  const void *raw_pkcs7, size_t pkcs7_len,
-  struct key *trusted_keys,
-  enum key_being_used_for usage,
-  int (*view_content)(void *ctx,
-  const void *data, size_t len,
-  size_t asn1hdrlen),
-  void *ctx)
+int verify_pkcs7_message_sig(const void *data, size_t len,
+struct pkcs7_message *pkcs7,
+struct key *trusted_keys,
+enum key_being_used_for usage,
+int (*view_content)(void *ctx,
+const void *data, size_t len,
+size_t asn1hdrlen),
+void *ctx)
 {
-   struct pkcs7_message *pkcs7;
int ret;
 
-   pkcs7 = pkcs7_parse_message(raw_pkcs7, pkcs7_len);
-   if (IS_ERR(pkcs7))
-   return PTR_ERR(pkcs7);
-
/* The data should be detached - so we need to supply it. */
if (data && pkcs7_supply_detached_data(pkcs7, data, len) < 0) {
pr_err("PKCS#7 signature with non-detached data\n");
@@ -273,6 +267,41 @@ int verify_pkcs7_signature(const void *data, size_t len,
}
 
 error:
+   pr_devel("<==%s() = %d\n", __func__, ret);
+   return ret;
+}
+
+/**
+ * verify_pkcs7_signature - Verify a PKCS#7-based signature on system data.
+ * @data: The data to be verified (NULL if expecting internal data).
+ * @len: Size of @data.
+ * @raw_pkcs7: The PKCS#7 message that is the signature.
+ * @pkcs7_len: The size of @raw_pkcs7.
+ * @trusted_keys: Trusted keys to use (NULL for builtin trusted keys only,
+ * (void *)1UL for all trusted keys).
+ * @usage: The use to which the key is being put.
+ * @view_content: Callback to gain access to content.
+ * @ctx: Context for callback.
+ */
+int verify_pkcs7_signature(const void *data, size_t len,
+  const void *raw_pkcs7, size_t pkcs7_len,
+  struct key *trusted_keys,
+  enum key_being_used_for usage,
+  int (*view_content)(void *ctx,
+  const void *data, size_t len,
+  size_t asn1hdrlen),
+  void *ctx)
+{
+   struct pkcs7_message *pkcs7;
+   int ret;
+
+   pkcs7 = pkcs7_parse_message(raw_pkcs7, pkcs7_len);
+   if (IS_ERR(pkcs7))
+   return PTR_ERR(pkcs7);
+
+   ret = verify_pkcs7_message_sig(data, len, pkcs7, trusted_keys, usage,
+  view_content, ctx);
+
pkcs7_free_message(pkcs7);
pr_devel("<==%s() = %d\n", __func__, ret);
return ret;
diff --git a/include/linux/verification.h b/include/linux/verification.h
index 018fb5f13d44..5e1d41f2b336 100644
--- a/include/linux/verification.h
+++ b/include/linux/verification.h
@@ -36,6 +36,7 @@ extern const char *const 
key_being_used_for[NR__KEY_BEING_USED_FOR];
 #ifdef CONFIG_SYSTEM_DATA_VERIFICATION
 
 struct key;
+struct pkcs7_message;
 
 extern int verify_pkcs7_signature(const void *data, size_t len,
  const void *raw_pkcs7,

[PATCH v11 03/13] PKCS#7: Introduce pkcs7_get_digest()

2019-06-10 Thread Thiago Jung Bauermann
IMA will need to access the digest of the PKCS7 message (as calculated by
the kernel) before the signature is verified, so introduce
pkcs7_get_digest() for that purpose.

Also, modify pkcs7_digest() to detect when the digest was already
calculated so that it doesn't have to do redundant work. Verifying that
sinfo->sig->digest isn't NULL is sufficient because both places which
allocate sinfo->sig (pkcs7_parse_message() and pkcs7_note_signed_info())
use kzalloc() so sig->digest is always initialized to zero.

Signed-off-by: Thiago Jung Bauermann 
Reviewed-by: Mimi Zohar 
Cc: David Howells 
Cc: David Woodhouse 
Cc: Herbert Xu 
Cc: "David S. Miller" 
---
 crypto/asymmetric_keys/pkcs7_verify.c | 33 +++
 include/crypto/pkcs7.h|  4 
 2 files changed, 37 insertions(+)

diff --git a/crypto/asymmetric_keys/pkcs7_verify.c 
b/crypto/asymmetric_keys/pkcs7_verify.c
index f7b0980bf02d..3243981152b5 100644
--- a/crypto/asymmetric_keys/pkcs7_verify.c
+++ b/crypto/asymmetric_keys/pkcs7_verify.c
@@ -16,6 +16,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include "pkcs7_parser.h"
 
@@ -33,6 +34,10 @@ static int pkcs7_digest(struct pkcs7_message *pkcs7,
 
kenter(",%u,%s", sinfo->index, sinfo->sig->hash_algo);
 
+   /* The digest was calculated already. */
+   if (sig->digest)
+   return 0;
+
if (!sinfo->sig->hash_algo)
return -ENOPKG;
 
@@ -121,6 +126,34 @@ static int pkcs7_digest(struct pkcs7_message *pkcs7,
return ret;
 }
 
+int pkcs7_get_digest(struct pkcs7_message *pkcs7, const u8 **buf, u32 *len,
+enum hash_algo *hash_algo)
+{
+   struct pkcs7_signed_info *sinfo = pkcs7->signed_infos;
+   int i, ret;
+
+   /*
+* This function doesn't support messages with more than one signature.
+*/
+   if (sinfo == NULL || sinfo->next != NULL)
+   return -EBADMSG;
+
+   ret = pkcs7_digest(pkcs7, sinfo);
+   if (ret)
+   return ret;
+
+   *buf = sinfo->sig->digest;
+   *len = sinfo->sig->digest_size;
+
+   for (i = 0; i < HASH_ALGO__LAST; i++)
+   if (!strcmp(hash_algo_name[i], sinfo->sig->hash_algo)) {
+   *hash_algo = i;
+   break;
+   }
+
+   return 0;
+}
+
 /*
  * Find the key (X.509 certificate) to use to verify a PKCS#7 message.  PKCS#7
  * uses the issuer's name and the issuing certificate serial number for
diff --git a/include/crypto/pkcs7.h b/include/crypto/pkcs7.h
index 583f199400a3..3bfe6829eaae 100644
--- a/include/crypto/pkcs7.h
+++ b/include/crypto/pkcs7.h
@@ -13,6 +13,7 @@
 #define _CRYPTO_PKCS7_H
 
 #include 
+#include 
 #include 
 
 struct key;
@@ -44,4 +45,7 @@ extern int pkcs7_verify(struct pkcs7_message *pkcs7,
 extern int pkcs7_supply_detached_data(struct pkcs7_message *pkcs7,
  const void *data, size_t datalen);
 
+extern int pkcs7_get_digest(struct pkcs7_message *pkcs7, const u8 **buf,
+   u32 *len, enum hash_algo *hash_algo);
+
 #endif /* _CRYPTO_PKCS7_H */
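
A hypothetical caller, to show the intended flow of the new helper (not
part of the series; error handling trimmed):

/* sketch: parse a raw PKCS#7 blob and report its digest */
static int example_show_pkcs7_digest(const void *sig, size_t sig_len)
{
	struct pkcs7_message *pkcs7;
	const u8 *digest;
	u32 digest_len;
	enum hash_algo algo;
	int ret;

	pkcs7 = pkcs7_parse_message(sig, sig_len);
	if (IS_ERR(pkcs7))
		return PTR_ERR(pkcs7);

	ret = pkcs7_get_digest(pkcs7, &digest, &digest_len, &algo);
	if (!ret)
		pr_info("digest: algo=%s len=%u\n",
			hash_algo_name[algo], digest_len);

	pkcs7_free_message(pkcs7);
	return ret;
}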



[PATCH v11 04/13] integrity: Introduce struct evm_xattr

2019-06-10 Thread Thiago Jung Bauermann
Even though struct evm_ima_xattr_data includes a fixed-size array to hold a
SHA1 digest, most of the code ignores the array and uses the struct to mean
"type indicator followed by data of unspecified size" and tracks the real
size of what the struct represents in a separate length variable.

The only exception to that is the EVM code, which correctly uses the
definition of struct evm_ima_xattr_data.

So make this explicit in the code by removing the length specification from
the array in struct evm_ima_xattr_data. Also, change the name of the
element from digest to data since in most places the array doesn't hold a
digest.

A separate struct evm_xattr is introduced, with the original definition of
evm_ima_xattr_data to be used in the places that actually expect that
definition, specifically the EVM HMAC code.

Signed-off-by: Thiago Jung Bauermann 
Reviewed-by: Mimi Zohar 
---
 security/integrity/evm/evm_main.c | 8 
 security/integrity/ima/ima_appraise.c | 7 ---
 security/integrity/integrity.h| 6 ++
 3 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/security/integrity/evm/evm_main.c 
b/security/integrity/evm/evm_main.c
index b6d9f14bc234..588f22f1b5bd 100644
--- a/security/integrity/evm/evm_main.c
+++ b/security/integrity/evm/evm_main.c
@@ -169,7 +169,7 @@ static enum integrity_status evm_verify_hmac(struct dentry 
*dentry,
/* check value type */
switch (xattr_data->type) {
case EVM_XATTR_HMAC:
-   if (xattr_len != sizeof(struct evm_ima_xattr_data)) {
+   if (xattr_len != sizeof(struct evm_xattr)) {
evm_status = INTEGRITY_FAIL;
goto out;
}
@@ -179,7 +179,7 @@ static enum integrity_status evm_verify_hmac(struct dentry 
*dentry,
   xattr_value_len, &digest);
if (rc)
break;
-   rc = crypto_memneq(xattr_data->digest, digest.digest,
+   rc = crypto_memneq(xattr_data->data, digest.digest,
   SHA1_DIGEST_SIZE);
if (rc)
rc = -EINVAL;
@@ -523,7 +523,7 @@ int evm_inode_init_security(struct inode *inode,
 const struct xattr *lsm_xattr,
 struct xattr *evm_xattr)
 {
-   struct evm_ima_xattr_data *xattr_data;
+   struct evm_xattr *xattr_data;
int rc;
 
if (!evm_key_loaded() || !evm_protected_xattr(lsm_xattr->name))
@@ -533,7 +533,7 @@ int evm_inode_init_security(struct inode *inode,
if (!xattr_data)
return -ENOMEM;
 
-   xattr_data->type = EVM_XATTR_HMAC;
+   xattr_data->data.type = EVM_XATTR_HMAC;
rc = evm_init_hmac(inode, lsm_xattr, xattr_data->digest);
if (rc < 0)
goto out;
diff --git a/security/integrity/ima/ima_appraise.c 
b/security/integrity/ima/ima_appraise.c
index 2f6536ab69e8..18bbe753421a 100644
--- a/security/integrity/ima/ima_appraise.c
+++ b/security/integrity/ima/ima_appraise.c
@@ -168,7 +168,8 @@ enum hash_algo ima_get_hash_algo(struct evm_ima_xattr_data 
*xattr_value,
return sig->hash_algo;
break;
case IMA_XATTR_DIGEST_NG:
-   ret = xattr_value->digest[0];
+   /* first byte contains algorithm id */
+   ret = xattr_value->data[0];
if (ret < HASH_ALGO__LAST)
return ret;
break;
@@ -176,7 +177,7 @@ enum hash_algo ima_get_hash_algo(struct evm_ima_xattr_data 
*xattr_value,
/* this is for backward compatibility */
if (xattr_len == 21) {
unsigned int zero = 0;
-   if (!memcmp(&xattr_value->digest[16], &zero, 4))
+   if (!memcmp(&xattr_value->data[16], &zero, 4))
return HASH_ALGO_MD5;
else
return HASH_ALGO_SHA1;
@@ -275,7 +276,7 @@ int ima_appraise_measurement(enum ima_hooks func,
/* xattr length may be longer. md5 hash in previous
   version occupied 20 bytes in xattr, instead of 16
 */
-   rc = memcmp(&xattr_value->digest[hash_start],
+   rc = memcmp(&xattr_value->data[hash_start],
iint->ima_hash->digest,
iint->ima_hash->length);
else
diff --git a/security/integrity/integrity.h b/security/integrity/integrity.h
index 7de59f44cba3..88a29f72a74f 100644
--- a/security/integrity/integrity.h
+++ b/security/integrity/integrity.h
@@ -79,6 +79,12 @@ enum evm_ima_xattr_type {
 
 struct evm_ima_xattr_data {
u8 type;
+   u8 data[];
+} __packed;
+
+/* Only used in the EVM HMAC code. */
+struct evm_xattr {
+   struct evm_ima_xattr_data da

[PATCH v11 06/13] ima: Use designated initializers for struct ima_event_data

2019-06-10 Thread Thiago Jung Bauermann
Designated initializers allow specifying only the members of the struct
that need initialization. Non-mentioned members are initialized to zero.

This makes the code a bit clearer (particularly in ima_add_boot_aggregate)
and also allows adding a new member to the struct without having to update
all struct initializations.

Signed-off-by: Thiago Jung Bauermann 
Reviewed-by: Mimi Zohar 
---
 security/integrity/ima/ima_api.c  | 13 +
 security/integrity/ima/ima_init.c |  4 ++--
 2 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/security/integrity/ima/ima_api.c b/security/integrity/ima/ima_api.c
index 78eb11c7ac07..c0cf4bcfc82f 100644
--- a/security/integrity/ima/ima_api.c
+++ b/security/integrity/ima/ima_api.c
@@ -139,8 +139,10 @@ void ima_add_violation(struct file *file, const unsigned 
char *filename,
 {
struct ima_template_entry *entry;
struct inode *inode = file_inode(file);
-   struct ima_event_data event_data = {iint, file, filename, NULL, 0,
-   cause};
+   struct ima_event_data event_data = { .iint = iint,
+.file = file,
+.filename = filename,
+.violation = cause };
int violation = 1;
int result;
 
@@ -294,8 +296,11 @@ void ima_store_measurement(struct integrity_iint_cache 
*iint,
int result = -ENOMEM;
struct inode *inode = file_inode(file);
struct ima_template_entry *entry;
-   struct ima_event_data event_data = {iint, file, filename, xattr_value,
-   xattr_len, NULL};
+   struct ima_event_data event_data = { .iint = iint,
+.file = file,
+.filename = filename,
+.xattr_value = xattr_value,
+.xattr_len = xattr_len };
int violation = 0;
 
if (iint->measured_pcrs & (0x1 << pcr))
diff --git a/security/integrity/ima/ima_init.c 
b/security/integrity/ima/ima_init.c
index 993d0f1915ff..368ef658a1cd 100644
--- a/security/integrity/ima/ima_init.c
+++ b/security/integrity/ima/ima_init.c
@@ -49,8 +49,8 @@ static int __init ima_add_boot_aggregate(void)
const char *audit_cause = "ENOMEM";
struct ima_template_entry *entry;
struct integrity_iint_cache tmp_iint, *iint = &tmp_iint;
-   struct ima_event_data event_data = {iint, NULL, boot_aggregate_name,
-   NULL, 0, NULL};
+   struct ima_event_data event_data = { .iint = iint,
+.filename = boot_aggregate_name };
int result = -ENOMEM;
int violation = 0;
struct {



[PATCH v11 05/13] integrity: Select CONFIG_KEYS instead of depending on it

2019-06-10 Thread Thiago Jung Bauermann
This avoids a dependency cycle in soon-to-be-introduced
CONFIG_IMA_APPRAISE_MODSIG: it will select CONFIG_MODULE_SIG_FORMAT
which in turn selects CONFIG_KEYS. Kconfig then complains that
CONFIG_INTEGRITY_SIGNATURE depends on CONFIG_KEYS.

Signed-off-by: Thiago Jung Bauermann 
Signed-off-by: Mimi Zohar 
---
 security/integrity/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/security/integrity/Kconfig b/security/integrity/Kconfig
index 3ba1168b1756..93d73902c571 100644
--- a/security/integrity/Kconfig
+++ b/security/integrity/Kconfig
@@ -17,8 +17,8 @@ if INTEGRITY
 
 config INTEGRITY_SIGNATURE
bool "Digital signature verification using multiple keyrings"
-   depends on KEYS
default n
+   select KEYS
select SIGNATURE
help
  This option enables digital signature verification support


[PATCH v11 07/13] ima: Add modsig appraise_type option for module-style appended signatures

2019-06-10 Thread Thiago Jung Bauermann
Introduce the modsig keyword to the IMA policy syntax to specify that
a given hook should expect the file to have the IMA signature appended
to it. Here is how it can be used in a rule:

appraise func=KEXEC_KERNEL_CHECK appraise_type=imasig|modsig

With this rule, IMA will accept either a signature stored in the extended
attribute or an appended signature.

For now, the rule above will behave exactly the same as if
appraise_type=imasig was specified. The actual modsig implementation
will be introduced separately.

Suggested-by: Mimi Zohar 
Signed-off-by: Thiago Jung Bauermann 
---
 Documentation/ABI/testing/ima_policy |  6 +-
 security/integrity/ima/Kconfig   | 10 +
 security/integrity/ima/Makefile  |  1 +
 security/integrity/ima/ima.h |  9 
 security/integrity/ima/ima_modsig.c  | 31 
 security/integrity/ima/ima_policy.c  | 12 +--
 security/integrity/integrity.h   |  1 +
 7 files changed, 67 insertions(+), 3 deletions(-)

diff --git a/Documentation/ABI/testing/ima_policy 
b/Documentation/ABI/testing/ima_policy
index b383c1763610..e622cdafe0af 100644
--- a/Documentation/ABI/testing/ima_policy
+++ b/Documentation/ABI/testing/ima_policy
@@ -36,7 +36,7 @@ Description:
euid:= decimal value
fowner:= decimal value
lsm:are LSM specific
-   option: appraise_type:= [imasig]
+   option: appraise_type:= [imasig] [imasig|modsig]
template:= name of a defined IMA template type
(eg, ima-ng). Only valid when action is "measure".
pcr:= decimal value
@@ -104,3 +104,7 @@ Description:
 
measure func=KEXEC_KERNEL_CHECK pcr=4
measure func=KEXEC_INITRAMFS_CHECK pcr=5
+
+   Example of appraise rule allowing modsig appended signatures:
+
+   appraise func=KEXEC_KERNEL_CHECK appraise_type=imasig|modsig
diff --git a/security/integrity/ima/Kconfig b/security/integrity/ima/Kconfig
index a18f8c6d13b5..bba19f9ea184 100644
--- a/security/integrity/ima/Kconfig
+++ b/security/integrity/ima/Kconfig
@@ -231,6 +231,16 @@ config IMA_APPRAISE_BOOTPARAM
  This option enables the different "ima_appraise=" modes
  (eg. fix, log) from the boot command line.
 
+config IMA_APPRAISE_MODSIG
+   bool "Support module-style signatures for appraisal"
+   depends on IMA_APPRAISE
+   default n
+   help
+  Adds support for signatures appended to files. The format of the
+  appended signature is the same used for signed kernel modules.
+  The modsig keyword can be used in the IMA policy to allow a hook
+  to accept such signatures.
+
 config IMA_TRUSTED_KEYRING
bool "Require all keys on the .ima keyring be signed (deprecated)"
depends on IMA_APPRAISE && SYSTEM_TRUSTED_KEYRING
diff --git a/security/integrity/ima/Makefile b/security/integrity/ima/Makefile
index d921dc4f9eb0..31d57cdf2421 100644
--- a/security/integrity/ima/Makefile
+++ b/security/integrity/ima/Makefile
@@ -9,5 +9,6 @@ obj-$(CONFIG_IMA) += ima.o
 ima-y := ima_fs.o ima_queue.o ima_init.o ima_main.o ima_crypto.o ima_api.o \
 ima_policy.o ima_template.o ima_template_lib.o
 ima-$(CONFIG_IMA_APPRAISE) += ima_appraise.o
+ima-$(CONFIG_IMA_APPRAISE_MODSIG) += ima_modsig.o
 ima-$(CONFIG_HAVE_IMA_KEXEC) += ima_kexec.o
 obj-$(CONFIG_IMA_BLACKLIST_KEYRING) += ima_mok.o
diff --git a/security/integrity/ima/ima.h b/security/integrity/ima/ima.h
index 18b48a6d0b80..9e2580164e97 100644
--- a/security/integrity/ima/ima.h
+++ b/security/integrity/ima/ima.h
@@ -298,6 +298,15 @@ static inline int ima_read_xattr(struct dentry *dentry,
 
 #endif /* CONFIG_IMA_APPRAISE */
 
+#ifdef CONFIG_IMA_APPRAISE_MODSIG
+bool ima_hook_supports_modsig(enum ima_hooks func);
+#else
+static inline bool ima_hook_supports_modsig(enum ima_hooks func)
+{
+   return false;
+}
+#endif /* CONFIG_IMA_APPRAISE_MODSIG */
+
 /* LSM based policy rules require audit */
 #ifdef CONFIG_IMA_LSM_RULES
 
diff --git a/security/integrity/ima/ima_modsig.c b/security/integrity/ima/ima_modsig.c
new file mode 100644
index ..87503bfe8c8b
--- /dev/null
+++ b/security/integrity/ima/ima_modsig.c
@@ -0,0 +1,31 @@
+// SPDX-License-Identifier: GPL-2.0+
+/*
+ * IMA support for appraising module-style appended signatures.
+ *
+ * Copyright (C) 2019  IBM Corporation
+ *
+ * Author:
+ * Thiago Jung Bauermann 
+ */
+
+#include "ima.h"
+
+/**
+ * ima_hook_supports_modsig - can the policy allow modsig for this hook?
+ *
+ * modsig is only supported by hooks using ima_post_read_file(), because only
+ * they preload the contents of the file in a buffer. FILE_CHECK does that in
+ * some cases, but not when reached from vfs_open(). POLICY_CHECK can support
+ * it, but it's not useful in practice because the policy is a text file, so deny it.
+ */
+bool ima_hook_supports_mod
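
The message is cut off above; going by the comment, a plausible sketch of
the predicate (assuming only the kexec hooks qualify, which may differ from
the actual patch body) is:

    bool ima_hook_supports_modsig(enum ima_hooks func)
    {
            switch (func) {
            case KEXEC_KERNEL_CHECK:
            case KEXEC_INITRAMFS_CHECK:
                    return true;
            default:
                    return false;
            }
    }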

[PATCH v11 08/13] ima: Factor xattr_verify() out of ima_appraise_measurement()

2019-06-10 Thread Thiago Jung Bauermann
Verify the xattr signature in a separate function so that the logic in
ima_appraise_measurement() remains clear when it gains the ability to also
verify an appended module signature.

The code in the switch statement is unchanged except for having to
dereference the status and cause variables (since they're now pointers),
and fixing the style of a block comment to appease checkpatch.
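
In other words, the caller keeps its local variables and passes their
addresses; a sketch of the resulting call site (not the literal patch
code):

    enum integrity_status status = INTEGRITY_UNKNOWN;
    const char *cause = "unknown";
    int rc = xattr_verify(func, iint, xattr_value, xattr_len,
                          &status, &cause);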

Suggested-by: Mimi Zohar 
Signed-off-by: Thiago Jung Bauermann 
Reviewed-by: Mimi Zohar 
---
 security/integrity/ima/ima_appraise.c | 141 +++---
 1 file changed, 81 insertions(+), 60 deletions(-)

diff --git a/security/integrity/ima/ima_appraise.c b/security/integrity/ima/ima_appraise.c
index 18bbe753421a..5d4772f39757 100644
--- a/security/integrity/ima/ima_appraise.c
+++ b/security/integrity/ima/ima_appraise.c
@@ -202,6 +202,83 @@ int ima_read_xattr(struct dentry *dentry,
return ret;
 }
 
+/*
+ * xattr_verify - verify xattr digest or signature
+ *
+ * Verify whether the hash or signature matches the file contents.
+ *
+ * Return 0 on success, error code otherwise.
+ */
+static int xattr_verify(enum ima_hooks func, struct integrity_iint_cache *iint,
+   struct evm_ima_xattr_data *xattr_value, int xattr_len,
+   enum integrity_status *status, const char **cause)
+{
+   int rc = -EINVAL, hash_start = 0;
+
+   switch (xattr_value->type) {
+   case IMA_XATTR_DIGEST_NG:
+   /* first byte contains algorithm id */
+   hash_start = 1;
+   /* fall through */
+   case IMA_XATTR_DIGEST:
+   if (iint->flags & IMA_DIGSIG_REQUIRED) {
+   *cause = "IMA-signature-required";
+   *status = INTEGRITY_FAIL;
+   break;
+   }
+   clear_bit(IMA_DIGSIG, &iint->atomic_flags);
+   if (xattr_len - sizeof(xattr_value->type) - hash_start >=
+   iint->ima_hash->length)
+   /*
+* xattr length may be longer. md5 hash in previous
+* version occupied 20 bytes in xattr, instead of 16
+*/
+   rc = memcmp(&xattr_value->data[hash_start],
+   iint->ima_hash->digest,
+   iint->ima_hash->length);
+   else
+   rc = -EINVAL;
+   if (rc) {
+   *cause = "invalid-hash";
+   *status = INTEGRITY_FAIL;
+   break;
+   }
+   *status = INTEGRITY_PASS;
+   break;
+   case EVM_IMA_XATTR_DIGSIG:
+   set_bit(IMA_DIGSIG, &iint->atomic_flags);
+   rc = integrity_digsig_verify(INTEGRITY_KEYRING_IMA,
+(const char *)xattr_value,
+xattr_len,
+iint->ima_hash->digest,
+iint->ima_hash->length);
+   if (rc == -EOPNOTSUPP) {
+   *status = INTEGRITY_UNKNOWN;
+   break;
+   }
+   if (IS_ENABLED(CONFIG_INTEGRITY_PLATFORM_KEYRING) && rc &&
+   func == KEXEC_KERNEL_CHECK)
+   rc = integrity_digsig_verify(INTEGRITY_KEYRING_PLATFORM,
+(const char *)xattr_value,
+xattr_len,
+iint->ima_hash->digest,
+iint->ima_hash->length);
+   if (rc) {
+   *cause = "invalid-signature";
+   *status = INTEGRITY_FAIL;
+   } else {
+   *status = INTEGRITY_PASS;
+   }
+   break;
+   default:
+   *status = INTEGRITY_UNKNOWN;
+   *cause = "unknown-ima-data";
+   break;
+   }
+
+   return rc;
+}
+
 /*
  * ima_appraise_measurement - appraise file measurement
  *
@@ -221,7 +298,7 @@ int ima_appraise_measurement(enum ima_hooks func,
struct dentry *dentry = file_dentry(file);
struct inode *inode = d_backing_inode(dentry);
enum integrity_status status = INTEGRITY_UNKNOWN;
-   int rc = xattr_len, hash_start = 0;
+   int rc = xattr_len;
 
if (!(inode->i_opflags & IOP_XATTR))
return INTEGRITY_UNKNOWN;
@@ -259,65 +336,9 @@ int ima_appraise_measurement(enum ima_hooks func,
WARN_ONCE(true, "Unexpected integrity status %d\n", status);
}
 
-   switch (xattr_value->type) {
-   case IMA_XATTR_DIGEST_NG:
-   /* first byte contains algorithm id */
-   hash_start = 1;
-   /* fall through */
-

[PATCH v11 09/13] ima: Implement support for module-style appended signatures

2019-06-10 Thread Thiago Jung Bauermann
Implement the appraise_type=imasig|modsig option, allowing IMA to read and
verify modsig signatures.

If a file has both an xattr signature and an appended modsig, IMA will
only use the appended signature if the key used by the xattr signature
isn't present in the IMA or platform keyring.

Because modsig verification needs to convert from an integrity keyring id
to the keyring itself, add an integrity_keyring_from_id() function in
digsig.c so that integrity_modsig_verify() can use it.
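
Conceptually the appraisal flow becomes the following pseudo-C, where
modsig_verify() stands in for whatever helper the patch uses (a sketch,
not the literal code):

    rc = xattr_verify(func, iint, xattr_value, xattr_len, &status, &cause);
    if (rc && try_modsig)
            /* xattr signature key unknown: fall back to the modsig */
            rc = modsig_verify(func, modsig, &status, &cause);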

Signed-off-by: Thiago Jung Bauermann 
Signed-off-by: Mimi Zohar 
---
 security/integrity/digsig.c   | 43 
 security/integrity/ima/Kconfig|  3 ++
 security/integrity/ima/ima.h  | 22 -
 security/integrity/ima/ima_appraise.c | 51 +--
 security/integrity/ima/ima_main.c | 11 -
 security/integrity/ima/ima_modsig.c   | 71 +++
 security/integrity/ima/ima_policy.c   | 12 ++---
 security/integrity/integrity.h| 19 +++
 8 files changed, 209 insertions(+), 23 deletions(-)

diff --git a/security/integrity/digsig.c b/security/integrity/digsig.c
index e19c2eb72c51..3399a7e32830 100644
--- a/security/integrity/digsig.c
+++ b/security/integrity/digsig.c
@@ -43,11 +43,10 @@ static const char * const keyring_name[INTEGRITY_KEYRING_MAX] = {
 #define restrict_link_to_ima restrict_link_by_builtin_trusted
 #endif
 
-int integrity_digsig_verify(const unsigned int id, const char *sig, int siglen,
-   const char *digest, int digestlen)
+static struct key *integrity_keyring_from_id(const unsigned int id)
 {
-   if (id >= INTEGRITY_KEYRING_MAX || siglen < 2)
-   return -EINVAL;
+   if (id >= INTEGRITY_KEYRING_MAX)
+   return ERR_PTR(-EINVAL);
 
if (!keyring[id]) {
keyring[id] =
@@ -56,23 +55,49 @@ int integrity_digsig_verify(const unsigned int id, const char *sig, int siglen,
int err = PTR_ERR(keyring[id]);
pr_err("no %s keyring: %d\n", keyring_name[id], err);
keyring[id] = NULL;
-   return err;
+   return ERR_PTR(err);
}
}
 
+   return keyring[id];
+}
+
+int integrity_digsig_verify(const unsigned int id, const char *sig, int siglen,
+   const char *digest, int digestlen)
+{
+   struct key *keyring;
+
+   if (siglen < 2)
+   return -EINVAL;
+
+   keyring = integrity_keyring_from_id(id);
+   if (IS_ERR(keyring))
+   return PTR_ERR(keyring);
+
switch (sig[1]) {
case 1:
/* v1 API expect signature without xattr type */
-   return digsig_verify(keyring[id], sig + 1, siglen - 1,
-digest, digestlen);
+   return digsig_verify(keyring, sig + 1, siglen - 1, digest,
+digestlen);
case 2:
-   return asymmetric_verify(keyring[id], sig, siglen,
-digest, digestlen);
+   return asymmetric_verify(keyring, sig, siglen, digest,
+digestlen);
}
 
return -EOPNOTSUPP;
 }
 
+int integrity_modsig_verify(const unsigned int id, const struct modsig *modsig)
+{
+   struct key *keyring;
+
+   keyring = integrity_keyring_from_id(id);
+   if (IS_ERR(keyring))
+   return PTR_ERR(keyring);
+
+   return ima_modsig_verify(keyring, modsig);
+}
+
 static int __integrity_init_keyring(const unsigned int id, key_perm_t perm,
struct key_restriction *restriction)
 {
diff --git a/security/integrity/ima/Kconfig b/security/integrity/ima/Kconfig
index bba19f9ea184..0fb542455698 100644
--- a/security/integrity/ima/Kconfig
+++ b/security/integrity/ima/Kconfig
@@ -234,6 +234,9 @@ config IMA_APPRAISE_BOOTPARAM
 config IMA_APPRAISE_MODSIG
bool "Support module-style signatures for appraisal"
depends on IMA_APPRAISE
+   depends on INTEGRITY_ASYMMETRIC_KEYS
+   select PKCS7_MESSAGE_PARSER
+   select MODULE_SIG_FORMAT
default n
help
   Adds support for signatures appended to files. The format of the
diff --git a/security/integrity/ima/ima.h b/security/integrity/ima/ima.h
index 9e2580164e97..ebbfae10f174 100644
--- a/security/integrity/ima/ima.h
+++ b/security/integrity/ima/ima.h
@@ -192,6 +192,10 @@ enum ima_hooks {
__ima_hooks(__ima_hook_enumify)
 };
 
+extern const char *const func_tokens[];
+
+struct modsig;
+
 /* LIM API function definitions */
 int ima_get_action(struct inode *inode, const struct cred *cred, u32 secid,
   int mask, enum ima_hooks func, int *pcr,
@@ -245,7 +249,7 @@ int ima_appraise_measurement(enum ima_hooks func,
 struct integrity_iint_cache *iint,
 st

[PATCH v11 10/13] ima: Collect modsig

2019-06-10 Thread Thiago Jung Bauermann
Obtain the modsig and calculate its corresponding hash in
ima_collect_measurement().
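
The central point is that the digest must cover the file contents minus
the appended signature. Assuming the PKCS#7 helpers introduced earlier in
the series, the collection step plausibly looks like this sketch:

    /* strip the signature, its descriptor and the trailing marker */
    size -= modsig->raw_pkcs7_len + strlen(MODULE_SIG_STRING) +
            sizeof(struct module_signature);
    pkcs7_supply_detached_data(modsig->pkcs7_msg, buf, size);
    pkcs7_get_digest(modsig->pkcs7_msg, &modsig->digest,
                     &modsig->digest_size, &modsig->hash_algo);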

Signed-off-by: Thiago Jung Bauermann 
---
 security/integrity/ima/ima.h  |  8 -
 security/integrity/ima/ima_api.c  |  5 ++-
 security/integrity/ima/ima_appraise.c |  2 +-
 security/integrity/ima/ima_main.c |  2 +-
 security/integrity/ima/ima_modsig.c   | 50 ++-
 5 files changed, 62 insertions(+), 5 deletions(-)

diff --git a/security/integrity/ima/ima.h b/security/integrity/ima/ima.h
index ebbfae10f174..0acc8e56ec73 100644
--- a/security/integrity/ima/ima.h
+++ b/security/integrity/ima/ima.h
@@ -203,7 +203,7 @@ int ima_get_action(struct inode *inode, const struct cred *cred, u32 secid,
 int ima_must_measure(struct inode *inode, int mask, enum ima_hooks func);
 int ima_collect_measurement(struct integrity_iint_cache *iint,
struct file *file, void *buf, loff_t size,
-   enum hash_algo algo);
+   enum hash_algo algo, struct modsig *modsig);
 void ima_store_measurement(struct integrity_iint_cache *iint, struct file *file,
   const unsigned char *filename,
   struct evm_ima_xattr_data *xattr_value,
@@ -307,6 +307,7 @@ static inline int ima_read_xattr(struct dentry *dentry,
 bool ima_hook_supports_modsig(enum ima_hooks func);
 int ima_read_modsig(enum ima_hooks func, const void *buf, loff_t buf_len,
struct modsig **modsig);
+void ima_collect_modsig(struct modsig *modsig, const void *buf, loff_t size);
 void ima_free_modsig(struct modsig *modsig);
 #else
 static inline bool ima_hook_supports_modsig(enum ima_hooks func)
@@ -320,6 +321,11 @@ static inline int ima_read_modsig(enum ima_hooks func, const void *buf,
return -EOPNOTSUPP;
 }
 
+static inline void ima_collect_modsig(struct modsig *modsig, const void *buf,
+ loff_t size)
+{
+}
+
 static inline void ima_free_modsig(struct modsig *modsig)
 {
 }
diff --git a/security/integrity/ima/ima_api.c b/security/integrity/ima/ima_api.c
index c0cf4bcfc82f..c351b8c37278 100644
--- a/security/integrity/ima/ima_api.c
+++ b/security/integrity/ima/ima_api.c
@@ -208,7 +208,7 @@ int ima_get_action(struct inode *inode, const struct cred *cred, u32 secid,
  */
 int ima_collect_measurement(struct integrity_iint_cache *iint,
struct file *file, void *buf, loff_t size,
-   enum hash_algo algo)
+   enum hash_algo algo, struct modsig *modsig)
 {
const char *audit_cause = "failed";
struct inode *inode = file_inode(file);
@@ -255,6 +255,9 @@ int ima_collect_measurement(struct integrity_iint_cache *iint,
memcpy(iint->ima_hash, &hash, length);
iint->version = i_version;
 
+   if (modsig)
+   ima_collect_modsig(modsig, buf, size);
+
/* Possibly temporary failure due to type of read (eg. O_DIRECT) */
if (!result)
iint->flags |= IMA_COLLECTED;
diff --git a/security/integrity/ima/ima_appraise.c b/security/integrity/ima/ima_appraise.c
index 70252ac3321d..aa14e3fe25d5 100644
--- a/security/integrity/ima/ima_appraise.c
+++ b/security/integrity/ima/ima_appraise.c
@@ -438,7 +438,7 @@ void ima_update_xattr(struct integrity_iint_cache *iint, struct file *file)
!(iint->flags & IMA_HASH))
return;
 
-   rc = ima_collect_measurement(iint, file, NULL, 0, ima_hash_algo);
+   rc = ima_collect_measurement(iint, file, NULL, 0, ima_hash_algo, NULL);
if (rc < 0)
return;
 
diff --git a/security/integrity/ima/ima_main.c b/security/integrity/ima/ima_main.c
index 8ddf9faa8d02..2c9d3cf85726 100644
--- a/security/integrity/ima/ima_main.c
+++ b/security/integrity/ima/ima_main.c
@@ -314,7 +314,7 @@ static int process_measurement(struct file *file, const struct cred *cred,
 
hash_algo = ima_get_hash_algo(xattr_value, xattr_len);
 
-   rc = ima_collect_measurement(iint, file, buf, size, hash_algo);
+   rc = ima_collect_measurement(iint, file, buf, size, hash_algo, modsig);
if (rc != 0 && rc != -EBADF && rc != -EINVAL)
goto out_locked;
 
diff --git a/security/integrity/ima/ima_modsig.c b/security/integrity/ima/ima_modsig.c
index f41ebe370fa0..d438b87dba89 100644
--- a/security/integrity/ima/ima_modsig.c
+++ b/security/integrity/ima/ima_modsig.c
@@ -17,6 +17,19 @@
 
 struct modsig {
struct pkcs7_message *pkcs7_msg;
+
+   enum hash_algo hash_algo;
+
+   /* This digest will go in the 'd-modsig' field of the IMA template. */
+   const u8 *digest;
+   u32 digest_size;
+
+   /*
+* This is what will go to the measurement list if the template requires
+* storing the signature.
+*/
+   int raw_pkcs7_len;
+   u8 raw_pkcs7[];
 };
 
 /**
@@ -71,7 +84,8 @@ int ima_read_modsig(enum ima_hooks func, const v

[PATCH v11 12/13] ima: Store the measurement again when appraising a modsig

2019-06-10 Thread Thiago Jung Bauermann
If the IMA template contains the "modsig" or "d-modsig" field, then the
modsig should be added to the measurement list when the file is appraised.

And that is what normally happens, but if a measurement rule causes a file
containing a modsig to be measured before a different rule causes it to be
appraised, the resulting measurement entry will not contain the modsig
because it is only fetched during appraisal. When the appraisal rule
triggers, it won't store a new measurement containing the modsig because
the file has already been measured.

We need to detect that situation and store an additional measurement with
the modsig. This is done by adding an IMA_MEASURE action flag if we read a
modsig and the IMA template contains a modsig field.
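
For example, with a hypothetical policy like this, the measure rule fires
first and records the file before the modsig has been read, so the extra
IMA_MEASURE pass is what gets the modsig fields into the list:

    measure func=KEXEC_KERNEL_CHECK template=ima-modsig
    appraise func=KEXEC_KERNEL_CHECK appraise_type=imasig|modsig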

Suggested-by: Mimi Zohar 
Signed-off-by: Thiago Jung Bauermann 
---
 security/integrity/ima/ima.h  |  1 +
 security/integrity/ima/ima_api.c  | 19 +++
 security/integrity/ima/ima_main.c | 15 ---
 security/integrity/ima/ima_template.c | 19 +++
 4 files changed, 47 insertions(+), 7 deletions(-)

diff --git a/security/integrity/ima/ima.h b/security/integrity/ima/ima.h
index a2b2c13ceda8..44f5f60424c2 100644
--- a/security/integrity/ima/ima.h
+++ b/security/integrity/ima/ima.h
@@ -149,6 +149,7 @@ void ima_putc(struct seq_file *m, void *data, int datalen);
 void ima_print_digest(struct seq_file *m, u8 *digest, u32 size);
 struct ima_template_desc *ima_template_desc_current(void);
 struct ima_template_desc *lookup_template_desc(const char *name);
+bool ima_template_has_modsig(const struct ima_template_desc *ima_template);
 int ima_restore_measurement_entry(struct ima_template_entry *entry);
 int ima_restore_measurement_list(loff_t bufsize, void *buf);
 int ima_measurements_show(struct seq_file *m, void *v);
diff --git a/security/integrity/ima/ima_api.c b/security/integrity/ima/ima_api.c
index 32297d1e6164..bb887ed3d8a7 100644
--- a/security/integrity/ima/ima_api.c
+++ b/security/integrity/ima/ima_api.c
@@ -222,6 +222,14 @@ int ima_collect_measurement(struct integrity_iint_cache *iint,
char digest[IMA_MAX_DIGEST_SIZE];
} hash;
 
+   /*
+* Always collect the modsig, because IMA might have already collected
+* the file digest without collecting the modsig in a previous
+* measurement rule.
+*/
+   if (modsig)
+   ima_collect_modsig(modsig, buf, size);
+
if (iint->flags & IMA_COLLECTED)
goto out;
 
@@ -255,9 +263,6 @@ int ima_collect_measurement(struct integrity_iint_cache *iint,
memcpy(iint->ima_hash, &hash, length);
iint->version = i_version;
 
-   if (modsig)
-   ima_collect_modsig(modsig, buf, size);
-
/* Possibly temporary failure due to type of read (eg. O_DIRECT) */
if (!result)
iint->flags |= IMA_COLLECTED;
@@ -307,7 +312,13 @@ void ima_store_measurement(struct integrity_iint_cache *iint,
 .modsig = modsig };
int violation = 0;
 
-   if (iint->measured_pcrs & (0x1 << pcr))
+   /*
+* We still need to store the measurement in the case of MODSIG because
+* we only have its contents to put in the list at the time of
+* appraisal, but a file measurement from earlier might already exist in
+* the measurement list.
+*/
+   if (iint->measured_pcrs & (0x1 << pcr) && !modsig)
return;
 
result = ima_alloc_init_template(&event_data, &entry, template_desc);
diff --git a/security/integrity/ima/ima_main.c b/security/integrity/ima/ima_main.c
index 85afb31fafe0..e0ca39f81a59 100644
--- a/security/integrity/ima/ima_main.c
+++ b/security/integrity/ima/ima_main.c
@@ -307,9 +307,18 @@ static int process_measurement(struct file *file, const struct cred *cred,
/* read 'security.ima' */
xattr_len = ima_read_xattr(file_dentry(file), &xattr_value);
 
-   /* Read the appended modsig if allowed by the policy. */
-   if (iint->flags & IMA_MODSIG_ALLOWED)
-   ima_read_modsig(func, buf, size, &modsig);
+   /*
+* Read the appended modsig if allowed by the policy, and allow
+* an additional measurement list entry, if needed, based on the
+* template format and whether the file was already measured.
+*/
+   if (iint->flags & IMA_MODSIG_ALLOWED) {
+   rc = ima_read_modsig(func, buf, size, &modsig);
+
+   if (!rc && ima_template_has_modsig(template_desc) &&
+   iint->flags & IMA_MEASURED)
+   action |= IMA_MEASURE;
+   }
}
 
hash_algo = ima_get_hash_algo(xattr_value, xattr_len);
diff --git a/security/integrity/ima/ima_template.c b/security/integrity/ima/ima_template.c
index e25bef419

[PATCH v11 11/13] ima: Define ima-modsig template

2019-06-10 Thread Thiago Jung Bauermann
Define a new "d-modsig" template field which holds the digest that is
expected to match the one contained in the modsig, and also a new "modsig"
template field which holds the appended file signature.

Add a new "ima-modsig" defined template descriptor with the new fields as
well as the ones from the "ima-sig" descriptor.

Change ima_store_measurement() to accept a struct modsig * argument so that
it can be passed along to the templates via struct ima_event_data.
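
Once defined, the descriptor should be selectable like any other template,
e.g. via the existing ima_template= kernel parameter (assuming it accepts
the new name):

    ima_template=ima-modsig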

Suggested-by: Mimi Zohar 
Signed-off-by: Thiago Jung Bauermann 
Reviewed-by: Mimi Zohar 
---
 Documentation/security/IMA-templates.rst  |  7 ++-
 security/integrity/ima/ima.h  | 20 +++-
 security/integrity/ima/ima_api.c  |  5 +-
 security/integrity/ima/ima_main.c |  2 +-
 security/integrity/ima/ima_modsig.c   | 19 +++
 security/integrity/ima/ima_policy.c   | 41 
 security/integrity/ima/ima_template.c |  7 ++-
 security/integrity/ima/ima_template_lib.c | 60 ++-
 security/integrity/ima/ima_template_lib.h |  4 ++
 9 files changed, 157 insertions(+), 8 deletions(-)

diff --git a/Documentation/security/IMA-templates.rst b/Documentation/security/IMA-templates.rst
index 2cd0e273cc9a..8da20b444be0 100644
--- a/Documentation/security/IMA-templates.rst
+++ b/Documentation/security/IMA-templates.rst
@@ -68,15 +68,18 @@ descriptors by adding their identifier to the format string
  - 'd-ng': the digest of the event, calculated with an arbitrary hash
algorithm (field format: [:]digest, where the digest
prefix is shown only if the hash algorithm is not SHA1 or MD5);
+ - 'd-modsig': the digest of the event without the appended modsig;
  - 'n-ng': the name of the event, without size limitations;
- - 'sig': the file signature.
+ - 'sig': the file signature;
+ - 'modsig': the appended file signature.
 
 
 Below, there is the list of defined template descriptors:
 
  - "ima": its format is ``d|n``;
  - "ima-ng" (default): its format is ``d-ng|n-ng``;
- - "ima-sig": its format is ``d-ng|n-ng|sig``.
+ - "ima-sig": its format is ``d-ng|n-ng|sig``;
+ - "ima-modsig": its format is ``d-ng|n-ng|sig|d-modsig|modsig``.
 
 
 
diff --git a/security/integrity/ima/ima.h b/security/integrity/ima/ima.h
index 0acc8e56ec73..a2b2c13ceda8 100644
--- a/security/integrity/ima/ima.h
+++ b/security/integrity/ima/ima.h
@@ -64,6 +64,7 @@ struct ima_event_data {
const unsigned char *filename;
struct evm_ima_xattr_data *xattr_value;
int xattr_len;
+   const struct modsig *modsig;
const char *violation;
 };
 
@@ -207,7 +208,7 @@ int ima_collect_measurement(struct integrity_iint_cache *iint,
 void ima_store_measurement(struct integrity_iint_cache *iint, struct file 
*file,
   const unsigned char *filename,
   struct evm_ima_xattr_data *xattr_value,
-  int xattr_len, int pcr,
+  int xattr_len, const struct modsig *modsig, int pcr,
   struct ima_template_desc *template_desc);
 void ima_audit_measurement(struct integrity_iint_cache *iint,
   const unsigned char *filename);
@@ -308,6 +309,10 @@ bool ima_hook_supports_modsig(enum ima_hooks func);
 int ima_read_modsig(enum ima_hooks func, const void *buf, loff_t buf_len,
struct modsig **modsig);
 void ima_collect_modsig(struct modsig *modsig, const void *buf, loff_t size);
+int ima_get_modsig_digest(const struct modsig *modsig, enum hash_algo *algo,
+ const u8 **digest, u32 *digest_size);
+int ima_get_raw_modsig(const struct modsig *modsig, const void **data,
+  u32 *data_len);
 void ima_free_modsig(struct modsig *modsig);
 #else
 static inline bool ima_hook_supports_modsig(enum ima_hooks func)
@@ -326,6 +331,19 @@ static inline void ima_collect_modsig(struct modsig *modsig, const void *buf,
 {
 }
 
+static inline int ima_get_modsig_digest(const struct modsig *modsig,
+   enum hash_algo *algo, const u8 **digest,
+   u32 *digest_size)
+{
+   return -EOPNOTSUPP;
+}
+
+static inline int ima_get_raw_modsig(const struct modsig *modsig,
+const void **data, u32 *data_len)
+{
+   return -EOPNOTSUPP;
+}
+
 static inline void ima_free_modsig(struct modsig *modsig)
 {
 }
diff --git a/security/integrity/ima/ima_api.c b/security/integrity/ima/ima_api.c
index c351b8c37278..32297d1e6164 100644
--- a/security/integrity/ima/ima_api.c
+++ b/security/integrity/ima/ima_api.c
@@ -291,7 +291,7 @@ int ima_collect_measurement(struct integrity_iint_cache *iint,
 void ima_store_measurement(struct integrity_iint_cache *iint,
   struct file *file, const unsigned char *filename,
   struct evm_ima_xattr_data *xattr_value,
-  int xattr_len, int pcr,
+ 

[PATCH v11 13/13] ima: Allow template= option for appraise rules as well

2019-06-10 Thread Thiago Jung Bauermann
It's useful to be able to specify a different IMA template on appraise
policy rules, so allow it.
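
For example, a hypothetical rule combining the options from this series:

    appraise func=KEXEC_KERNEL_CHECK appraise_type=imasig|modsig template=ima-modsig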

Signed-off-by: Thiago Jung Bauermann 
Suggested-by: Mimi Zohar 
---
 security/integrity/ima/ima_policy.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/security/integrity/ima/ima_policy.c b/security/integrity/ima/ima_policy.c
index 6463ab8921ea..1ac1ef458f2e 100644
--- a/security/integrity/ima/ima_policy.c
+++ b/security/integrity/ima/ima_policy.c
@@ -1110,7 +1110,8 @@ static int ima_parse_rule(char *rule, struct ima_rule_entry *entry)
break;
case Opt_template:
ima_log_string(ab, "template", args[0].from);
-   if (entry->action != MEASURE) {
+   if (entry->action != MEASURE &&
+   entry->action != APPRAISE) {
result = -EINVAL;
break;
}



Re: [PATCH v2] powerpc: Add force enable of DAWR on P9 option

2019-06-10 Thread Michael Neuling


> >  2:
> > -BEGIN_FTR_SECTION
> > -   /* POWER9 with disabled DAWR */
> > +   LOAD_REG_ADDR(r11, dawr_force_enable)
> > +   lbz r11, 0(r11)
> > +   cmpdi   r11, 0
> > li  r3, H_HARDWARE
> > -   blr
> > -END_FTR_SECTION_IFCLR(CPU_FTR_DAWR)
> > +   beqlr
> 
> Why is this a 'beqlr'? Shouldn't it be a blr?

I believe it's right and should be a beqlr.  It's to replace the FTR section to
make it dynamic based on the dawr_force_enable bit.
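
In C terms the new sequence reads roughly as (a sketch):

    if (!dawr_force_enable)
            return H_HARDWARE;  /* beqlr takes this early return */
    /* otherwise fall through and set up the DAWR */

A plain blr would return H_HARDWARE unconditionally, even when the DAWR
has been force-enabled.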

Mikey