Re: [PATCH v7 00/12] implement KASLR for powerpc/fsl_booke/32

2019-10-08 Thread Jason Yan

Hi Scott,

Would you please take some time to test this?

Thank you so much.

On 2019/9/24 13:52, Jason Yan wrote:

Hi Scott,

Can you test v7 to see if it works to load a kernel at a non-zero address?

Thanks,

On 2019/9/20 17:45, Jason Yan wrote:

This series implements KASLR for powerpc/fsl_booke/32, as a security
feature that deters exploit attempts relying on knowledge of the location
of kernel internals.

Since CONFIG_RELOCATABLE is already supported, what we need to do is
map or copy the kernel to a proper place and relocate. Freescale Book-E
parts expect lowmem to be mapped by fixed TLB entries (TLB1). The TLB1
entries are not suitable for mapping the kernel directly in a randomized
region, so we choose to copy the kernel to a proper place and restart to
relocate.

Entropy is derived from the banner and timer base, which will change every
build and boot. This is not very secure, so additionally the bootloader may
pass entropy via the /chosen/kaslr-seed node in the device tree.

We will use the first 512M of the low memory to randomize the kernel
image. The memory will be split into 64M zones. We will use the lower 8
bits of the entropy to decide the index of the 64M zone. Then we choose a
16K-aligned offset inside the 64M zone to put the kernel in.

    KERNELBASE

        |-->   64M   <--|
        |               |
        +---------------+    +----------------+---------------+
        |               |....|    |kernel|    |               |
        +---------------+    +----------------+---------------+
        |                         |
        |----->   offset    <-----|

                              kernstart_virt_addr

We also check if we will overlap with some areas like the dtb area, the
initrd area or the crashkernel area. If we cannot find a proper area,
kaslr will be disabled and the kernel will boot from its original position.
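The zone/offset selection described above can be sketched in user-space C. This is purely illustrative: the function and macro names are made up, and the exact bit layout the kernel uses may differ from this sketch.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_16K      0x4000UL
#define SZ_64M      0x4000000UL
#define KASLR_ZONES 8           /* 512M of lowmem / 64M per zone */

/* Hypothetical sketch: low 8 bits of entropy pick one of the eight
 * 64M zones; higher bits pick a 16K-aligned offset inside the zone. */
static uint64_t kaslr_choose_offset(uint64_t entropy)
{
	uint64_t zone  = (entropy & 0xff) % KASLR_ZONES; /* zone index */
	uint64_t inner = (entropy >> 8) % SZ_64M;        /* offset in zone */

	inner &= ~(SZ_16K - 1);                          /* 16K alignment */
	return zone * SZ_64M + inner;
}
```

The result is always below 512M and 16K-aligned, matching the constraints stated in the cover letter.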

Changes since v6:
  - Rename create_tlb_entry() to create_kaslr_tlb_entry()
  - Remove MAS2_VAL since it has no more users.
  - Move kaslr_booke.c to arch/powerpc/mm/nohash.
  - Call flush_icache_range() after copying the kernel.
  - Warn if no kaslr-seed is provided by the bootloader.
  - Use the right physical address when checking if the new position
    will overlap with other regions.
  - Do not clear bss for the second pass because some global variables
    will not be initialized again.
  - Use tabs instead of spaces between the mnemonic and the
    arguments (in fsl_booke_entry_mapping.S).


Changes since v5:
  - Rename M_IF_NEEDED to MAS2_M_IF_NEEDED
  - Define some global variable as __ro_after_init
  - Replace kimage_vaddr with kernstart_virt_addr
  - Depend on RELOCATABLE, not select it
  - Modify the comment block below the SPDX tag
  - Remove some useless headers in kaslr_booke.c and move the is_second_reloc
    declaration to mmu_decl.h
  - Remove DBG() and use pr_debug() and rewrite comment above 
get_boot_seed().

  - Add a patch to document the KASLR implementation.
  - Split a patch from patch #10 which exports kaslr offset in 
VMCOREINFO ELF notes.

  - Remove extra logic around finding nokaslr string in cmdline.
  - Make regions static global and __initdata

Changes since v4:
  - Add Reviewed-by tag from Christophe
  - Remove an unnecessary cast
  - Remove unnecessary parenthesis
  - Fix checkpatch warning

Changes since v3:
  - Add Reviewed-by and Tested-by tag from Diana
  - Change the comment in fsl_booke_entry_mapping.S to be consistent
    with the new code.

Changes since v2:
  - Remove unnecessary #ifdef
  - Use SZ_64M instead of 0x400
  - Call early_init_dt_scan_chosen() to init boot_command_line
  - Rename kaslr_second_init() to kaslr_late_init()

Changes since v1:
  - Remove some useless 'extern' keyword.
  - Replace EXPORT_SYMBOL with EXPORT_SYMBOL_GPL
  - Improve some assembly code
  - Use memzero_explicit instead of memset
  - Use boot_command_line and remove early_command_line
  - Do not print kaslr offset if kaslr is disabled

Jason Yan (12):
   powerpc: unify definition of M_IF_NEEDED
   powerpc: move memstart_addr and kernstart_addr to init-common.c
   powerpc: introduce kernstart_virt_addr to store the kernel base
   powerpc/fsl_booke/32: introduce create_kaslr_tlb_entry() helper
   powerpc/fsl_booke/32: introduce reloc_kernel_entry() helper
   powerpc/fsl_booke/32: implement KASLR infrastructure
   powerpc/fsl_booke/32: randomize the kernel image offset
   powerpc/fsl_booke/kaslr: clear the original kernel if randomized
   powerpc/fsl_booke/kaslr: support nokaslr cmdline parameter
   powerpc/fsl_booke/kaslr: dump out kernel offset information on panic
   powerpc/fsl_booke/kaslr: export offset in VMCOREINFO ELF notes
   powerpc/fsl_booke/32: Document KASLR implementation

  Documentation/powerpc/kaslr-booke32.rst   |  42 ++
  arch/powerpc/Kconfig  |  11 +
  arch/powerpc/include/asm/nohash/mmu-book3e.h  |  11 +-
  arch/powerpc/include/asm/page.h   |   7 +
  arch/powerpc/kernel/early_32.c    |   5 +-
  arch/powerp

Re: [PATCH 1/3] powerpc/book3s64/hash/4k: 4k supports only 16TB linear mapping

2019-10-08 Thread Michael Ellerman
Samuel Holland  writes:
> Hello,
>
> On 9/17/19 9:57 AM, Aneesh Kumar K.V wrote:
>> With commit: 0034d395f89d ("powerpc/mm/hash64: Map all the kernel regions in 
>> the
>> same 0xc range"), we now split the 64TB address range into 4 contexts each of
>> 16TB. That implies we can do only 16TB linear mapping. Make sure we don't
>> add physical memory above 16TB if that is present in the system.
>> 
>> Fixes: 0034d395f89d ("powerpc/mm/hash64: Map all the kernel regions in
>> the same 0xc range")
>> Reported-by: Cameron Berkenpas 
>> Signed-off-by: Aneesh Kumar K.V 
>> ---
>>  arch/powerpc/include/asm/book3s/64/mmu.h | 8 ++--
>>  1 file changed, 6 insertions(+), 2 deletions(-)
>> 
>> diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h 
>> b/arch/powerpc/include/asm/book3s/64/mmu.h
>> index bb3deb76c951..86cce8189240 100644
>> --- a/arch/powerpc/include/asm/book3s/64/mmu.h
>> +++ b/arch/powerpc/include/asm/book3s/64/mmu.h
>> @@ -35,12 +35,16 @@ extern struct mmu_psize_def 
>> mmu_psize_defs[MMU_PAGE_COUNT];
>>   * memory requirements with large number of sections.
>>   * 51 bits is the max physical real address on POWER9
>>   */
>> -#if defined(CONFIG_SPARSEMEM_VMEMMAP) && defined(CONFIG_SPARSEMEM_EXTREME) 
>> &&  \
>> -defined(CONFIG_PPC_64K_PAGES)
>> +
>> +#if defined(CONFIG_PPC_64K_PAGES)
>> +#if defined(CONFIG_SPARSEMEM_VMEMMAP) && defined(CONFIG_SPARSEMEM_EXTREME)
>
> This prevents accessing physical memory over 16TB with 4k pages and radix MMU 
> as
> well. Was this intentional?

No, it was meant to be a temporary fix until the rest of Aneesh's series
fixed things up properly, but then there were problems with those
patches so he asked me to just pick up this one.

At the moment 4K hash won't boot at all if you have too much RAM on P9,
so this was meant to at least avoid that.

But breaking 4K radix is arguably worse, so I'll drop this for now.

cheers


[PATCH] spufs: fix a crash in spufs_create_root()

2019-10-08 Thread Emmanuel Nicolet
The spu_fs_context was not set in fc->fs_private, which caused a crash
when accessing ctx->mode in spufs_create_root().

Signed-off-by: Emmanuel Nicolet 
---
 arch/powerpc/platforms/cell/spufs/inode.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/platforms/cell/spufs/inode.c 
b/arch/powerpc/platforms/cell/spufs/inode.c
index 1d93e55a2de1..2dd452a047cd 100644
--- a/arch/powerpc/platforms/cell/spufs/inode.c
+++ b/arch/powerpc/platforms/cell/spufs/inode.c
@@ -761,6 +761,7 @@ static int spufs_init_fs_context(struct fs_context *fc)
ctx->gid = current_gid();
ctx->mode = 0755;
 
+   fc->fs_private = ctx;
fc->s_fs_info = sbi;
fc->ops = &spufs_context_ops;
return 0;
-- 
2.23.0



Re: [PATCH v5 02/11] powerpc/mm: Adds counting method to monitor lockless pgtable walks

2019-10-08 Thread Christopher Lameter



On Tue, 8 Oct 2019, Leonardo Bras wrote:

> So you say that the performance impact of using my approach is the same
> as using locks? (supposing that lock never waits)
>
> So, there are 'lockless pagetable walks' only for the sake of better
> performance?

I thought that was the major motivation here.

> I thought they existed to enable doing pagetable walks in states where
> locking was not safe.

That sounds profoundly dangerous. Usually locking makes things safe to
access. Accesses without locking require a lot of care.



Re: [PATCH v5 02/11] powerpc/mm: Adds counting method to monitor lockless pgtable walks

2019-10-08 Thread Leonardo Bras
On Tue, 2019-10-08 at 17:43 +, Christopher Lameter wrote:
> On Tue, 8 Oct 2019, Leonardo Bras wrote:
> 
> > > You are creating contention on a single exclusive cacheline. Doesnt this
> > > defeat the whole purpose of the lockless page table walk? Use mmap_sem or
> > > so should cause the same performance regression?
> > 
> > Sorry, I did not understand that question.
> > I mean, this is just a refcount and never causes a lock.
> 
> Locks also use atomic operations like a refcount increment. Both require
> the cacheline to be in exclusive state. So the impact is very similar.

Thanks for explaining. :)

So you say that the performance impact of using my approach is the same
as using locks? (supposing that lock never waits)

So, there are 'lockless pagetable walks' only for the sake of better
performance? 

I thought they existed to enable doing pagetable walks in states where
locking was not safe.

Is that right?

Thanks!
Leonardo Brás,





Re: [PATCH v5 02/11] powerpc/mm: Adds counting method to monitor lockless pgtable walks

2019-10-08 Thread Christopher Lameter
On Tue, 8 Oct 2019, Leonardo Bras wrote:

> > You are creating contention on a single exclusive cacheline. Doesnt this
> > defeat the whole purpose of the lockless page table walk? Use mmap_sem or
> > so should cause the same performance regression?
>
> Sorry, I did not understand that question.
> I mean, this is just a refcount and never causes a lock.

Locks also use atomic operations like a refcount increment. Both require
the cacheline to be in exclusive state. So the impact is very similar.
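The point being made here can be illustrated with a small user-space C11 sketch: acquiring a spinlock and incrementing a refcount are both atomic read-modify-write operations on a single shared variable, so every CPU doing either must pull the same cache line into exclusive state. The names below are invented for the illustration; this is not the kernel's implementation.

```c
#include <stdatomic.h>

static atomic_int lock;     /* 0 = free, 1 = held */
static atomic_int walkers;  /* refcount-style counter */

static void spin_lock_sim(void)
{
	int expected = 0;
	/* cmpxchg loop: needs exclusive access to &lock on every attempt */
	while (!atomic_compare_exchange_weak(&lock, &expected, 1))
		expected = 0;
}

static void spin_unlock_sim(void)
{
	atomic_store(&lock, 0);
}

static void begin_walk_sim(void)
{
	/* same class of operation: one atomic RMW on a shared cache line */
	atomic_fetch_add(&walkers, 1);
}
```

Under contention, both `&lock` and `&walkers` bounce between CPU caches the same way, which is the similarity in impact described above.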


Re: [PATCH v5 02/11] powerpc/mm: Adds counting method to monitor lockless pgtable walks

2019-10-08 Thread Leonardo Bras
On Tue, 2019-10-08 at 15:11 +, Christopher Lameter wrote:
> 
> On Wed, 2 Oct 2019, Leonardo Bras wrote:
> 
> > +
> > +inline unsigned long __begin_lockless_pgtbl_walk(struct mm_struct *mm,
> > +bool disable_irq)
> > +{
> > +   unsigned long irq_mask = 0;
> > +
> > +   if (IS_ENABLED(CONFIG_LOCKLESS_PAGE_TABLE_WALK_TRACKING))
> > +   atomic_inc(&mm->lockless_pgtbl_walkers);
> > +
> 
> You are creating contention on a single exclusive cacheline. Doesnt this
> defeat the whole purpose of the lockless page table walk? Use mmap_sem or
> so should cause the same performance regression?

Sorry, I did not understand that question.
I mean, this is just a refcount and never causes a lock.  


FYI: This function was updated as follows, and will be in v6:

#ifdef CONFIG_LOCKLESS_PAGE_TABLE_WALK_TRACKING
atomic_inc(&mm->lockless_pgtbl_walkers);
#endif
smp_mb();

IS_ENABLED() doesn't work here if CONFIG_LOCKLESS_PAGE_TABLE_WALK_TRACKING
is not defined: it causes a compile error because the mm member
lockless_pgtbl_walkers doesn't exist.
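The reason for the compile error is that the compiler still parses and type-checks the dead IS_ENABLED() branch, so referencing a struct member that only exists under the config option fails even when the branch can never run. A minimal stand-alone illustration (all names here are made up, not the kernel's):

```c
#define CONFIG_TRACKING 0  /* pretend the option is disabled */

struct mm_sim {
	int other_state;
#if CONFIG_TRACKING
	int walkers;       /* member exists only when tracking is on */
#endif
};

static int begin_walk(struct mm_sim *mm)
{
#if CONFIG_TRACKING
	mm->walkers++;     /* ok: removed by the preprocessor when off */
#endif
	/*
	 * The broken variant would be:
	 *     if (IS_ENABLED(CONFIG_TRACKING))
	 *             mm->walkers++;
	 * which fails to compile with the option off, because
	 * mm->walkers is still referenced in the dead branch.
	 */
	return mm->other_state;
}
```

This is why the #ifdef form was kept for v6 instead of IS_ENABLED().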




Re: [PATCH] spufs: fix a crash in spufs_create_root()

2019-10-08 Thread Arnd Bergmann
On Tue, Oct 8, 2019 at 4:13 PM Emmanuel Nicolet
 wrote:
>
> The spu_fs_context was not set in fc->fs_private, this caused a crash
> when accessing ctx->mode in spufs_create_root().
>
> Signed-off-by: Emmanuel Nicolet 

Fixes: d2e0981c3b9a ("vfs: Convert spufs to use the new mount API")
Acked-by: Arnd Bergmann 

>  arch/powerpc/platforms/cell/spufs/inode.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/arch/powerpc/platforms/cell/spufs/inode.c 
> b/arch/powerpc/platforms/cell/spufs/inode.c
> index 1d93e55a2de1..2dd452a047cd 100644
> --- a/arch/powerpc/platforms/cell/spufs/inode.c
> +++ b/arch/powerpc/platforms/cell/spufs/inode.c
> @@ -761,6 +761,7 @@ static int spufs_init_fs_context(struct fs_context *fc)
> ctx->gid = current_gid();
> ctx->mode = 0755;
>
> +   fc->fs_private = ctx;
> fc->s_fs_info = sbi;
> fc->ops = &spufs_context_ops;
> return 0;
> --
> 2.23.0
>


Re: [PATCH v5 02/11] powerpc/mm: Adds counting method to monitor lockless pgtable walks

2019-10-08 Thread Christopher Lameter



On Wed, 2 Oct 2019, Leonardo Bras wrote:

> +
> +inline unsigned long __begin_lockless_pgtbl_walk(struct mm_struct *mm,
> +  bool disable_irq)
> +{
> + unsigned long irq_mask = 0;
> +
> + if (IS_ENABLED(CONFIG_LOCKLESS_PAGE_TABLE_WALK_TRACKING))
> + atomic_inc(&mm->lockless_pgtbl_walkers);
> +

You are creating contention on a single exclusive cacheline. Doesn't this
defeat the whole purpose of the lockless page table walk? Using mmap_sem or
similar should cause the same performance regression?


Re: [PATCH v5 01/11] asm-generic/pgtable: Adds generic functions to monitor lockless pgtable walks

2019-10-08 Thread Kirill A. Shutemov
On Sat, Oct 05, 2019 at 02:05:29PM +0530, Aneesh Kumar K.V wrote:
> MADV_DONTNEED has caused us multiple issues due to the fact that it can run
> in parallel to page fault. I am not sure whether we have a known/noticeable
> performance gain in allowing that with mmap_sem held in read mode.

Yes we do. MADV_DONTNEED is used a lot by userspace memory allocators, and
it would be a very noticeable performance regression if we switched it to
down_write(mmap_sem).

-- 
 Kirill A. Shutemov


[PATCH] hvc: dcc: Add earlycon support

2019-10-08 Thread Michal Simek
Add DCC earlycon support for early printks. The patch is useful for SoC
bringup where the HW serial console is broken.

Signed-off-by: Michal Simek 
---

I have had this patch in the Xilinx tree for quite a long time; it was
developed as preparation for SoC bringup where JTAG is functional, to get
information from the kernel about what's going on.

There is one checkpatch warning:
WARNING: Prefer 'unsigned int' to bare use of 'unsigned'
+static void dcc_early_write(struct console *con, const char *s, unsigned n)
but the console write callback is defined like that:
include/linux/console.h +145
void(*write)(struct console *, const char *, unsigned);
---
 drivers/tty/hvc/hvc_dcc.c | 28 
 1 file changed, 28 insertions(+)

diff --git a/drivers/tty/hvc/hvc_dcc.c b/drivers/tty/hvc/hvc_dcc.c
index 02629a1f193d..8e0edb7d93fd 100644
--- a/drivers/tty/hvc/hvc_dcc.c
+++ b/drivers/tty/hvc/hvc_dcc.c
@@ -1,7 +1,10 @@
 // SPDX-License-Identifier: GPL-2.0
 /* Copyright (c) 2010, 2014 The Linux Foundation. All rights reserved.  */
 
+#include 
 #include 
+#include 
+#include 
 
 #include 
 #include 
@@ -12,6 +15,31 @@
 #define DCC_STATUS_RX  (1 << 30)
 #define DCC_STATUS_TX  (1 << 29)
 
+static void dcc_uart_console_putchar(struct uart_port *port, int ch)
+{
+   while (__dcc_getstatus() & DCC_STATUS_TX)
+   cpu_relax();
+
+   __dcc_putchar(ch);
+}
+
+static void dcc_early_write(struct console *con, const char *s, unsigned n)
+{
+   struct earlycon_device *dev = con->data;
+
+   uart_console_write(&dev->port, s, n, dcc_uart_console_putchar);
+}
+
+static int __init dcc_early_console_setup(struct earlycon_device *device,
+ const char *opt)
+{
+   device->con->write = dcc_early_write;
+
+   return 0;
+}
+
+EARLYCON_DECLARE(dcc, dcc_early_console_setup);
+
 static int hvc_dcc_put_chars(uint32_t vt, const char *buf, int count)
 {
int i;
-- 
2.17.1
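[Editor's note: with the EARLYCON_DECLARE(dcc, ...) registration above, the early console would typically be requested from the kernel command line. The fragment below is illustrative of the usual earlycon= convention, not taken from the patch itself:]

```
earlycon=dcc
```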



Re: [PATCH v6 3/9] powerpc: add support to initialize ima policy rules

2019-10-08 Thread Nayna




On 10/02/2019 05:49 PM, Mimi Zohar wrote:

On Tue, 2019-10-01 at 12:07 -0400, Nayna wrote:

On 09/30/2019 09:04 PM, Thiago Jung Bauermann wrote:

Hello,

Hi,


diff --git a/arch/powerpc/kernel/ima_arch.c b/arch/powerpc/kernel/ima_arch.c
new file mode 100644
index ..39401b67f19e
--- /dev/null
+++ b/arch/powerpc/kernel/ima_arch.c
@@ -0,0 +1,33 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2019 IBM Corporation
+ * Author: Nayna Jain
+ */
+
+#include 
+#include 
+
+bool arch_ima_get_secureboot(void)
+{
+   return is_powerpc_os_secureboot_enabled();
+}
+
+/* Defines IMA appraise rules for secureboot */
+static const char *const arch_rules[] = {
+   "appraise func=KEXEC_KERNEL_CHECK appraise_type=imasig|modsig",
+#if !IS_ENABLED(CONFIG_MODULE_SIG)
+   "appraise func=MODULE_CHECK appraise_type=imasig|modsig",
+#endif
+   NULL
+};
+
+/*
+ * Returns the relevant IMA arch policies based on the system secureboot state.
+ */
+const char *const *arch_get_ima_policy(void)
+{
+   if (is_powerpc_os_secureboot_enabled())
+   return arch_rules;
+
+   return NULL;
+}

If CONFIG_MODULE_SIG is enabled but module signatures aren't enforced,
then IMA won't enforce module signature either. x86's
arch_get_ima_policy() calls set_module_sig_enforced(). Doesn't the
powerpc version need to do that as well?

On the flip side, if module signatures are enforced by the module
subsystem then IMA will verify the signature a second time since there's
no sharing of signature verification results between the module
subsystem and IMA (this was observed by Mimi).

IMHO this is a minor issue, since module loading isn't a hot path and
the duplicate work shouldn't impact anything. But it could be avoided by
having a NULL entry in arch_rules, which arch_get_ima_policy() would
dynamically update with the "appraise func=MODULE_CHECK" rule if
is_module_sig_enforced() is true.

Thanks Thiago for reviewing.  I am wondering whether this would give two
meanings to NULL. Can we do something like below? There are possibly
two options:

1. Set IMA_APPRAISED in the iint->flags if is_module_sig_enforced().

OR

2. Let ima_get_action() check for is_module_sig_enforced() when policy
is appraise and func is MODULE_CHECK.

I'm a bit hesitant about mixing the module subsystem signature
verification method with the IMA measure "template=ima-modsig" rules.
  Does it actually work?

We can at least limit verifying the same appended signature twice to
when "module.sig_enforce" is specified on the boot command line, by
changing "!IS_ENABLED(CONFIG_MODULE_SIG)" to test
"CONFIG_MODULE_SIG_FORCE".


Yes, this seems to be a better idea. I have implemented this in the v7
version of the ima_arch patch.


Thanks & Regards,
 - Nayna


Re: [PATCH v6] numa: make node_to_cpumask_map() NUMA_NO_NODE aware

2019-10-08 Thread Yunsheng Lin
On 2019/9/25 18:41, Peter Zijlstra wrote:
> On Wed, Sep 25, 2019 at 05:14:20PM +0800, Yunsheng Lin wrote:
>> From the discussion above, it seems making node_to_cpumask_map()
>> NUMA_NO_NODE aware is the most feasible way to move forward.
> 
> That's still wrong.

Hi, Peter

It seems this discussion has become trapped in a circle.

From my understanding, NUMA_NO_NODE, which means no NUMA node preference,
is the state that describes the node of a virtual device, or of a physical
device that has equal distance to all cpus.

We can be stricter if the device does have a nearer node, but we cannot
deny that a device may have no NUMA node preference or affinity, which
also means the control or data buffer can be allocated at the node where
the process is running.

As you have proposed, making it -2 and having dev_to_node() warn if the
device does have a nearer node that is not set by the fw is a way to be
stricter.

But I think being stricter is not really relevant to NUMA_NO_NODE, because
we do need a state to describe a device that has equal distance to all
nodes, even if it is not physically scalable.

Any better suggestion to move this forward?

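[Editor's note: the fallback being argued for can be sketched in a tiny user-space model. All names and masks below are invented for illustration; the real kernel helper is per-architecture:]

```c
#define NUMA_NO_NODE  (-1)
#define MAX_NUMNODES  4

/* Toy model: each node owns a CPU bitmask. The change under discussion
 * is the NUMA_NO_NODE branch, which falls back to all online CPUs
 * instead of indexing the per-node array with -1. */
static unsigned int node_masks[MAX_NUMNODES] = { 0x03, 0x0c, 0x30, 0xc0 };
static unsigned int cpu_online_mask = 0xff;

static unsigned int node_to_cpumask_sim(int node)
{
	if (node == NUMA_NO_NODE)
		return cpu_online_mask;  /* no preference: any online CPU */
	return node_masks[node];
}
```

A device with no node preference then gets every online CPU as a candidate, rather than an out-of-bounds array access.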



Re: [PATCH V4 2/2] mm/pgtable/debug: Add test validating architecture page table helpers

2019-10-08 Thread Anshuman Khandual



On 10/07/2019 07:30 PM, Kirill A. Shutemov wrote:
> On Mon, Oct 07, 2019 at 03:51:58PM +0200, Ingo Molnar wrote:
>>
>> * Kirill A. Shutemov  wrote:
>>
>>> On Mon, Oct 07, 2019 at 03:06:17PM +0200, Ingo Molnar wrote:

 * Anshuman Khandual  wrote:

> This adds a test module which will validate architecture page table 
> helpers
> and accessors regarding compliance with generic MM semantics expectations.
> This will help various architectures in validating changes to the existing
> page table helpers or addition of new ones.
>
> Test page table and memory pages creating its entries at various levels are
> all allocated from system memory with required alignments. If memory pages
> with required size and alignment could not be allocated, then all depending
> individual tests are skipped.

> diff --git a/arch/x86/include/asm/pgtable_64_types.h 
> b/arch/x86/include/asm/pgtable_64_types.h
> index 52e5f5f2240d..b882792a3999 100644
> --- a/arch/x86/include/asm/pgtable_64_types.h
> +++ b/arch/x86/include/asm/pgtable_64_types.h
> @@ -40,6 +40,8 @@ static inline bool pgtable_l5_enabled(void)
>  #define pgtable_l5_enabled() 0
>  #endif /* CONFIG_X86_5LEVEL */
>  
> +#define mm_p4d_folded(mm) (!pgtable_l5_enabled())
> +
>  extern unsigned int pgdir_shift;
>  extern unsigned int ptrs_per_p4d;

 Any deep reason this has to be a macro instead of proper C?
>>>
>>> It's a way to override the generic mm_p4d_folded(). It can be rewritten
>>> as inline function + define. Something like:
>>>
>>> #define mm_p4d_folded mm_p4d_folded
>>> static inline bool mm_p4d_folded(struct mm_struct *mm)
>>> {
>>> return !pgtable_l5_enabled();
>>> }
>>>
>>> But I don't see much reason to be more verbose here than needed.
>>
>> C type checking? Documentation? Yeah, I know it's just a one-liner, but 
>> the principle of the death by a thousand cuts applies here.
> 
> Okay, if you think it worth it. Anshuman, could you fix it up for the next
> submission?

Sure, will do.

> 
> 
>> BTW., any reason this must be in the low level pgtable_64_types.h type 
>> header, instead of one of the API level header files?
> 
> I defined it next pgtable_l5_enabled(). What is more appropriate place to
> you? pgtable_64.h? Yeah, it makes sense.


Needs to be moved to arch/x86/include/asm/pgtable_64.h as well ?


Re: [PATCH V4 2/2] mm/pgtable/debug: Add test validating architecture page table helpers

2019-10-08 Thread Anshuman Khandual


On 10/07/2019 06:36 PM, Ingo Molnar wrote:
> 
> * Anshuman Khandual  wrote:
> 
>> This adds a test module which will validate architecture page table helpers
>> and accessors regarding compliance with generic MM semantics expectations.
>> This will help various architectures in validating changes to the existing
>> page table helpers or addition of new ones.
>>
>> Test page table and memory pages creating its entries at various levels are
>> all allocated from system memory with required alignments. If memory pages
>> with required size and alignment could not be allocated, then all depending
>> individual tests are skipped.
> 
>> diff --git a/arch/x86/include/asm/pgtable_64_types.h 
>> b/arch/x86/include/asm/pgtable_64_types.h
>> index 52e5f5f2240d..b882792a3999 100644
>> --- a/arch/x86/include/asm/pgtable_64_types.h
>> +++ b/arch/x86/include/asm/pgtable_64_types.h
>> @@ -40,6 +40,8 @@ static inline bool pgtable_l5_enabled(void)
>>  #define pgtable_l5_enabled() 0
>>  #endif /* CONFIG_X86_5LEVEL */
>>  
>> +#define mm_p4d_folded(mm) (!pgtable_l5_enabled())
>> +
>>  extern unsigned int pgdir_shift;
>>  extern unsigned int ptrs_per_p4d;
> 
> Any deep reason this has to be a macro instead of proper C?
> 
>> diff --git a/mm/Kconfig.debug b/mm/Kconfig.debug
>> index 327b3ebf23bf..683131b1ee7d 100644
>> --- a/mm/Kconfig.debug
>> +++ b/mm/Kconfig.debug
>> @@ -117,3 +117,18 @@ config DEBUG_RODATA_TEST
>>  depends on STRICT_KERNEL_RWX
>>  ---help---
>>This option enables a testcase for the setting rodata read-only.
>> +
>> +config DEBUG_ARCH_PGTABLE_TEST
>> +bool "Test arch page table helpers for semantics compliance"
>> +depends on MMU
>> +depends on DEBUG_KERNEL
>> +depends on !(ARM || IA64)
> 
> Please add a proper enabling switch for architectures to opt in.

Sure, will do.

> 
> Please also add it to Documentation/features/list-arch.sh so that it's 
> listed as a 'TODO' entry on architectures where the tests are not enabled 
> yet.

Will do.

> 
>> +help
>> +  This options provides a kernel module which can be used to test
>> +  architecture page table helper functions on various platform in
>> +  verifying if they comply with expected generic MM semantics. This
>> +  will help architectures code in making sure that any changes or
>> +  new additions of these helpers will still conform to generic MM
>> +  expected semantics.
> 
> Typos and grammar fixed:
> 
>   help
> This option provides a kernel module which can be used to test
> architecture page table helper functions on various platforms in
> verifying if they comply with expected generic MM semantics. This
> will help architecture code in making sure that any changes or
> new additions of these helpers still conform to expected 
> semantics of the generic MM.

Sure, will update except the 'kernel module' part. Thank you.

> 
> Also, more fundamentally: isn't a kernel module too late for such a debug

It's not a kernel module any more; my bad that the description still has
these words left over from previous versions. Will fix it. The test now
gets invoked through a late_initcall().

> check, should something break due to a core MM change? Have these debug 
> checks caught any bugs or inconsistencies before?

Gerald Schaefer had reported earlier about a bug found on s390 with this test.

https://lkml.org/lkml/2019/9/4/1718

> 
> Why not call this as some earlier MM debug check, after enabling paging 
> but before executing user-space binaries or relying on complex MM ops 
> within the kernel, called at a stage when those primitives are all 
> expected to work fine?

At minimum we need the buddy allocator to be initialized for the allocations
to work. Would just after pgtable_init() or kmem_cache_init() in mm_init()
be a good place?

> 
> It seems to me that arch_pgtable_tests_init() won't even context-switch 
> normally, right?

Not sure whether I got this. Why would you expect it to context switch ?

> 
> Finally, instead of inventing yet another randomly named .config debug 
> switch, please fit it into the regular MM debug options which go along 
> the CONFIG_DEBUG_VM* naming scheme.
> 
> Might even make sense to enable these new debug checks by default if 
> CONFIG_DEBUG_VM=y, that way we'll get a *lot* more debug coverage than 
> some random module somewhere that few people will know about, let alone 
> run.

All the configs related to memory debugging are generated from
lib/Kconfig.debug after fetching all that is in "mm/Kconfig.debug".
There are only three configs which depend on CONFIG_DEBUG_VM like
a package:

1. CONFIG_DEBUG_VM_VMACACHE
2. CONFIG_DEBUG_VM_RB
3. CONFIG_DEBUG_VM_PGFLAGS
4. CONFIG_DEBUG_VM_PGTABLE [proposed for this]

Before that, just trying to understand whether the reason for making this
arch page table test part of the DEBUG_VM_* package, rather than a
stand-alone config like many others, is that it is directly related to vi

RE: [PATCH v6 3/3] PCI: layerscape: Add LS1028a support

2019-10-08 Thread Xiaowei Bao


> -Original Message-
> From: Shawn Guo 
> Sent: 3 October 2019 17:11
> To: Xiaowei Bao 
> Cc: robh...@kernel.org; mark.rutl...@arm.com; Leo Li
> ; M.h. Lian ; Mingkai Hu
> ; Roy Zang ;
> lorenzo.pieral...@arm.com; linux-...@vger.kernel.org;
> devicet...@vger.kernel.org; linux-ker...@vger.kernel.org;
> linux-arm-ker...@lists.infradead.org; linuxppc-dev@lists.ozlabs.org;
> bhelg...@google.com; Z.q. Hou 
> Subject: Re: [PATCH v6 3/3] PCI: layerscape: Add LS1028a support
> 
> On Mon, Sep 02, 2019 at 11:43:19AM +0800, Xiaowei Bao wrote:
> > Add support for the LS1028a PCIe controller.
> >
> > Signed-off-by: Xiaowei Bao 
> > Signed-off-by: Hou Zhiqiang 
> > ---
> > v2:
> >  - No change.
> > v3:
> >  - Reuse the ls2088 driver data structurt.
> > v4:
> >  - No change.
> > v5:
> >  - No change.
> > v6:
> >  - No change.
> >
> >  drivers/pci/controller/dwc/pci-layerscape.c | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/drivers/pci/controller/dwc/pci-layerscape.c
> > b/drivers/pci/controller/dwc/pci-layerscape.c
> > index 3a5fa26..f24f79a 100644
> > --- a/drivers/pci/controller/dwc/pci-layerscape.c
> > +++ b/drivers/pci/controller/dwc/pci-layerscape.c
> > @@ -263,6 +263,7 @@ static const struct ls_pcie_drvdata ls2088_drvdata
> > = {  static const struct of_device_id ls_pcie_of_match[] = {
> > { .compatible = "fsl,ls1012a-pcie", .data = &ls1046_drvdata },
> > { .compatible = "fsl,ls1021a-pcie", .data = &ls1021_drvdata },
> > +   { .compatible = "fsl,ls1028a-pcie", .data = &ls2088_drvdata },
> 
> I think you can save this driver change by using "fsl,ls2088a-pcie" as
> compatible fallback like below.
> 
>   compatible = "fsl,ls1028a-pcie", "fsl,ls2088a-pcie";

Yes, it is OK to do so, but for consistency with the previous code, I think 
adding a new compatible "fsl,ls1028a-pcie" to the driver is better.

Thanks 
Xiaowei

> 
> Shawn
> 
> > { .compatible = "fsl,ls1043a-pcie", .data = &ls1043_drvdata },
> > { .compatible = "fsl,ls1046a-pcie", .data = &ls1046_drvdata },
> > { .compatible = "fsl,ls2080a-pcie", .data = &ls2080_drvdata },
> > --
> > 2.9.5
> >