Re: next/master boot bisection: next-20190215 on beaglebone-black

2019-04-16 Thread Kees Cook
On Tue, Apr 16, 2019 at 4:04 PM Guenter Roeck  wrote:
>
> On Tue, Apr 16, 2019 at 1:37 PM Dan Williams  wrote:
> > Ah, no, the problem is that jump_label_init() is called by
> > setup_arch() on x86, and smp_prepare_boot_cpu() on powerpc, but not
> > until after parse_args() on ARM.
> >
> Anywhere but arm64, x86, and ppc, really.
>
> $ git grep jump_label_init arch
> arch/arm64/kernel/smp.c:jump_label_init();
> arch/powerpc/lib/feature-fixups.c:  jump_label_init();
> arch/x86/kernel/setup.c:jump_label_init();

Oooh, nice. Yeah, so, this is already a bug for "hardened_usercopy=0"
which sets static branches too.

> > Given it appears to be safe to call jump_label_init() early how about
> > something like the following?
> >
> > diff --git a/init/main.c b/init/main.c
> > index 598e278b46f7..7d4025d665eb 100644
> > --- a/init/main.c
> > +++ b/init/main.c
> > @@ -582,6 +582,8 @@ asmlinkage __visible void __init start_kernel(void)
> > page_alloc_init();
> >
> > pr_notice("Kernel command line: %s\n", boot_command_line);
> > +   /* parameters may set static keys */
> > +   jump_label_init();
> > parse_early_param();
> > after_dashes = parse_args("Booting kernel",
> >   static_command_line, __start___param,
> > @@ -591,8 +593,6 @@ asmlinkage __visible void __init start_kernel(void)
> > parse_args("Setting init args", after_dashes, NULL, 0, -1, -1,
> >NULL, set_init_arg);
> >
> > -   jump_label_init();
> > -
>
> That should work, unless there was a reason to have it that late. It
> doesn't look like that was the case, but I may be missing something.

Yes please. :) Let's fix it like you've suggested.

Reviewed-by: Kees Cook 

-- 
Kees Cook


Re: next/master boot bisection: next-20190215 on beaglebone-black

2019-04-16 Thread Guenter Roeck
On Tue, Apr 16, 2019 at 1:37 PM Dan Williams  wrote:
>
> On Tue, Apr 16, 2019 at 12:34 PM Guenter Roeck  wrote:
> >
> > On Tue, Apr 16, 2019 at 11:54 AM Dan Williams  
> > wrote:
> > >
> > > On Thu, Apr 11, 2019 at 1:54 PM Guenter Roeck  wrote:
> > > [..]
> > > > > > Boot tests report
> > > > > >
> > > > > > Qemu test results:
> > > > > > total: 345 pass: 345 fail: 0
> > > > > >
> > > > > > This is on top of next-20190410 with CONFIG_SHUFFLE_PAGE_ALLOCATOR=y
> > > > > > and the known crashes fixed.
> > > > >
> > > > > In addition to CONFIG_SHUFFLE_PAGE_ALLOCATOR=y you also need the
> > > > > kernel command line option "page_alloc.shuffle=1"
> > > > >
> > > > > ...so I doubt you are running with shuffling enabled. Another way to
> > > > > double check is:
> > > > >
> > > > >    cat /sys/module/page_alloc/parameters/shuffle
> > > >
> > > > Yes, you are right. Because, with it enabled, I see:
> > > >
> > > > Kernel command line: rdinit=/sbin/init page_alloc.shuffle=1 panic=-1
> > > > console=ttyAMA0,115200 page_alloc.shuffle=1
> > > > ------------[ cut here ]------------
> > > > WARNING: CPU: 0 PID: 0 at ./include/linux/jump_label.h:303
> > > > page_alloc_shuffle+0x12c/0x1ac
> > > > static_key_enable(): static key 'page_alloc_shuffle_key+0x0/0x4' used
> > > > before call to jump_label_init()
> > >
> > > This looks to be specific to ARM never having had to deal with
> > > DEFINE_STATIC_KEY_TRUE in the past.
> > >
> >
> > This affects almost all architectures, not just arm, presumably
> > because parse_args() is called before jump_label_init() in
> > start_kernel().
>
> Hmm, you're right, but this should affect *every* architecture, not
> just ARM. Why is it not screaming at me on x86?
>
Guess you figured that out yourself...

> > I did not bother to report back with further details
> > after someone stated that qemu doesn't support omap2, and the context
> > seemed to suggest that running any other tests would not add any
> > value.
> >
> > > I am able to avoid this warning by simply not enabling JUMP_LABEL
> > > support in my build.
> > >
> >
> > Fine with me, as long as CONFIG_SHUFFLE_PAGE_ALLOCATOR=y is not
> > enabled by default, or if it is made dependent on !JUMP_LABEL.
>
> Ah, no, the problem is that jump_label_init() is called by
> setup_arch() on x86, and smp_prepare_boot_cpu() on powerpc, but not
> until after parse_args() on ARM.
>
Anywhere but arm64, x86, and ppc, really.

$ git grep jump_label_init arch
arch/arm64/kernel/smp.c:jump_label_init();
arch/powerpc/lib/feature-fixups.c:  jump_label_init();
arch/x86/kernel/setup.c:jump_label_init();

> Given it appears to be safe to call jump_label_init() early how about
> something like the following?
>
> diff --git a/init/main.c b/init/main.c
> index 598e278b46f7..7d4025d665eb 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -582,6 +582,8 @@ asmlinkage __visible void __init start_kernel(void)
> page_alloc_init();
>
> pr_notice("Kernel command line: %s\n", boot_command_line);
> +   /* parameters may set static keys */
> +   jump_label_init();
> parse_early_param();
> after_dashes = parse_args("Booting kernel",
>   static_command_line, __start___param,
> @@ -591,8 +593,6 @@ asmlinkage __visible void __init start_kernel(void)
> parse_args("Setting init args", after_dashes, NULL, 0, -1, -1,
>NULL, set_init_arg);
>
> -   jump_label_init();
> -

That should work, unless there was a reason to have it that late. It
doesn't look like that was the case, but I may be missing something.

Guenter


Re: next/master boot bisection: next-20190215 on beaglebone-black

2019-04-16 Thread Dan Williams
On Tue, Apr 16, 2019 at 12:34 PM Guenter Roeck  wrote:
>
> On Tue, Apr 16, 2019 at 11:54 AM Dan Williams  
> wrote:
> >
> > On Thu, Apr 11, 2019 at 1:54 PM Guenter Roeck  wrote:
> > [..]
> > > > > Boot tests report
> > > > >
> > > > > Qemu test results:
> > > > > total: 345 pass: 345 fail: 0
> > > > >
> > > > > This is on top of next-20190410 with CONFIG_SHUFFLE_PAGE_ALLOCATOR=y
> > > > > and the known crashes fixed.
> > > >
> > > > In addition to CONFIG_SHUFFLE_PAGE_ALLOCATOR=y you also need the
> > > > kernel command line option "page_alloc.shuffle=1"
> > > >
> > > > ...so I doubt you are running with shuffling enabled. Another way to
> > > > double check is:
> > > >
> > > >    cat /sys/module/page_alloc/parameters/shuffle
> > >
> > > Yes, you are right. Because, with it enabled, I see:
> > >
> > > Kernel command line: rdinit=/sbin/init page_alloc.shuffle=1 panic=-1
> > > console=ttyAMA0,115200 page_alloc.shuffle=1
> > > ------------[ cut here ]------------
> > > WARNING: CPU: 0 PID: 0 at ./include/linux/jump_label.h:303
> > > page_alloc_shuffle+0x12c/0x1ac
> > > static_key_enable(): static key 'page_alloc_shuffle_key+0x0/0x4' used
> > > before call to jump_label_init()
> >
> > This looks to be specific to ARM never having had to deal with
> > DEFINE_STATIC_KEY_TRUE in the past.
> >
>
> This affects almost all architectures, not just arm, presumably
> because parse_args() is called before jump_label_init() in
> start_kernel().

Hmm, you're right, but this should affect *every* architecture, not
just ARM. Why is it not screaming at me on x86?

> I did not bother to report back with further details
> after someone stated that qemu doesn't support omap2, and the context
> seemed to suggest that running any other tests would not add any
> value.
>
> > I am able to avoid this warning by simply not enabling JUMP_LABEL
> > support in my build.
> >
>
> Fine with me, as long as CONFIG_SHUFFLE_PAGE_ALLOCATOR=y is not
> enabled by default, or if it is made dependent on !JUMP_LABEL.

Ah, no, the problem is that jump_label_init() is called by
setup_arch() on x86, and smp_prepare_boot_cpu() on powerpc, but not
until after parse_args() on ARM.

Given it appears to be safe to call jump_label_init() early how about
something like the following?

diff --git a/init/main.c b/init/main.c
index 598e278b46f7..7d4025d665eb 100644
--- a/init/main.c
+++ b/init/main.c
@@ -582,6 +582,8 @@ asmlinkage __visible void __init start_kernel(void)
page_alloc_init();

pr_notice("Kernel command line: %s\n", boot_command_line);
+   /* parameters may set static keys */
+   jump_label_init();
parse_early_param();
after_dashes = parse_args("Booting kernel",
  static_command_line, __start___param,
@@ -591,8 +593,6 @@ asmlinkage __visible void __init start_kernel(void)
parse_args("Setting init args", after_dashes, NULL, 0, -1, -1,
   NULL, set_init_arg);

-   jump_label_init();
-
/*
 * These use large bootmem allocations and must precede
 * kmem_cache_init()


Re: next/master boot bisection: next-20190215 on beaglebone-black

2019-04-16 Thread Mathieu Desnoyers
----- On Apr 16, 2019, at 2:54 PM, Dan Williams dan.j.willi...@intel.com wrote:

> On Thu, Apr 11, 2019 at 1:54 PM Guenter Roeck  wrote:
> [..]
>> > > Boot tests report
>> > >
>> > > Qemu test results:
>> > > total: 345 pass: 345 fail: 0
>> > >
>> > > This is on top of next-20190410 with CONFIG_SHUFFLE_PAGE_ALLOCATOR=y
>> > > and the known crashes fixed.
>> >
>> > In addition to CONFIG_SHUFFLE_PAGE_ALLOCATOR=y you also need the
>> > kernel command line option "page_alloc.shuffle=1"
>> >
>> > ...so I doubt you are running with shuffling enabled. Another way to
>> > double check is:
>> >
>> >    cat /sys/module/page_alloc/parameters/shuffle
>>
>> Yes, you are right. Because, with it enabled, I see:
>>
>> Kernel command line: rdinit=/sbin/init page_alloc.shuffle=1 panic=-1
>> console=ttyAMA0,115200 page_alloc.shuffle=1
>> ------------[ cut here ]------------
>> WARNING: CPU: 0 PID: 0 at ./include/linux/jump_label.h:303
>> page_alloc_shuffle+0x12c/0x1ac
>> static_key_enable(): static key 'page_alloc_shuffle_key+0x0/0x4' used
>> before call to jump_label_init()
> 
> This looks to be specific to ARM never having had to deal with
> DEFINE_STATIC_KEY_TRUE in the past.
> 
> I am able to avoid this warning by simply not enabling JUMP_LABEL
> support in my build.

Looking into this some more, it looks like I was on the wrong track
with my large branch offset theory. Is it just possible that
page_alloc_shuffle() ends up using jump labels before they are
initialized ? Perhaps this has something to do with how early
the page_alloc.shuffle=1 kernel parameter is handled.

Thanks,

Mathieu


-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


Re: next/master boot bisection: next-20190215 on beaglebone-black

2019-04-16 Thread Mathieu Desnoyers
----- On Apr 16, 2019, at 3:25 PM, Mathieu Desnoyers mathieu.desnoy...@efficios.com wrote:

> ----- On Apr 16, 2019, at 3:17 PM, Mathieu Desnoyers mathieu.desnoy...@efficios.com wrote:
> 
>> ----- On Apr 16, 2019, at 2:54 PM, Dan Williams dan.j.willi...@intel.com wrote:
>> 
>>> On Thu, Apr 11, 2019 at 1:54 PM Guenter Roeck  wrote:
>>> [..]
>>>> > > Boot tests report
>>>> > >
>>>> > > Qemu test results:
>>>> > > total: 345 pass: 345 fail: 0
>>>> > >
>>>> > > This is on top of next-20190410 with CONFIG_SHUFFLE_PAGE_ALLOCATOR=y
>>>> > > and the known crashes fixed.
>>>> >
>>>> > In addition to CONFIG_SHUFFLE_PAGE_ALLOCATOR=y you also need the
>>>> > kernel command line option "page_alloc.shuffle=1"
>>>> >
>>>> > ...so I doubt you are running with shuffling enabled. Another way to
>>>> > double check is:
>>>> >
>>>> >    cat /sys/module/page_alloc/parameters/shuffle
>>>>
>>>> Yes, you are right. Because, with it enabled, I see:
>>>>
>>>> Kernel command line: rdinit=/sbin/init page_alloc.shuffle=1 panic=-1
>>>> console=ttyAMA0,115200 page_alloc.shuffle=1
>>>> ------------[ cut here ]------------
>>>> WARNING: CPU: 0 PID: 0 at ./include/linux/jump_label.h:303
>>>> page_alloc_shuffle+0x12c/0x1ac
>>>> static_key_enable(): static key 'page_alloc_shuffle_key+0x0/0x4' used
>>>> before call to jump_label_init()
>>> 
>>> This looks to be specific to ARM never having had to deal with
>>> DEFINE_STATIC_KEY_TRUE in the past.
>>> 
>>> I am able to avoid this warning by simply not enabling JUMP_LABEL
>>> support in my build.
>> 
>> How large is your kernel image in memory ? Is it larger than 32MB
>> by any chance ?
>> 
>> On arm, the arch_static_branch() uses a "nop" instruction, which seems
>> fine. However, I have a concern wrt arch_static_branch_jump():
>> 
>> arch/arm/include/asm/jump_label.h defines:
>> 
>> static __always_inline bool arch_static_branch_jump(struct static_key *key, bool branch)
>> {
>> 	asm_volatile_goto("1:\n\t"
>> 		 WASM(b) " %l[l_yes]\n\t"
>> 		 ".pushsection __jump_table,  \"aw\"\n\t"
>> 		 ".word 1b, %l[l_yes], %c0\n\t"
>> 		 ".popsection\n\t"
>> 		 : :  "i" (&((char *)key)[branch]) :  : l_yes);
>> 
>> 	return false;
>> l_yes:
>> 	return true;
>> }
>> 
>> Which should work fine as long as the branch target is within +/-32MB range of
>> the branch instruction. However, based on
>> http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0489e/Cihfddaf.html :
>> 
>> "Extending branch ranges
>> 
>> Machine-level B and BL instructions have restricted ranges from the address of
>> the current instruction. However, you can use these instructions even if label
>> is out of range. Often you do not know where the linker places label. When
>> necessary, the linker adds code to enable longer branches. The added code is
>> called a veneer."
>> 
>> So if by an odd chance this branch is turned into a longer branch by the linker,
>> then the code pattern would be completely unexpected by
>> arch/arm/kernel/jump_label.c.
>> 
>> Can you try with the following (untested) patch ?
> 

Here is updated logic for arch_static_branch_jump(), plus a change that covers
arch_static_branch() as well (untested):

diff --git a/arch/arm/include/asm/jump_label.h b/arch/arm/include/asm/jump_label.h
index e12d7d096fc0..cec2f8a2b65e 100644
--- a/arch/arm/include/asm/jump_label.h
+++ b/arch/arm/include/asm/jump_label.h
@@ -9,12 +9,21 @@
 
 #define JUMP_LABEL_NOP_SIZE 4
 
+/*
+ * The linker adds veneer code if the target of the branch is beyond the
+ * +/-32MB range (+/-16MB for THUMB2), so ensure we never patch a branch
+ * instruction whose target is outside of the inline asm.
+ */
 static __always_inline bool arch_static_branch(struct static_key *key, bool branch)
 {
asm_volatile_goto("1:\n\t"
 WASM(nop) "\n\t"
+WASM(b) " 2f\n\t"
+   "3:\n\t"
+WASM(b) " %l[l_yes]\n\t"
+   "2:\n\t"
 ".pushsection __jump_table,  \"aw\"\n\t"
-".word 1b, %l[l_yes], %c0\n\t"
+".word 1b, 3b, %c0\n\t"
 ".popsection\n\t"
 : :  "i" (&((char *)key)[branch]) :  : l_yes);
 
@@ -23,12 +32,21 @@ static __always_inline bool arch_static_branch(struct static_key *key, bool bran
return true;
 }
 
+/*
+ * The linker adds veneer code if the target of the branch is beyond the
+ * +/-32MB range (+/-16MB for THUMB2), so ensure we never patch a branch
+ * instruction whose target is outside of the inline asm.
+ */
 static __always_inline bool arch_static_branch_jump(struct static_key *key, bool branch)
 {
asm_volatile_goto("1:\n\t"
+WASM(b) " 3f\n\t"
+WASM(b) " 2f\n\t"
+   "3:\n\t"
 WASM(b) " %l[l_yes]\n\t"
+   "2:\n\t"
 ".pushsection __jump_table,  \"aw\"\n\t"
-".word 1b, %l[l_yes], %c0\n

Re: next/master boot bisection: next-20190215 on beaglebone-black

2019-04-16 Thread Guenter Roeck
On Tue, Apr 16, 2019 at 11:54 AM Dan Williams  wrote:
>
> On Thu, Apr 11, 2019 at 1:54 PM Guenter Roeck  wrote:
> [..]
> > > > Boot tests report
> > > >
> > > > Qemu test results:
> > > > total: 345 pass: 345 fail: 0
> > > >
> > > > This is on top of next-20190410 with CONFIG_SHUFFLE_PAGE_ALLOCATOR=y
> > > > and the known crashes fixed.
> > >
> > > In addition to CONFIG_SHUFFLE_PAGE_ALLOCATOR=y you also need the
> > > kernel command line option "page_alloc.shuffle=1"
> > >
> > > ...so I doubt you are running with shuffling enabled. Another way to
> > > double check is:
> > >
> > >    cat /sys/module/page_alloc/parameters/shuffle
> >
> > Yes, you are right. Because, with it enabled, I see:
> >
> > Kernel command line: rdinit=/sbin/init page_alloc.shuffle=1 panic=-1
> > console=ttyAMA0,115200 page_alloc.shuffle=1
> > ------------[ cut here ]------------
> > WARNING: CPU: 0 PID: 0 at ./include/linux/jump_label.h:303
> > page_alloc_shuffle+0x12c/0x1ac
> > static_key_enable(): static key 'page_alloc_shuffle_key+0x0/0x4' used
> > before call to jump_label_init()
>
> This looks to be specific to ARM never having had to deal with
> DEFINE_STATIC_KEY_TRUE in the past.
>

This affects almost all architectures, not just arm, presumably
because parse_args() is called before jump_label_init() in
start_kernel(). I did not bother to report back with further details
after someone stated that qemu doesn't support omap2, and the context
seemed to suggest that running any other tests would not add any
value.

> I am able to avoid this warning by simply not enabling JUMP_LABEL
> support in my build.
>

Fine with me, as long as CONFIG_SHUFFLE_PAGE_ALLOCATOR=y is not
enabled by default, or if it is made dependent on !JUMP_LABEL.

Guenter

> > Modules linked in:
> > CPU: 0 PID: 0 Comm: swapper Not tainted
> > 5.1.0-rc4-next-20190410-00003-g3367c36ce744 #1
> > Hardware name: ARM Integrator/CP (Device Tree)
> > [] (unwind_backtrace) from [] (show_stack+0x10/0x18)
> > [] (show_stack) from [] (dump_stack+0x18/0x24)
> > [] (dump_stack) from [] (__warn+0xe0/0x108)
> > [] (__warn) from [] (warn_slowpath_fmt+0x44/0x6c)
> > [] (warn_slowpath_fmt) from [] (page_alloc_shuffle+0x12c/0x1ac)
> > [] (page_alloc_shuffle) from [] (shuffle_store+0x28/0x48)
> > [] (shuffle_store) from [] (parse_args+0x1f4/0x350)
> > [] (parse_args) from [] (start_kernel+0x1c0/0x488)
> > [] (start_kernel) from [<00000000>] (  (null))
> >
> > I'll re-run the test, but I suspect it will drown in warnings.
>
> I slogged through getting a BeagleBone Black up and running with a
> Yocto build and it is not failing. I have tried applying the patches on
> top of v5.1-rc5 as well as re-testing the next-20190215 tag, with no
> reproduction. The shuffle appears to avoid anything sensitive by
> default; below are the shuffle actions that were taken relative to
> iomem. Can someone with a failure reproduction please send me more
> details about their configuration? It would also help to get a failing
> boot log with the pr_debug() statements in mm/shuffle.c enabled to see
> if the failure is correlated with any unexpected shuffle actions.
>
> 80000000-9fffffff : System RAM
>   80008000-809fffff : Kernel code
>   80b00000-812be523 : Kernel data
>
> [0.086469] __shuffle_zone: swap: 0x81800 -> 0x99800
> [0.086558] __shuffle_zone: swap: 0x82000 -> 0x88800
> [0.086575] __shuffle_zone: swap: 0x82800 -> 0x89800
> [0.086591] __shuffle_zone: swap: 0x83000 -> 0x89000
> [0.086606] __shuffle_zone: swap: 0x83800 -> 0x8a800
> [0.086621] __shuffle_zone: swap: 0x84000 -> 0x93800
> [0.086636] __shuffle_zone: swap: 0x84800 -> 0x83000
> [0.086651] __shuffle_zone: swap: 0x85000 -> 0x8f000
> [0.086666] __shuffle_zone: swap: 0x85800 -> 0x88000
> [0.086689] __shuffle_zone: swap: 0x86000 -> 0x84000
> [0.086704] __shuffle_zone: swap: 0x86800 -> 0x8c800
> [0.086719] __shuffle_zone: swap: 0x87000 -> 0x93000
> [0.086735] __shuffle_zone: swap: 0x87800 -> 0x94000
> [0.086751] __shuffle_zone: swap: 0x88000 -> 0x90800
> [0.086766] __shuffle_zone: swap: 0x88800 -> 0x9d000
> [0.086781] __shuffle_zone: swap: 0x89000 -> 0x82800
> [0.086796] __shuffle_zone: swap: 0x89800 -> 0x95800
> [0.086811] __shuffle_zone: swap: 0x8a000 -> 0x98000
> [0.086826] __shuffle_zone: swap: 0x8a800 -> 0x89000
> [0.086842] __shuffle_zone: swap: 0x8b000 -> 0x81800
> [0.086857] __shuffle_zone: swap: 0x8b800 -> 0x88800
> [0.086872] __shuffle_zone: swap: 0x8c000 -> 0x8a000
> [0.086891] __shuffle_zone: swap: 0x8c800 -> 0x84800
> [0.086906] __shuffle_zone: swap: 0x8d000 -> 0x95000
> [0.086921] __shuffle_zone: swap: 0x8d800 -> 0x8d000
> [0.086935] __shuffle_zone: swap: 0x8e000 -> 0x8e800
> [0.086950] __shuffle_zone: swap: 0x8e800 -> 0x99000
> [0.086964] __shuffle_zone: swap: 0x8f000 -> 0x8d000
> [0.086979] __shuffle_zone: swap: 0x90000 -> 0x91000
> [0.086994] __shuffle_zone: swap: 0x90800 -> 0x83000
> [0.087009] __shu

Re: next/master boot bisection: next-20190215 on beaglebone-black

2019-04-16 Thread Mathieu Desnoyers
----- On Apr 16, 2019, at 3:17 PM, Mathieu Desnoyers mathieu.desnoy...@efficios.com wrote:

> ----- On Apr 16, 2019, at 2:54 PM, Dan Williams dan.j.willi...@intel.com wrote:
> 
>> On Thu, Apr 11, 2019 at 1:54 PM Guenter Roeck  wrote:
>> [..]
>>> > > Boot tests report
>>> > >
>>> > > Qemu test results:
>>> > > total: 345 pass: 345 fail: 0
>>> > >
>>> > > This is on top of next-20190410 with CONFIG_SHUFFLE_PAGE_ALLOCATOR=y
>>> > > and the known crashes fixed.
>>> >
>>> > In addition to CONFIG_SHUFFLE_PAGE_ALLOCATOR=y you also need the
>>> > kernel command line option "page_alloc.shuffle=1"
>>> >
>>> > ...so I doubt you are running with shuffling enabled. Another way to
>>> > double check is:
>>> >
>>> >    cat /sys/module/page_alloc/parameters/shuffle
>>>
>>> Yes, you are right. Because, with it enabled, I see:
>>>
>>> Kernel command line: rdinit=/sbin/init page_alloc.shuffle=1 panic=-1
>>> console=ttyAMA0,115200 page_alloc.shuffle=1
>>> ------------[ cut here ]------------
>>> WARNING: CPU: 0 PID: 0 at ./include/linux/jump_label.h:303
>>> page_alloc_shuffle+0x12c/0x1ac
>>> static_key_enable(): static key 'page_alloc_shuffle_key+0x0/0x4' used
>>> before call to jump_label_init()
>> 
>> This looks to be specific to ARM never having had to deal with
>> DEFINE_STATIC_KEY_TRUE in the past.
>> 
>> I am able to avoid this warning by simply not enabling JUMP_LABEL
>> support in my build.
> 
> How large is your kernel image in memory ? Is it larger than 32MB
> by any chance ?
> 
> On arm, the arch_static_branch() uses a "nop" instruction, which seems
> fine. However, I have a concern wrt arch_static_branch_jump():
> 
> arch/arm/include/asm/jump_label.h defines:
> 
> static __always_inline bool arch_static_branch_jump(struct static_key *key, bool branch)
> {
> 	asm_volatile_goto("1:\n\t"
> 		 WASM(b) " %l[l_yes]\n\t"
> 		 ".pushsection __jump_table,  \"aw\"\n\t"
> 		 ".word 1b, %l[l_yes], %c0\n\t"
> 		 ".popsection\n\t"
> 		 : :  "i" (&((char *)key)[branch]) :  : l_yes);
> 
> 	return false;
> l_yes:
> 	return true;
> }
> 
> Which should work fine as long as the branch target is within +/-32MB range of
> the branch instruction. However, based on
> http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0489e/Cihfddaf.html :
> 
> "Extending branch ranges
> 
> Machine-level B and BL instructions have restricted ranges from the address of
> the current instruction. However, you can use these instructions even if label
> is out of range. Often you do not know where the linker places label. When
> necessary, the linker adds code to enable longer branches. The added code is
> called a veneer."
> 
> So if by an odd chance this branch is turned into a longer branch by the linker,
> then the code pattern would be completely unexpected by
> arch/arm/kernel/jump_label.c.
> 
> Can you try with the following (untested) patch ?

The logic in my previous patch was bogus. Here is an updated version (untested):

diff --git a/arch/arm/include/asm/jump_label.h b/arch/arm/include/asm/jump_label.h
index e12d7d096fc0..7c35f57b72c5 100644
--- a/arch/arm/include/asm/jump_label.h
+++ b/arch/arm/include/asm/jump_label.h
@@ -23,12 +23,21 @@ static __always_inline bool arch_static_branch(struct static_key *key, bool bran
return true;
 }
 
+/*
+ * The linker adds veneer code if the target of the branch is beyond the
+ * +/-32MB range, so ensure we never patch a branch instruction whose
+ * target is outside of the inline asm.
+ */
 static __always_inline bool arch_static_branch_jump(struct static_key *key, bool branch)
 {
asm_volatile_goto("1:\n\t"
+WASM(nop) "\n\t"
+WASM(b) " 2f\n\t"
+   "3:\n\t"
 WASM(b) " %l[l_yes]\n\t"
+   "2:\n\t"
 ".pushsection __jump_table,  \"aw\"\n\t"
-".word 1b, %l[l_yes], %c0\n\t"
+".word 1b, 3b, %c0\n\t"
 ".popsection\n\t"
 : :  "i" (&((char *)key)[branch]) :  : l_yes);
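The layout in this version can be checked with a toy interpreter. In the hypothetical model below, each slot stands for one 4-byte instruction: the slot labeled 1: is the patch site (a nop, or "b 3b" once patched), the "b 2f" skips over the slot labeled 3:, which holds the only branch to l_yes and is never rewritten, and 2: is the fall-through that returns false. All names are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: each slot stands for one 4-byte ARM instruction. */
enum model_op { MODEL_NOP, MODEL_B };

struct model_insn {
	enum model_op op;
	int target;                 /* slot index, or MODEL_L_YES */
};

#define MODEL_L_YES (-1)        /* the C label l_yes, wherever it ends up */
#define SLOT_1      0           /* "1:" patch site recorded in __jump_table */
#define SLOT_3      2           /* "3:" branch target recorded in __jump_table */
#define SLOT_2      3           /* "2:" fall-through path, returns false */

static struct model_insn model_prog[] = {
	[SLOT_1] = { MODEL_NOP, SLOT_3 },     /* nop, patched to "b 3b" */
	[1]      = { MODEL_B, SLOT_2 },       /* b 2f */
	[SLOT_3] = { MODEL_B, MODEL_L_YES },  /* b %l[l_yes], never patched */
};

/* Toggle the patch site the way enabling/disabling the key would. */
static void model_patch(bool enable)
{
	model_prog[SLOT_1].op = enable ? MODEL_B : MODEL_NOP;
	model_prog[SLOT_1].target = SLOT_3;
}

/* Execute from "1:" and report whether control reaches l_yes. */
static bool model_run(void)
{
	int pc = SLOT_1;

	while (pc != SLOT_2) {
		const struct model_insn *in;

		assert(pc >= 0 && pc < SLOT_2);
		in = &model_prog[pc];
		if (in->op == MODEL_NOP) {
			pc++;
			continue;
		}
		if (in->target == MODEL_L_YES)
			return true;
		pc = in->target;
	}
	return false;
}

/* Offset the patched "b 3b" must encode: target - (pc + 8), in bytes. */
static int model_patch_offset(void)
{
	return SLOT_3 * 4 - (SLOT_1 * 4 + 8);
}
```

Because 3: always sits two instructions after 1:, the patched branch encodes an offset of 0 relative to PC + 8 wherever the compiler places l_yes, so the instruction being patched never needs a veneer.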

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


Re: next/master boot bisection: next-20190215 on beaglebone-black

2019-04-16 Thread Mathieu Desnoyers



----- On Apr 16, 2019, at 2:54 PM, Dan Williams dan.j.willi...@intel.com wrote:

> On Thu, Apr 11, 2019 at 1:54 PM Guenter Roeck  wrote:
> [..]
>> > > Boot tests report
>> > >
>> > > Qemu test results:
>> > > total: 345 pass: 345 fail: 0
>> > >
>> > > This is on top of next-20190410 with CONFIG_SHUFFLE_PAGE_ALLOCATOR=y
>> > > and the known crashes fixed.
>> >
>> > In addition to CONFIG_SHUFFLE_PAGE_ALLOCATOR=y you also need the
>> > kernel command line option "page_alloc.shuffle=1"
>> >
>> > ...so I doubt you are running with shuffling enabled. Another way to
>> > double check is:
>> >
>> >    cat /sys/module/page_alloc/parameters/shuffle
>>
>> Yes, you are right. Because, with it enabled, I see:
>>
>> Kernel command line: rdinit=/sbin/init page_alloc.shuffle=1 panic=-1
>> console=ttyAMA0,115200 page_alloc.shuffle=1
>> ------------[ cut here ]------------
>> WARNING: CPU: 0 PID: 0 at ./include/linux/jump_label.h:303
>> page_alloc_shuffle+0x12c/0x1ac
>> static_key_enable(): static key 'page_alloc_shuffle_key+0x0/0x4' used
>> before call to jump_label_init()
> 
> This looks to be specific to ARM never having had to deal with
> DEFINE_STATIC_KEY_TRUE in the past.
> 
> I am able to avoid this warning by simply not enabling JUMP_LABEL
> support in my build.

How large is your kernel image in memory ? Is it larger than 32MB
by any chance ?

On arm, the arch_static_branch() uses a "nop" instruction, which seems
fine. However, I have a concern wrt arch_static_branch_jump():

arch/arm/include/asm/jump_label.h defines:

static __always_inline bool arch_static_branch_jump(struct static_key *key, bool branch)
{
	asm_volatile_goto("1:\n\t"
		 WASM(b) " %l[l_yes]\n\t"
		 ".pushsection __jump_table,  \"aw\"\n\t"
		 ".word 1b, %l[l_yes], %c0\n\t"
		 ".popsection\n\t"
		 : :  "i" (&((char *)key)[branch]) :  : l_yes);

	return false;
l_yes:
	return true;
}

Which should work fine as long as the branch target is within +/-32MB range of
the branch instruction. However, based on
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0489e/Cihfddaf.html :

"Extending branch ranges

Machine-level B and BL instructions have restricted ranges from the address of 
the current instruction. However, you can use these instructions even if label 
is out of range. Often you do not know where the linker places label. When 
necessary, the linker adds code to enable longer branches. The added code is 
called a veneer."

So if by an odd chance this branch is turned into a longer branch by the linker,
then the code pattern would be completely unexpected by
arch/arm/kernel/jump_label.c.
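To put numbers on the range limit, here is a sketch that checks whether an A32 B instruction can reach a target and, if so, builds the branch word. It follows our understanding of the kernel's __arm_gen_branch_arm() helper (believed to live in arch/arm/kernel/insn.c), which also returns 0 for out-of-range targets, but the function names here are hypothetical. The Thumb-2 check assumes the 32-bit B.W encoding, whose reach is about +/-16MB measured from PC + 4.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A32 "B": signed 24-bit word offset relative to PC + 8, i.e. +/-32MB. */
#define ARM_B_MIN_OFFSET   (-33554432LL)   /* -32MB */
#define ARM_B_MAX_OFFSET   (33554428LL)    /* +32MB - 4 */
/* Thumb-2 "B.W": signed 25-bit offset, halfword aligned, PC + 4. */
#define T2_BW_MIN_OFFSET   (-16777216LL)   /* -16MB */
#define T2_BW_MAX_OFFSET   (16777214LL)    /* +16MB - 2 */

static bool arm_b_in_range(uint32_t pc, uint32_t target)
{
	int64_t off = (int64_t)target - ((int64_t)pc + 8);

	return off >= ARM_B_MIN_OFFSET && off <= ARM_B_MAX_OFFSET && off % 4 == 0;
}

static bool thumb2_bw_in_range(uint32_t pc, uint32_t target)
{
	int64_t off = (int64_t)target - ((int64_t)pc + 4);

	return off >= T2_BW_MIN_OFFSET && off <= T2_BW_MAX_OFFSET && off % 2 == 0;
}

/* Hypothetical helper: encode "b target" (cond = AL), or return 0 when
 * the target is out of reach and a linker would need a veneer. */
static uint32_t arm_gen_b(uint32_t pc, uint32_t target)
{
	int64_t off;

	if (!arm_b_in_range(pc, target))
		return 0;
	off = (int64_t)target - ((int64_t)pc + 8);
	assert(off % 4 == 0);
	/* 0xea000000 = cond AL + B opcode; low 24 bits = word offset. */
	return 0xea000000u | ((uint32_t)(off / 4) & 0x00ffffffu);
}
```

For example, arm_gen_b(0x8000, 0x8008) yields 0xea000000 and a branch to itself ("b .") yields 0xeafffffe, while a target 32MB past PC + 8 yields 0; that out-of-range case is where a linker would have to insert a veneer.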

Can you try with the following (untested) patch ?

diff --git a/arch/arm/include/asm/jump_label.h b/arch/arm/include/asm/jump_label.h
index e12d7d096fc0..b183f5bbf2e0 100644
--- a/arch/arm/include/asm/jump_label.h
+++ b/arch/arm/include/asm/jump_label.h
@@ -23,12 +23,18 @@ static __always_inline bool arch_static_branch(struct static_key *key, bool bran
return true;
 }
 
+/*
+ * The linker adds veneer code if the target of the branch is beyond the
+ * +/-32MB range, so ensure we never patch a branch instruction.
+ */
 static __always_inline bool arch_static_branch_jump(struct static_key *key, bool branch)
 {
asm_volatile_goto("1:\n\t"
+WASM(nop) "\n\t"
 WASM(b) " %l[l_yes]\n\t"
+   "2:\n\t"
 ".pushsection __jump_table,  \"aw\"\n\t"
-".word 1b, %l[l_yes], %c0\n\t"
+".word 1b, 2b, %c0\n\t"
 ".popsection\n\t"
 : :  "i" (&((char *)key)[branch]) :  : l_yes);
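The operand "i" (&((char *)key)[branch]) in these snippets emits the key's address plus 0 or 1 into the __jump_table entry, so bit 0 carries the branch argument. The sketch below models encoding and decoding that word. As far as we know, the generic code recovers the key by masking off the low bits (in jump_entry_key()) and reads bit 0 separately; the helpers here are hypothetical stand-ins and assume the key is at least 4-byte aligned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct static_key; the int member gives it
 * 4-byte alignment on common ABIs, leaving the low two address bits free. */
struct model_static_key {
	int enabled;
};

/* Same arithmetic as the asm operand: key address plus 0 or 1 byte. */
static uintptr_t model_encode_key(struct model_static_key *key, bool branch)
{
	assert(((uintptr_t)key & 3u) == 0);
	return (uintptr_t)&((char *)key)[branch];
}

/* Recover the key pointer by clearing the flag bits. */
static struct model_static_key *model_entry_key(uintptr_t word)
{
	return (struct model_static_key *)(word & ~(uintptr_t)3);
}

/* Bit 0 carries the branch argument passed to the asm. */
static bool model_entry_is_branch(uintptr_t word)
{
	return (word & 1u) != 0;
}
```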

Thanks,

Mathieu


-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


Re: next/master boot bisection: next-20190215 on beaglebone-black

2019-04-16 Thread Dan Williams
On Thu, Apr 11, 2019 at 1:54 PM Guenter Roeck  wrote:
[..]
> > > Boot tests report
> > >
> > > Qemu test results:
> > > total: 345 pass: 345 fail: 0
> > >
> > > This is on top of next-20190410 with CONFIG_SHUFFLE_PAGE_ALLOCATOR=y
> > > and the known crashes fixed.
> >
> > In addition to CONFIG_SHUFFLE_PAGE_ALLOCATOR=y you also need the
> > kernel command line option "page_alloc.shuffle=1"
> >
> > ...so I doubt you are running with shuffling enabled. Another way to
> > double check is:
> >
> >    cat /sys/module/page_alloc/parameters/shuffle
>
> Yes, you are right. Because, with it enabled, I see:
>
> Kernel command line: rdinit=/sbin/init page_alloc.shuffle=1 panic=-1
> console=ttyAMA0,115200 page_alloc.shuffle=1
> ------------[ cut here ]------------
> WARNING: CPU: 0 PID: 0 at ./include/linux/jump_label.h:303
> page_alloc_shuffle+0x12c/0x1ac
> static_key_enable(): static key 'page_alloc_shuffle_key+0x0/0x4' used
> before call to jump_label_init()

This looks to be specific to ARM never having had to deal with
DEFINE_STATIC_KEY_TRUE in the past.

I am able to avoid this warning by simply not enabling JUMP_LABEL
support in my build.

> Modules linked in:
> CPU: 0 PID: 0 Comm: swapper Not tainted
> 5.1.0-rc4-next-20190410-00003-g3367c36ce744 #1
> Hardware name: ARM Integrator/CP (Device Tree)
> [] (unwind_backtrace) from [] (show_stack+0x10/0x18)
> [] (show_stack) from [] (dump_stack+0x18/0x24)
> [] (dump_stack) from [] (__warn+0xe0/0x108)
> [] (__warn) from [] (warn_slowpath_fmt+0x44/0x6c)
> [] (warn_slowpath_fmt) from [] (page_alloc_shuffle+0x12c/0x1ac)
> [] (page_alloc_shuffle) from [] (shuffle_store+0x28/0x48)
> [] (shuffle_store) from [] (parse_args+0x1f4/0x350)
> [] (parse_args) from [] (start_kernel+0x1c0/0x488)
> [] (start_kernel) from [<00000000>] (  (null))
>
> I'll re-run the test, but I suspect it will drown in warnings.

I slogged through getting a BeagleBone Black up and running with a
Yocto build and it is not failing. I have tried applying the patches on
top of v5.1-rc5 as well as re-testing the next-20190215 tag, with no
reproduction. The shuffle appears to avoid anything sensitive by
default; below are the shuffle actions that were taken relative to
iomem. Can someone with a failure reproduction please send me more
details about their configuration? It would also help to get a failing
boot log with the pr_debug() statements in mm/shuffle.c enabled to see
if the failure is correlated with any unexpected shuffle actions.

80000000-9fffffff : System RAM
  80008000-809fffff : Kernel code
  80b00000-812be523 : Kernel data

[0.086469] __shuffle_zone: swap: 0x81800 -> 0x99800
[0.086558] __shuffle_zone: swap: 0x82000 -> 0x88800
[0.086575] __shuffle_zone: swap: 0x82800 -> 0x89800
[0.086591] __shuffle_zone: swap: 0x83000 -> 0x89000
[0.086606] __shuffle_zone: swap: 0x83800 -> 0x8a800
[0.086621] __shuffle_zone: swap: 0x84000 -> 0x93800
[0.086636] __shuffle_zone: swap: 0x84800 -> 0x83000
[0.086651] __shuffle_zone: swap: 0x85000 -> 0x8f000
[0.086666] __shuffle_zone: swap: 0x85800 -> 0x88000
[0.086689] __shuffle_zone: swap: 0x86000 -> 0x84000
[0.086704] __shuffle_zone: swap: 0x86800 -> 0x8c800
[0.086719] __shuffle_zone: swap: 0x87000 -> 0x93000
[0.086735] __shuffle_zone: swap: 0x87800 -> 0x94000
[0.086751] __shuffle_zone: swap: 0x88000 -> 0x90800
[0.086766] __shuffle_zone: swap: 0x88800 -> 0x9d000
[0.086781] __shuffle_zone: swap: 0x89000 -> 0x82800
[0.086796] __shuffle_zone: swap: 0x89800 -> 0x95800
[0.086811] __shuffle_zone: swap: 0x8a000 -> 0x98000
[0.086826] __shuffle_zone: swap: 0x8a800 -> 0x89000
[0.086842] __shuffle_zone: swap: 0x8b000 -> 0x81800
[0.086857] __shuffle_zone: swap: 0x8b800 -> 0x88800
[0.086872] __shuffle_zone: swap: 0x8c000 -> 0x8a000
[0.086891] __shuffle_zone: swap: 0x8c800 -> 0x84800
[0.086906] __shuffle_zone: swap: 0x8d000 -> 0x95000
[0.086921] __shuffle_zone: swap: 0x8d800 -> 0x8d000
[0.086935] __shuffle_zone: swap: 0x8e000 -> 0x8e800
[0.086950] __shuffle_zone: swap: 0x8e800 -> 0x99000
[0.086964] __shuffle_zone: swap: 0x8f000 -> 0x8d000
[0.086979] __shuffle_zone: swap: 0x90000 -> 0x91000
[0.086994] __shuffle_zone: swap: 0x90800 -> 0x83000
[0.087009] __shuffle_zone: swap: 0x91000 -> 0x91800
[0.087025] __shuffle_zone: swap: 0x91800 -> 0x8d800
[0.087040] __shuffle_zone: swap: 0x92000 -> 0x86800
[0.087054] __shuffle_zone: swap: 0x92800 -> 0x92000
[0.087070] __shuffle_zone: swap: 0x93000 -> 0x91000
[0.087088] __shuffle_zone: swap: 0x93800 -> 0x85000
[0.087103] __shuffle_zone: swap: 0x94000 -> 0x8b800
[0.087117] __shuffle_zone: swap: 0x94800 -> 0x96000
[0.087132] __shuffle_zone: swap: 0x95000 -> 0x91000
[0.087147] __shuffle_zone: swap: 0x95800 -> 0x8e000
[0.087161] __shuffle_zone: swap: 0x96000 -> 0x95800
[0.087179] __shuffle_zone: swap: 0x96800 -> 0x8c800
[0.087193] __shuffle_zone: swap: 0x97000 -> 0x89000
[0.087208

Re: next/master boot bisection: next-20190215 on beaglebone-black

2019-04-11 Thread Guenter Roeck
On Thu, Apr 11, 2019 at 1:22 PM Dan Williams  wrote:
>
> On Thu, Apr 11, 2019 at 1:08 PM Guenter Roeck  wrote:
> >
> > On Thu, Apr 11, 2019 at 10:35 AM Kees Cook  wrote:
> > >
> > > On Thu, Apr 11, 2019 at 9:42 AM Guenter Roeck  wrote:
> > > >
> > > > On Thu, Apr 11, 2019 at 9:19 AM Kees Cook  wrote:
> > > > >
> > > > > On Thu, Mar 7, 2019 at 7:43 AM Dan Williams 
> > > > >  wrote:
> > > > > > I went ahead and acquired one of these boards to see if I can
> > > > > > debug this locally.
> > > > >
> > > > > Hi! Any progress on this? Might it be possible to unblock this series
> > > > > for v5.2 by adding a temporary "not on ARM" flag?
> > > > >
> > > >
> > > > Can someone send me a pointer to the series in question? I would like
> > > > to run it through my testbed.
> > >
> > > It's already in -mm and linux-next ("mm: shuffle initial free memory
> > > to improve memory-side-cache utilization") but it gets enabled with
> > > CONFIG_SHUFFLE_PAGE_ALLOCATOR=y (which was briefly made the default in
> > > -mm, triggered problems on ARM, and was reverted).
> > >
> >
> > Boot tests report
> >
> > Qemu test results:
> > total: 345 pass: 345 fail: 0
> >
> > This is on top of next-20190410 with CONFIG_SHUFFLE_PAGE_ALLOCATOR=y
> > and the known crashes fixed.
>
> In addition to CONFIG_SHUFFLE_PAGE_ALLOCATOR=y you also need the
> kernel command line option "page_alloc.shuffle=1"
>
> ...so I doubt you are running with shuffling enabled. Another way to
> double check is:
>
>    cat /sys/module/page_alloc/parameters/shuffle

Yes, you are right. Because, with it enabled, I see:

Kernel command line: rdinit=/sbin/init page_alloc.shuffle=1 panic=-1 console=ttyAMA0,115200 page_alloc.shuffle=1
[ cut here ]
WARNING: CPU: 0 PID: 0 at ./include/linux/jump_label.h:303 page_alloc_shuffle+0x12c/0x1ac
static_key_enable(): static key 'page_alloc_shuffle_key+0x0/0x4' used before call to jump_label_init()
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Not tainted 5.1.0-rc4-next-20190410-3-g3367c36ce744 #1
Hardware name: ARM Integrator/CP (Device Tree)
[] (unwind_backtrace) from [] (show_stack+0x10/0x18)
[] (show_stack) from [] (dump_stack+0x18/0x24)
[] (dump_stack) from [] (__warn+0xe0/0x108)
[] (__warn) from [] (warn_slowpath_fmt+0x44/0x6c)
[] (warn_slowpath_fmt) from [] (page_alloc_shuffle+0x12c/0x1ac)
[] (page_alloc_shuffle) from [] (shuffle_store+0x28/0x48)
[] (shuffle_store) from [] (parse_args+0x1f4/0x350)
[] (parse_args) from [] (start_kernel+0x1c0/0x488)
[] (start_kernel) from [<>] (  (null))

I'll re-run the test, but I suspect it will drown in warnings.

Guenter


Re: next/master boot bisection: next-20190215 on beaglebone-black

2019-04-11 Thread Mike Rapoport
On Thu, Apr 11, 2019 at 01:08:15PM -0700, Guenter Roeck wrote:
> On Thu, Apr 11, 2019 at 10:35 AM Kees Cook  wrote:
> >
> > On Thu, Apr 11, 2019 at 9:42 AM Guenter Roeck  wrote:
> > >
> > > On Thu, Apr 11, 2019 at 9:19 AM Kees Cook  wrote:
> > > >
> > > > On Thu, Mar 7, 2019 at 7:43 AM Dan Williams  
> > > > wrote:
> > > > > I went ahead and acquired one of these boards to see if I can
> > > > > debug this locally.
> > > >
> > > > Hi! Any progress on this? Might it be possible to unblock this series
> > > > for v5.2 by adding a temporary "not on ARM" flag?
> > > >
> > >
> > > Can someone send me a pointer to the series in question? I would like
> > > to run it through my testbed.
> >
> > It's already in -mm and linux-next ("mm: shuffle initial free memory
> > to improve memory-side-cache utilization") but it gets enabled with
> > CONFIG_SHUFFLE_PAGE_ALLOCATOR=y (which was briefly made the default in
> > -mm, triggered problems on ARM, and was reverted).
> >
> 
> Boot tests report
> 
> Qemu test results:
> total: 345 pass: 345 fail: 0
> 
> This is on top of next-20190410 with CONFIG_SHUFFLE_PAGE_ALLOCATOR=y
> and the known crashes fixed.
> 
> $ git log --oneline next-20190410..
> 3367c36ce744 Set SHUFFLE_PAGE_ALLOCATOR=y for testing.
> d2aee8b3cd5d Revert "crypto: scompress - Use per-CPU struct instead multiple variables"
> 4bc9f5bc9a84 Fix: rhashtable: use bit_spin_locks to protect hash bucket.
> 
> Boot tests on arm are:
> 
> Building 
> arm:versatilepb:versatile_defconfig:aeabi:pci:scsi:mem128:versatile-pb:rootfs
> ... running  passed
> Building 
> arm:versatilepb:versatile_defconfig:aeabi:pci:mem128:versatile-pb:initrd
> ... running  passed

...

> Building 
> arm:witherspoon-bmc:aspeed_g5_defconfig:notests:aspeed-bmc-opp-witherspoon:initrd
> ... running ... passed
> Building arm:ast2500-evb:aspeed_g5_defconfig:notests:aspeed-ast2500-evb:initrd
> ... running  passed
> Building 
> arm:romulus-bmc:aspeed_g5_defconfig:notests:aspeed-bmc-opp-romulus:initrd
> ... running . passed
> Building arm:mps2-an385:mps2_defconfig:mps2-an385:initrd ... running
> .. passed

The issue was with an omap2 board and, AFAIK, qemu does not simulate those.

-- 
Sincerely yours,
Mike.



Re: next/master boot bisection: next-20190215 on beaglebone-black

2019-04-11 Thread Dan Williams
On Thu, Apr 11, 2019 at 1:08 PM Guenter Roeck  wrote:
>
> On Thu, Apr 11, 2019 at 10:35 AM Kees Cook  wrote:
> >
> > On Thu, Apr 11, 2019 at 9:42 AM Guenter Roeck  wrote:
> > >
> > > On Thu, Apr 11, 2019 at 9:19 AM Kees Cook  wrote:
> > > >
> > > > On Thu, Mar 7, 2019 at 7:43 AM Dan Williams  
> > > > wrote:
> > > > > I went ahead and acquired one of these boards to see if I can
> > > > > debug this locally.
> > > >
> > > > Hi! Any progress on this? Might it be possible to unblock this series
> > > > for v5.2 by adding a temporary "not on ARM" flag?
> > > >
> > >
> > > Can someone send me a pointer to the series in question? I would like
> > > to run it through my testbed.
> >
> > It's already in -mm and linux-next ("mm: shuffle initial free memory
> > to improve memory-side-cache utilization") but it gets enabled with
> > CONFIG_SHUFFLE_PAGE_ALLOCATOR=y (which was briefly made the default in
> > -mm, triggered problems on ARM, and was reverted).
> >
>
> Boot tests report
>
> Qemu test results:
> total: 345 pass: 345 fail: 0
>
> This is on top of next-20190410 with CONFIG_SHUFFLE_PAGE_ALLOCATOR=y
> and the known crashes fixed.

In addition to CONFIG_SHUFFLE_PAGE_ALLOCATOR=y you also need the
kernel command line option "page_alloc.shuffle=1"

...so I doubt you are running with shuffling enabled. Another way to
double check is:

   cat /sys/module/page_alloc/parameters/shuffle


Re: next/master boot bisection: next-20190215 on beaglebone-black

2019-04-11 Thread Guenter Roeck
On Thu, Apr 11, 2019 at 10:35 AM Kees Cook  wrote:
>
> On Thu, Apr 11, 2019 at 9:42 AM Guenter Roeck  wrote:
> >
> > On Thu, Apr 11, 2019 at 9:19 AM Kees Cook  wrote:
> > >
> > > On Thu, Mar 7, 2019 at 7:43 AM Dan Williams  
> > > wrote:
> > > > I went ahead and acquired one of these boards to see if I can
> > > > debug this locally.
> > >
> > > Hi! Any progress on this? Might it be possible to unblock this series
> > > for v5.2 by adding a temporary "not on ARM" flag?
> > >
> >
> > Can someone send me a pointer to the series in question? I would like
> > to run it through my testbed.
>
> It's already in -mm and linux-next ("mm: shuffle initial free memory
> to improve memory-side-cache utilization") but it gets enabled with
> CONFIG_SHUFFLE_PAGE_ALLOCATOR=y (which was briefly made the default in
> -mm, triggered problems on ARM, and was reverted).
>

Boot tests report

Qemu test results:
total: 345 pass: 345 fail: 0

This is on top of next-20190410 with CONFIG_SHUFFLE_PAGE_ALLOCATOR=y
and the known crashes fixed.

$ git log --oneline next-20190410..
3367c36ce744 Set SHUFFLE_PAGE_ALLOCATOR=y for testing.
d2aee8b3cd5d Revert "crypto: scompress - Use per-CPU struct instead multiple variables"
4bc9f5bc9a84 Fix: rhashtable: use bit_spin_locks to protect hash bucket.

Boot tests on arm are:

Building 
arm:versatilepb:versatile_defconfig:aeabi:pci:scsi:mem128:versatile-pb:rootfs
... running  passed
Building 
arm:versatilepb:versatile_defconfig:aeabi:pci:mem128:versatile-pb:initrd
... running  passed
Building arm:versatileab:versatile_defconfig:mem128:versatile-ab:initrd
... running  passed
Building arm:imx25-pdk:imx_v4_v5_defconfig:nonand:mem128:imx25-pdk:initrd
... running  passed
Building arm:kzm:imx_v6_v7_defconfig:nodrm:mem128:initrd ... running
.. passed
Building 
arm:mcimx6ul-evk:imx_v6_v7_defconfig:nodrm:mem256:imx6ul-14x14-evk:initrd
... running .. passed
Building 
arm:mcimx6ul-evk:imx_v6_v7_defconfig:nodrm:sd:mem256:imx6ul-14x14-evk:rootfs
... running .. passed
Building 
arm:vexpress-a9:multi_v7_defconfig:nolocktests:mem128:vexpress-v2p-ca9:initrd
... running  passed
Building 
arm:vexpress-a9:multi_v7_defconfig:nolocktests:sd:mem128:vexpress-v2p-ca9:rootfs
... running  passed
Building 
arm:vexpress-a9:multi_v7_defconfig:nolocktests:virtio-blk:mem128:vexpress-v2p-ca9:rootfs
... running  passed
Building 
arm:vexpress-a15:multi_v7_defconfig:nolocktests:sd:mem128:vexpress-v2p-ca15-tc1:rootfs
... running  passed
Building 
arm:vexpress-a15-a7:multi_v7_defconfig:nolocktests:sd:mem256:vexpress-v2p-ca15_a7:rootfs
... running  passed
Building arm:beagle:multi_v7_defconfig:sd:mem256:omap3-beagle:rootfs
... running  passed
Building arm:beaglexm:multi_v7_defconfig:sd:mem512:omap3-beagle-xm:rootfs
... running ... passed
Building arm:overo:multi_v7_defconfig:sd:mem256:omap3-overo-tobi:rootfs
... running ... passed
Building arm:midway:multi_v7_defconfig:mem2G:ecx-2000:initrd ...
running .. passed
Building arm:sabrelite:multi_v7_defconfig:mem256:imx6dl-sabrelite:initrd
... running  passed
Building arm:mcimx7d-sabre:multi_v7_defconfig:mem256:imx7d-sdb:initrd
... running .. passed
Building arm:xilinx-zynq-a9:multi_v7_defconfig:mem128:zynq-zc702:initrd
... running  passed
Building arm:xilinx-zynq-a9:multi_v7_defconfig:sd:mem128:zynq-zc702:rootfs
... running  passed
Building arm:xilinx-zynq-a9:multi_v7_defconfig:sd:mem128:zynq-zc706:rootfs
... running  passed
Building arm:xilinx-zynq-a9:multi_v7_defconfig:sd:mem128:zynq-zed:rootfs
... running ... passed
Building arm:cubieboard:multi_v7_defconfig:mem128:sun4i-a10-cubieboard:initrd
... running ... passed
Building arm:raspi2:multi_v7_defconfig:bcm2836-rpi-2-b:initrd ...
running .. passed
Building arm:raspi2:multi_v7_defconfig:sd:bcm2836-rpi-2-b:rootfs ...
running .. passed
Building arm:virt:multi_v7_defconfig:virtio-blk:mem512:rootfs ...
running . passed
Building 
arm:smdkc210:exynos_defconfig:cpuidle:nocrypto:mem128:exynos4210-smdkv310:initrd
... running . passed
Building 
arm:realview-pb-a8:realview_defconfig:realview_pb:mem512:arm-realview-pba8:initrd
... running  passed
Building 
arm:realview-pbx-a9:realview_defconfig:realview_pb:arm-realview-pbx-a9:initrd
... running  passed
Building 
arm:realview-eb:realview_defconfig:realview_eb:mem512:arm-realview-eb:initrd
... running  passed
Building 
arm:realview-eb-mpcore:realview_defconfig:realview_eb:mem512:arm-realview-eb-11mp-ctrevb:initrd
... running . passed
Building 
arm:akita:pxa_defconfig:nofdt:nodebug:notests:novirt:nousb:noscsi:initrd
... running . passed
Building 
arm:borzoi:pxa_defconfig:nofdt:nodebug:notests:novirt:nousb:noscsi:initrd
... running . passed
Building 
arm:mainstone:pxa_defconfig:nofdt:nodebug:notests:novirt:nousb:noscsi:

Re: next/master boot bisection: next-20190215 on beaglebone-black

2019-04-11 Thread Kees Cook
On Thu, Apr 11, 2019 at 9:42 AM Guenter Roeck  wrote:
>
> On Thu, Apr 11, 2019 at 9:19 AM Kees Cook  wrote:
> >
> > On Thu, Mar 7, 2019 at 7:43 AM Dan Williams  
> > wrote:
> > > I went ahead and acquired one of these boards to see if I can
> > > debug this locally.
> >
> > Hi! Any progress on this? Might it be possible to unblock this series
> > for v5.2 by adding a temporary "not on ARM" flag?
> >
>
> Can someone send me a pointer to the series in question? I would like
> to run it through my testbed.

It's already in -mm and linux-next ("mm: shuffle initial free memory
to improve memory-side-cache utilization") but it gets enabled with
CONFIG_SHUFFLE_PAGE_ALLOCATOR=y (which was briefly made the default in
-mm, triggered problems on ARM, and was reverted).

-- 
Kees Cook


Re: next/master boot bisection: next-20190215 on beaglebone-black

2019-04-11 Thread Guenter Roeck
On Thu, Apr 11, 2019 at 9:19 AM Kees Cook  wrote:
>
> On Thu, Mar 7, 2019 at 7:43 AM Dan Williams  wrote:
> >
> > On Thu, Mar 7, 2019 at 1:17 AM Guillaume Tucker
> >  wrote:
> > >
> > > On 06/03/2019 14:05, Mike Rapoport wrote:
> > > > On Wed, Mar 06, 2019 at 10:14:47AM +, Guillaume Tucker wrote:
> > > >> On 01/03/2019 23:23, Dan Williams wrote:
> > > >>> On Fri, Mar 1, 2019 at 1:05 PM Guillaume Tucker
> > > >>>  wrote:
> > > >>>
> > > >>> Is there an early-printk facility that can be turned on to see how far
> > > >>> we get in the boot?
> > > >>
> > > >> Yes, I've done that now by enabling CONFIG_DEBUG_AM33XXUART1 and
> > > >> earlyprintk in the command line.  Here's the result, with the
> > > >> commit cherry picked on top of next-20190304:
> > > >>
> > > >>   https://lava.collabora.co.uk/scheduler/job/1526326
> > > >>
> > > >> [1.379522] ti-sysc 4804a000.target-module: sysc_flags 0222 != 0022
> > > >> [1.396718] Unable to handle kernel paging request at virtual address 77bb4003
> > > >> [1.404203] pgd = (ptrval)
> > > >> [1.406971] [77bb4003] *pgd=
> > > >> [1.410650] Internal error: Oops: 5 [#1] ARM
> > > >> [...]
> > > >> [1.672310] [] (clk_hw_create_clk.part.21) from [] (devm_clk_get+0x4c/0x80)
> > > >> [1.681232] [] (devm_clk_get) from [] (sysc_probe+0x28c/0xde4)
> > > >>
> > > >> It's always failing at that point in the code.  Also when
> > > >> enabling "debug" on the kernel command line, the issue goes
> > > >> away (exact same binaries etc.):
> > > >>
> > > >>   https://lava.collabora.co.uk/scheduler/job/1526327
> > > >>
> > > >> For the record, here's the branch I've been using:
> > > >>
> > > >>   https://gitlab.collabora.com/gtucker/linux/tree/beaglebone-black-next-20190304-debug
> > > >>
> > > >> The board otherwise boots fine with next-20190304 (SMP=n), and
> > > >> also with the patch applied but the shuffle configs set to n.
> > > >>
> > > >>> Were there any boot *successes* on ARM with shuffling enabled? I.e.
> > > >>> clues about what's different about the specific memory setup for
> > > >>> beagle-bone-black.
> > > >>
> > > >> Looking at the KernelCI results from next-20190215, it looks like
> > > >> only the BeagleBone Black with SMP=n failed to boot:
> > > >>
> > > >>   https://kernelci.org/boot/all/job/next/branch/master/kernel/next-20190215/
> > > >>
> > > >> Of course that's not all the ARM boards that exist out there, but
> > > >> it's a fairly large coverage already.
> > > >>
> > > >> As the kernel panic always seems to originate in ti-sysc.c,
> > > >> there's a chance it's only visible on that platform...  I'm doing
> > > >> a KernelCI run now with my test branch to double check that,
> > > >> it'll take a few hours so I'll send an update later if I get
> > > >> anything useful out of it.
> > >
> > > Here's the result: there were a couple of failures, but some were
> > > due to infrastructure errors (nyan-big), and I'm not sure what the
> > > problem was with the meson boards:
> > >
> > >   https://staging.kernelci.org/boot/all/job/gtucker/branch/kernelci-local/kernel/next-20190304-1-g4f0b547b03da/
> > >
> > > So there's no clear indicator that the shuffle config is causing
> > > any issue on any platform other than the BeagleBone Black.
> > >
> > > >> In the meantime, I'm happy to try out other things with more
> > > >> debug configs turned on or any potential fixes someone might
> > > >> have.
> > > >
> > > > ARM is the only arch that sets ARCH_HAS_HOLES_MEMORYMODEL to 'y'. Maybe the
> > > > failure has something to do with it...
> > > >
> > > > Guillaume, can you try this patch:
> >
> > Mike, I appreciate the help!
> >
> > >
> > > Sure, it doesn't seem to be fixing the problem though:
> > >
> > >   https://lava.collabora.co.uk/scheduler/job/1527471
> > >
> > > I've added the patch to the same branch based on next-20190304.
> > >
> > > I guess this needs to be debugged a little further to see what
> > > the panic really is about.  I'll see if I can spend a bit more
> > > time on it this week, unless there's any BeagleBone expert
> > > available to help or if someone has another fix to try out.
> >
> > Thanks for the help Guillaume!
> >
> > I went ahead and acquired one of these boards to see if I can
> > debug this locally.
>
> Hi! Any progress on this? Might it be possible to unblock this series
> for v5.2 by adding a temporary "not on ARM" flag?
>

Can someone send me a pointer to the series in question? I would like
to run it through my testbed.

Thanks,
Guenter

> Thanks!
>
> --
> Kees Cook
>

Re: next/master boot bisection: next-20190215 on beaglebone-black

2019-04-10 Thread Kees Cook
On Thu, Mar 7, 2019 at 7:43 AM Dan Williams  wrote:
>
> On Thu, Mar 7, 2019 at 1:17 AM Guillaume Tucker
>  wrote:
> >
> > On 06/03/2019 14:05, Mike Rapoport wrote:
> > > On Wed, Mar 06, 2019 at 10:14:47AM +, Guillaume Tucker wrote:
> > >> On 01/03/2019 23:23, Dan Williams wrote:
> > >>> On Fri, Mar 1, 2019 at 1:05 PM Guillaume Tucker
> > >>>  wrote:
> > >>>
> > >>> Is there an early-printk facility that can be turned on to see how far
> > >>> we get in the boot?
> > >>
> > >> Yes, I've done that now by enabling CONFIG_DEBUG_AM33XXUART1 and
> > >> earlyprintk in the command line.  Here's the result, with the
> > >> commit cherry picked on top of next-20190304:
> > >>
> > >>   https://lava.collabora.co.uk/scheduler/job/1526326
> > >>
> > >> [1.379522] ti-sysc 4804a000.target-module: sysc_flags 0222 != 0022
> > >> [1.396718] Unable to handle kernel paging request at virtual address 77bb4003
> > >> [1.404203] pgd = (ptrval)
> > >> [1.406971] [77bb4003] *pgd=
> > >> [1.410650] Internal error: Oops: 5 [#1] ARM
> > >> [...]
> > >> [1.672310] [] (clk_hw_create_clk.part.21) from [] (devm_clk_get+0x4c/0x80)
> > >> [1.681232] [] (devm_clk_get) from [] (sysc_probe+0x28c/0xde4)
> > >>
> > >> It's always failing at that point in the code.  Also when
> > >> enabling "debug" on the kernel command line, the issue goes
> > >> away (exact same binaries etc.):
> > >>
> > >>   https://lava.collabora.co.uk/scheduler/job/1526327
> > >>
> > >> For the record, here's the branch I've been using:
> > >>
> > >>   https://gitlab.collabora.com/gtucker/linux/tree/beaglebone-black-next-20190304-debug
> > >>
> > >> The board otherwise boots fine with next-20190304 (SMP=n), and
> > >> also with the patch applied but the shuffle configs set to n.
> > >>
> > >>> Were there any boot *successes* on ARM with shuffling enabled? I.e.
> > >>> clues about what's different about the specific memory setup for
> > >>> beagle-bone-black.
> > >>
> > >> Looking at the KernelCI results from next-20190215, it looks like
> > >> only the BeagleBone Black with SMP=n failed to boot:
> > >>
> > >>   https://kernelci.org/boot/all/job/next/branch/master/kernel/next-20190215/
> > >>
> > >> Of course that's not all the ARM boards that exist out there, but
> > >> it's a fairly large coverage already.
> > >>
> > >> As the kernel panic always seems to originate in ti-sysc.c,
> > >> there's a chance it's only visible on that platform...  I'm doing
> > >> a KernelCI run now with my test branch to double check that,
> > >> it'll take a few hours so I'll send an update later if I get
> > >> anything useful out of it.
> >
> > Here's the result: there were a couple of failures, but some were
> > due to infrastructure errors (nyan-big), and I'm not sure what the
> > problem was with the meson boards:
> >
> >   https://staging.kernelci.org/boot/all/job/gtucker/branch/kernelci-local/kernel/next-20190304-1-g4f0b547b03da/
> >
> > So there's no clear indicator that the shuffle config is causing
> > any issue on any platform other than the BeagleBone Black.
> >
> > >> In the meantime, I'm happy to try out other things with more
> > >> debug configs turned on or any potential fixes someone might
> > >> have.
> > >
> > > ARM is the only arch that sets ARCH_HAS_HOLES_MEMORYMODEL to 'y'. Maybe the
> > > failure has something to do with it...
> > >
> > > Guillaume, can you try this patch:
>
> Mike, I appreciate the help!
>
> >
> > Sure, it doesn't seem to be fixing the problem though:
> >
> >   https://lava.collabora.co.uk/scheduler/job/1527471
> >
> > I've added the patch to the same branch based on next-20190304.
> >
> > I guess this needs to be debugged a little further to see what
> > the panic really is about.  I'll see if I can spend a bit more
> > time on it this week, unless there's any BeagleBone expert
> > available to help or if someone has another fix to try out.
>
> Thanks for the help Guillaume!
>
> I went ahead and acquired one of these boards to see if I can
> debug this locally.

Hi! Any progress on this? Might it be possible to unblock this series
for v5.2 by adding a temporary "not on ARM" flag?

Thanks!

-- 
Kees Cook


Re: next/master boot bisection: next-20190215 on beaglebone-black

2019-03-07 Thread Dan Williams
On Thu, Mar 7, 2019 at 1:17 AM Guillaume Tucker
 wrote:
>
> On 06/03/2019 14:05, Mike Rapoport wrote:
> > On Wed, Mar 06, 2019 at 10:14:47AM +, Guillaume Tucker wrote:
> >> On 01/03/2019 23:23, Dan Williams wrote:
> >>> On Fri, Mar 1, 2019 at 1:05 PM Guillaume Tucker
> >>>  wrote:
> >>>
> >>> Is there an early-printk facility that can be turned on to see how far
> >>> we get in the boot?
> >>
> >> Yes, I've done that now by enabling CONFIG_DEBUG_AM33XXUART1 and
> >> earlyprintk in the command line.  Here's the result, with the
> >> commit cherry picked on top of next-20190304:
> >>
> >>   https://lava.collabora.co.uk/scheduler/job/1526326
> >>
> >> [1.379522] ti-sysc 4804a000.target-module: sysc_flags 0222 != 0022
> >> [1.396718] Unable to handle kernel paging request at virtual address 77bb4003
> >> [1.404203] pgd = (ptrval)
> >> [1.406971] [77bb4003] *pgd=
> >> [1.410650] Internal error: Oops: 5 [#1] ARM
> >> [...]
> >> [1.672310] [] (clk_hw_create_clk.part.21) from [] (devm_clk_get+0x4c/0x80)
> >> [1.681232] [] (devm_clk_get) from [] (sysc_probe+0x28c/0xde4)
> >>
> >> It's always failing at that point in the code.  Also when
> >> enabling "debug" on the kernel command line, the issue goes
> >> away (exact same binaries etc.):
> >>
> >>   https://lava.collabora.co.uk/scheduler/job/1526327
> >>
> >> For the record, here's the branch I've been using:
> >>
> >>   https://gitlab.collabora.com/gtucker/linux/tree/beaglebone-black-next-20190304-debug
> >>
> >> The board otherwise boots fine with next-20190304 (SMP=n), and
> >> also with the patch applied but the shuffle configs set to n.
> >>
> >>> Were there any boot *successes* on ARM with shuffling enabled? I.e.
> >>> clues about what's different about the specific memory setup for
> >>> beagle-bone-black.
> >>
> >> Looking at the KernelCI results from next-20190215, it looks like
> >> only the BeagleBone Black with SMP=n failed to boot:
> >>
> >>   https://kernelci.org/boot/all/job/next/branch/master/kernel/next-20190215/
> >>
> >> Of course that's not all the ARM boards that exist out there, but
> >> it's a fairly large coverage already.
> >>
> >> As the kernel panic always seems to originate in ti-sysc.c,
> >> there's a chance it's only visible on that platform...  I'm doing
> >> a KernelCI run now with my test branch to double check that,
> >> it'll take a few hours so I'll send an update later if I get
> >> anything useful out of it.
>
> Here's the result: there were a couple of failures, but some were
> due to infrastructure errors (nyan-big), and I'm not sure what the
> problem was with the meson boards:
>
>   https://staging.kernelci.org/boot/all/job/gtucker/branch/kernelci-local/kernel/next-20190304-1-g4f0b547b03da/
>
> So there's no clear indicator that the shuffle config is causing
> any issue on any platform other than the BeagleBone Black.
>
> >> In the meantime, I'm happy to try out other things with more
> >> debug configs turned on or any potential fixes someone might
> >> have.
> >
> > ARM is the only arch that sets ARCH_HAS_HOLES_MEMORYMODEL to 'y'. Maybe the
> > failure has something to do with it...
> >
> > Guillaume, can you try this patch:

Mike, I appreciate the help!

>
> Sure, it doesn't seem to be fixing the problem though:
>
>   https://lava.collabora.co.uk/scheduler/job/1527471
>
> I've added the patch to the same branch based on next-20190304.
>
> I guess this needs to be debugged a little further to see what
> the panic really is about.  I'll see if I can spend a bit more
> time on it this week, unless there's any BeagleBone expert
> available to help or if someone has another fix to try out.

Thanks for the help Guillaume!

I went ahead and acquired one of these boards to see if I can
debug this locally.


Re: next/master boot bisection: next-20190215 on beaglebone-black

2019-03-07 Thread Guillaume Tucker
On 06/03/2019 14:05, Mike Rapoport wrote:
> On Wed, Mar 06, 2019 at 10:14:47AM +, Guillaume Tucker wrote:
>> On 01/03/2019 23:23, Dan Williams wrote:
>>> On Fri, Mar 1, 2019 at 1:05 PM Guillaume Tucker
>>>  wrote:
>>>
>>> Is there an early-printk facility that can be turned on to see how far
>>> we get in the boot?
>>
>> Yes, I've done that now by enabling CONFIG_DEBUG_AM33XXUART1 and
>> earlyprintk in the command line.  Here's the result, with the
>> commit cherry picked on top of next-20190304:
>>
>>   https://lava.collabora.co.uk/scheduler/job/1526326
>>
>> [1.379522] ti-sysc 4804a000.target-module: sysc_flags 0222 != 0022
>> [1.396718] Unable to handle kernel paging request at virtual address 77bb4003
>> [1.404203] pgd = (ptrval)
>> [1.406971] [77bb4003] *pgd=
>> [1.410650] Internal error: Oops: 5 [#1] ARM
>> [...]
>> [1.672310] [] (clk_hw_create_clk.part.21) from [] (devm_clk_get+0x4c/0x80)
>> [1.681232] [] (devm_clk_get) from [] (sysc_probe+0x28c/0xde4)
>>
>> It's always failing at that point in the code.  Also when
>> enabling "debug" on the kernel command line, the issue goes
>> away (exact same binaries etc.):
>>
>>   https://lava.collabora.co.uk/scheduler/job/1526327
>>
>> For the record, here's the branch I've been using:
>>
>>   https://gitlab.collabora.com/gtucker/linux/tree/beaglebone-black-next-20190304-debug
>>
>> The board otherwise boots fine with next-20190304 (SMP=n), and
>> also with the patch applied but the shuffle configs set to n.
>>
>>> Were there any boot *successes* on ARM with shuffling enabled? I.e.
>>> clues about what's different about the specific memory setup for
>>> beagle-bone-black.
>>
>> Looking at the KernelCI results from next-20190215, it looks like
>> only the BeagleBone Black with SMP=n failed to boot:
>>
>>   https://kernelci.org/boot/all/job/next/branch/master/kernel/next-20190215/
>>
>> Of course that's not all the ARM boards that exist out there, but
>> it's a fairly large coverage already.
>>
>> As the kernel panic always seems to originate in ti-sysc.c,
>> there's a chance it's only visible on that platform...  I'm doing
>> a KernelCI run now with my test branch to double check that,
>> it'll take a few hours so I'll send an update later if I get
>> anything useful out of it.

Here's the result: there were a couple of failures, but some were
due to infrastructure errors (nyan-big), and I'm not sure what the
problem was with the meson boards:

  https://staging.kernelci.org/boot/all/job/gtucker/branch/kernelci-local/kernel/next-20190304-1-g4f0b547b03da/

So there's no clear indicator that the shuffle config is causing
any issue on any platform other than the BeagleBone Black.

>> In the meantime, I'm happy to try out other things with more
>> debug configs turned on or any potential fixes someone might
>> have.
> 
> ARM is the only arch that sets ARCH_HAS_HOLES_MEMORYMODEL to 'y'. Maybe the
> failure has something to do with it...
> 
> Guillaume, can you try this patch:

Sure, it doesn't seem to be fixing the problem though:

  https://lava.collabora.co.uk/scheduler/job/1527471

I've added the patch to the same branch based on next-20190304.

I guess this needs to be debugged a little further to see what
the panic really is about.  I'll see if I can spend a bit more
time on it this week, unless there's any BeagleBone expert
available to help or if someone has another fix to try out.

Guillaume

> diff --git a/mm/shuffle.c b/mm/shuffle.c
> index 3ce1248..4a04aac 100644
> --- a/mm/shuffle.c
> +++ b/mm/shuffle.c
> @@ -58,7 +58,8 @@ module_param_call(shuffle, shuffle_store, shuffle_show, &shuffle_param, 0400);
>   * For two pages to be swapped in the shuffle, they must be free (on a
>   * 'free_area' lru), have the same order, and have the same migratetype.
>   */
> -static struct page * __meminit shuffle_valid_page(unsigned long pfn, int order)
> +static struct page * __meminit shuffle_valid_page(unsigned long pfn, int order,
> +   struct zone *z)
>  {
>   struct page *page;
>  
> @@ -80,6 +81,9 @@ static struct page * __meminit shuffle_valid_page(unsigned long pfn, int order)
>   if (!PageBuddy(page))
>   return NULL;
>  
> + if (!memmap_valid_within(pfn, page, z))
> + return NULL;
> +
>   /*
>* ...is the page on the same list as the page we will
>* shuffle it with?
> @@ -123,7 +127,7 @@ void __meminit __shuffle_zone(struct zone *z)
>* page_j randomly selected in the span @zone_start_pfn to
>* @spanned_pages.
>*/
> - page_i = shuffle_valid_page(i, order);
> + page_i = shuffle_valid_page(i, order, z);
>   if (!page_i)
>   continue;
>  
> @@ -137,7 +141,7 @@ void __meminit __shuffle_zone(struct zone *z)
>   j = z->zone_start_pfn +
>   

Re: next/master boot bisection: next-20190215 on beaglebone-black

2019-03-06 Thread Mike Rapoport
On Wed, Mar 06, 2019 at 10:14:47AM +, Guillaume Tucker wrote:
> On 01/03/2019 23:23, Dan Williams wrote:
> > On Fri, Mar 1, 2019 at 1:05 PM Guillaume Tucker
> >  wrote:
> > 
> > Is there an early-printk facility that can be turned on to see how far
> > we get in the boot?
> 
> Yes, I've done that now by enabling CONFIG_DEBUG_AM33XXUART1 and
> earlyprintk in the command line.  Here's the result, with the
> commit cherry picked on top of next-20190304:
> 
>   https://lava.collabora.co.uk/scheduler/job/1526326
> 
> [1.379522] ti-sysc 4804a000.target-module: sysc_flags 0222 != 0022
> [1.396718] Unable to handle kernel paging request at virtual address 77bb4003
> [1.404203] pgd = (ptrval)
> [1.406971] [77bb4003] *pgd=
> [1.410650] Internal error: Oops: 5 [#1] ARM
> [...]
> [1.672310] [] (clk_hw_create_clk.part.21) from [] (devm_clk_get+0x4c/0x80)
> [1.681232] [] (devm_clk_get) from [] (sysc_probe+0x28c/0xde4)
> 
> It's always failing at that point in the code.  Also when
> enabling "debug" on the kernel command line, the issue goes
> away (exact same binaries etc.):
> 
>   https://lava.collabora.co.uk/scheduler/job/1526327
> 
> For the record, here's the branch I've been using:
> 
>   https://gitlab.collabora.com/gtucker/linux/tree/beaglebone-black-next-20190304-debug
> 
> The board otherwise boots fine with next-20190304 (SMP=n), and
> also with the patch applied but the shuffle configs set to n.
> 
> > Were there any boot *successes* on ARM with shuffling enabled? I.e.
> > clues about what's different about the specific memory setup for
> > beagle-bone-black.
> 
> Looking at the KernelCI results from next-20190215, it looks like
> only the BeagleBone Black with SMP=n failed to boot:
> 
>   https://kernelci.org/boot/all/job/next/branch/master/kernel/next-20190215/
> 
> Of course that's not all the ARM boards that exist out there, but
> it's a fairly large coverage already.
> 
> As the kernel panic always seems to originate in ti-sysc.c,
> there's a chance it's only visible on that platform...  I'm doing
> a KernelCI run now with my test branch to double check that,
> it'll take a few hours so I'll send an update later if I get
> anything useful out of it.
> 
> In the meantime, I'm happy to try out other things with more
> debug configs turned on or any potential fixes someone might
> have.

ARM is the only arch that sets ARCH_HAS_HOLES_MEMORYMODEL to 'y'. Maybe the
failure has something to do with it...
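A toy userspace model may help show what the extra check is meant to guard against. This is a sketch under stated assumptions, not kernel code: `struct toy_page`, `toy_memmap_valid_within()` and `toy_shuffle_valid_page()` are hypothetical names. It assumes that, with ARCH_HAS_HOLES_MEMORYMODEL, a pfn inside a zone's span can land in a hole whose struct page holds stale contents that happen to look like a free buddy page, and it mirrors the two comparisons memmap_valid_within() makes: the page must report the pfn and the zone the caller expects.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy stand-in for struct page (NOT the kernel layout).  In the kernel the
 * pfn and zone are derived from the page via page_to_pfn()/page_zone();
 * here they are stored directly to keep the model small.
 */
struct toy_page {
	unsigned long pfn;  /* pfn this entry claims to describe */
	int zone_id;        /* zone this entry claims to belong to */
	bool buddy;         /* stand-in for PageBuddy() */
};

/* Same two comparisons memmap_valid_within() performs. */
static bool toy_memmap_valid_within(unsigned long pfn,
				    const struct toy_page *page, int zone_id)
{
	if (page->pfn != pfn)
		return false;
	if (page->zone_id != zone_id)
		return false;
	return true;
}

/* Hypothetical analogue of shuffle_valid_page() with the added check. */
static const struct toy_page *
toy_shuffle_valid_page(const struct toy_page *memmap, unsigned long start_pfn,
		       unsigned long pfn, int zone_id)
{
	assert(pfn >= start_pfn);
	const struct toy_page *page = &memmap[pfn - start_pfn];

	if (!page->buddy)          /* not a free buddy page: skip */
		return NULL;
	if (!toy_memmap_valid_within(pfn, page, zone_id))
		return NULL;       /* garbage entry inside a hole: skip */
	return page;
}
```

In the patch the randomly chosen partner pfn (page_j) goes through the same helper, so in this model a hole could no longer be picked as a swap target either.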

Guillaume, can you try this patch:

diff --git a/mm/shuffle.c b/mm/shuffle.c
index 3ce1248..4a04aac 100644
--- a/mm/shuffle.c
+++ b/mm/shuffle.c
@@ -58,7 +58,8 @@ module_param_call(shuffle, shuffle_store, shuffle_show, &shuffle_param, 0400);
  * For two pages to be swapped in the shuffle, they must be free (on a
  * 'free_area' lru), have the same order, and have the same migratetype.
  */
-static struct page * __meminit shuffle_valid_page(unsigned long pfn, int order)
+static struct page * __meminit shuffle_valid_page(unsigned long pfn, int order,
+ struct zone *z)
 {
struct page *page;
 
@@ -80,6 +81,9 @@ static struct page * __meminit shuffle_valid_page(unsigned long pfn, int order)
if (!PageBuddy(page))
return NULL;
 
+   if (!memmap_valid_within(pfn, page, z))
+   return NULL;
+
/*
 * ...is the page on the same list as the page we will
 * shuffle it with?
@@ -123,7 +127,7 @@ void __meminit __shuffle_zone(struct zone *z)
 * page_j randomly selected in the span @zone_start_pfn to
 * @spanned_pages.
 */
-   page_i = shuffle_valid_page(i, order);
+   page_i = shuffle_valid_page(i, order, z);
if (!page_i)
continue;
 
@@ -137,7 +141,7 @@ void __meminit __shuffle_zone(struct zone *z)
j = z->zone_start_pfn +
ALIGN_DOWN(get_random_long() % z->spanned_pages,
order_pages);
-   page_j = shuffle_valid_page(j, order);
+   page_j = shuffle_valid_page(j, order, z);
if (page_j && page_j != page_i)
break;
}
 

-- 
Sincerely yours,
Mike.



Re: next/master boot bisection: next-20190215 on beaglebone-black

2019-03-06 Thread Guillaume Tucker
On 01/03/2019 23:23, Dan Williams wrote:
> On Fri, Mar 1, 2019 at 1:05 PM Guillaume Tucker wrote:
>>
>> On 01/03/2019 20:41, Andrew Morton wrote:
>>> On Fri, 1 Mar 2019 09:25:24 +0100 Guillaume Tucker wrote:
>>>
>>> Michal had asked if the free space accounting fix up addressed this
>>> boot regression? I was awaiting word on that.
>>
>> hm, does b...@kernelci.org actually read emails?  Let's try info@ as well..

 b...@kernelci.org is not a person, it's a send-only account for
 automated reports.  So no, it doesn't read emails.

 I guess the tricky point here is that the authors of the commits
 found by bisections may not always have the hardware needed to
 reproduce the problem.  So it needs to be dealt with on a
 case-by-case basis: sometimes they do have the hardware,
 sometimes someone else on the list or on CC does, and sometimes
 it's better for the people who have access to the test lab which
 ran the KernelCI test to deal with it.

 This case seems to fall into the last category.  As I have access
 to the Collabora lab, I can do some quick checks to confirm
 whether the proposed patch does fix the issue.  I hadn't realised
 that someone was waiting for this to happen, especially as the
 BeagleBone Black is a very common platform.  Sorry about that,
 I'll take a look today.

 It may be a nice feature to be able to give access to the
 KernelCI test infrastructure to anyone who wants to debug an
 issue reported by KernelCI or verify a fix, so they won't need to
 have the hardware locally.  Something to think about for the
 future.
>>>
>>> Thanks, that all sounds good.
>>>
>> Is it possible to determine whether this regression is still present in
>> current linux-next?

 I'll try to re-apply the patch that caused the issue, then see if
 the suggested change fixes it.  As far as the current linux-next
 master branch is concerned, KernelCI boot tests are passing fine
 on that platform.
>>>
>>> They would, because I dropped
>>> mm-shuffle-default-enable-all-shuffling.patch, so your tests presumably
>>> now have shuffling disabled.
>>>
>>> Is it possible to add the below to linux-next and try again?
>>
>> I've actually already done that, and essentially the issue can
>> still be reproduced by applying that patch.  See this branch:
>>
>>   https://gitlab.collabora.com/gtucker/linux/commits/next-20190301-beaglebone-black-debug
>>
>> next-20190301 boots fine but the head fails, using
>> multi_v7_defconfig + SMP=n in both cases and
>> SHUFFLE_PAGE_ALLOCATOR=y enabled in the 2nd case as a result
>> of the change in the default value.
>>
>> The change suggested by Michal Hocko on Feb 15th has now been
>> applied in linux-next, it's part of this commit but as
>> explained above it does not actually resolve the boot failure:
>>
>>   98cf198ee8ce mm: move buddy list manipulations into helpers
>>
>> I can send more details on Monday and do a bit of debugging to
>> help narrow down the problem.  Please let me know if
>> there's anything in particular that would seem to be worth
>> trying.
>>
> 
> Thanks for taking a look!
> 
> Some questions when you get a chance:
> 
> Is there an early-printk facility that can be turned on to see how far
> we get in the boot?

Yes, I've done that now by enabling CONFIG_DEBUG_AM33XXUART1 and
earlyprintk in the command line.  Here's the result, with the
commit cherry picked on top of next-20190304:

  https://lava.collabora.co.uk/scheduler/job/1526326

[1.379522] ti-sysc 4804a000.target-module: sysc_flags 0222 != 0022
[1.396718] Unable to handle kernel paging request at virtual address 77bb4003
[1.404203] pgd = (ptrval)
[1.406971] [77bb4003] *pgd=
[1.410650] Internal error: Oops: 5 [#1] ARM
[...]
[1.672310] [] (clk_hw_create_clk.part.21) from [] (devm_clk_get+0x4c/0x80)
[1.681232] [] (devm_clk_get) from [] (sysc_probe+0x28c/0xde4)

It's always failing at that point in the code.  Also when
enabling "debug" on the kernel command line, the issue goes
away (exact same binaries, etc.):

  https://lava.collabora.co.uk/scheduler/job/1526327

For the record, here's the branch I've been using:

  https://gitlab.collabora.com/gtucker/linux/tree/beaglebone-black-next-20190304-debug

The board otherwise boots fine with next-20190304 (SMP=n), and
also with the patch applied but the shuffle configs set to n.

> Do any of the QEMU machine types [1] approximate this board? I.e. so I
> might be able to independently debug.

Unfortunately there doesn't appear to be any QEMU machine
emulating the TI AM335x SoC or the BeagleBone Black board.

> Were there any boot *successes* on ARM with shuffling enabled? I.e.
> clues about what's different about the specific memory setup for
> beagle-bone-black.

Looking at the KernelCI results from next-20190215, it looks like
only the BeagleBone Black with SMP=n f

Re: next/master boot bisection: next-20190215 on beaglebone-black

2019-03-01 Thread Dan Williams
On Fri, Mar 1, 2019 at 1:05 PM Guillaume Tucker wrote:
>
> On 01/03/2019 20:41, Andrew Morton wrote:
> > On Fri, 1 Mar 2019 09:25:24 +0100 Guillaume Tucker wrote:
> >
> > Michal had asked if the free space accounting fix up addressed this
> > boot regression? I was awaiting word on that.
> 
>  hm, does b...@kernelci.org actually read emails?  Let's try info@ as well..
> >>
> >> b...@kernelci.org is not a person, it's a send-only account for
> >> automated reports.  So no, it doesn't read emails.
> >>
> >> I guess the tricky point here is that the authors of the commits
> >> found by bisections may not always have the hardware needed to
> >> reproduce the problem.  So it needs to be dealt with on a
> >> case-by-case basis: sometimes they do have the hardware,
> >> sometimes someone else on the list or on CC does, and sometimes
> >> it's better for the people who have access to the test lab which
> >> ran the KernelCI test to deal with it.
> >>
> >> This case seems to fall into the last category.  As I have access
> >> to the Collabora lab, I can do some quick checks to confirm
> >> whether the proposed patch does fix the issue.  I hadn't realised
> >> that someone was waiting for this to happen, especially as the
> >> BeagleBone Black is a very common platform.  Sorry about that,
> >> I'll take a look today.
> >>
> >> It may be a nice feature to be able to give access to the
> >> KernelCI test infrastructure to anyone who wants to debug an
> >> issue reported by KernelCI or verify a fix, so they won't need to
> >> have the hardware locally.  Something to think about for the
> >> future.
> >
> > Thanks, that all sounds good.
> >
>  Is it possible to determine whether this regression is still present in
>  current linux-next?
> >>
> >> I'll try to re-apply the patch that caused the issue, then see if
> >> the suggested change fixes it.  As far as the current linux-next
> >> master branch is concerned, KernelCI boot tests are passing fine
> >> on that platform.
> >
> > They would, because I dropped
> > mm-shuffle-default-enable-all-shuffling.patch, so your tests presumably
> > now have shuffling disabled.
> >
> > Is it possible to add the below to linux-next and try again?
>
> I've actually already done that, and essentially the issue can
> still be reproduced by applying that patch.  See this branch:
>
>   https://gitlab.collabora.com/gtucker/linux/commits/next-20190301-beaglebone-black-debug
>
> next-20190301 boots fine but the head fails, using
> multi_v7_defconfig + SMP=n in both cases and
> SHUFFLE_PAGE_ALLOCATOR=y enabled in the 2nd case as a result
> of the change in the default value.
>
> The change suggested by Michal Hocko on Feb 15th has now been
> applied in linux-next, it's part of this commit but as
> explained above it does not actually resolve the boot failure:
>
>   98cf198ee8ce mm: move buddy list manipulations into helpers
>
> I can send more details on Monday and do a bit of debugging to
> help narrow down the problem.  Please let me know if
> there's anything in particular that would seem to be worth
> trying.
>

Thanks for taking a look!

Some questions when you get a chance:

Is there an early-printk facility that can be turned on to see how far
we get in the boot?

Do any of the QEMU machine types [1] approximate this board? I.e. so I
might be able to independently debug.

Were there any boot *successes* on ARM with shuffling enabled? I.e.
clues about what's different about the specific memory setup for
beagle-bone-black.

Thanks for the help!

[1]: https://wiki.qemu.org/Documentation/Platforms/ARM


Re: next/master boot bisection: next-20190215 on beaglebone-black

2019-03-01 Thread Guillaume Tucker
On 01/03/2019 20:41, Andrew Morton wrote:
> On Fri, 1 Mar 2019 09:25:24 +0100 Guillaume Tucker wrote:
> 
> Michal had asked if the free space accounting fix up addressed this
> boot regression? I was awaiting word on that.

 hm, does b...@kernelci.org actually read emails?  Let's try info@ as well..
>>
>> b...@kernelci.org is not a person, it's a send-only account for
>> automated reports.  So no, it doesn't read emails.
>>
>> I guess the tricky point here is that the authors of the commits
>> found by bisections may not always have the hardware needed to
>> reproduce the problem.  So it needs to be dealt with on a
>> case-by-case basis: sometimes they do have the hardware,
>> sometimes someone else on the list or on CC does, and sometimes
>> it's better for the people who have access to the test lab which
>> ran the KernelCI test to deal with it.
>>
>> This case seems to fall into the last category.  As I have access
>> to the Collabora lab, I can do some quick checks to confirm
>> whether the proposed patch does fix the issue.  I hadn't realised
>> that someone was waiting for this to happen, especially as the
>> BeagleBone Black is a very common platform.  Sorry about that,
>> I'll take a look today.
>>
>> It may be a nice feature to be able to give access to the
>> KernelCI test infrastructure to anyone who wants to debug an
>> issue reported by KernelCI or verify a fix, so they won't need to
>> have the hardware locally.  Something to think about for the
>> future.
> 
> Thanks, that all sounds good.
> 
 Is it possible to determine whether this regression is still present in
 current linux-next?
>>
>> I'll try to re-apply the patch that caused the issue, then see if
>> the suggested change fixes it.  As far as the current linux-next
>> master branch is concerned, KernelCI boot tests are passing fine
>> on that platform.
> 
> They would, because I dropped
> mm-shuffle-default-enable-all-shuffling.patch, so your tests presumably
> now have shuffling disabled.
> 
> Is it possible to add the below to linux-next and try again?

I've actually already done that, and essentially the issue can
still be reproduced by applying that patch.  See this branch:

  https://gitlab.collabora.com/gtucker/linux/commits/next-20190301-beaglebone-black-debug

next-20190301 boots fine but the head fails, using
multi_v7_defconfig + SMP=n in both cases and
SHUFFLE_PAGE_ALLOCATOR=y enabled in the 2nd case as a result
of the change in the default value.

The change suggested by Michal Hocko on Feb 15th has now been
applied in linux-next, it's part of this commit but as
explained above it does not actually resolve the boot failure:

  98cf198ee8ce mm: move buddy list manipulations into helpers

I can send more details on Monday and do a bit of debugging to
help narrow down the problem.  Please let me know if
there's anything in particular that would seem to be worth
trying.

> Or I can re-add this to linux-next.  Where should we go to determine
> the results of such a change?  There are a heck of a lot of results on
> https://kernelci.org/boot/ and entering "beaglebone-black" doesn't get
> me anything.

The BeagleBone Black board was offline for a few days in our
lab, which probably explains why you're not getting many
results from the web interface.  Hopefully we'll see passing
boot results in linux-next tomorrow now that the board is back
on track.

It's quite easy for me to submit test jobs with kernels I've
built myself instead of going through the full linux-next and
KernelCI loop.  So that's the best way to try things out, then
when a fix has been found it can be applied in linux-next on
top of the mm/shuffle change to verify it in KernelCI.

Guillaume

> From: Dan Williams 
> Subject: mm/shuffle: default enable all shuffling
> 
> Per Andrew's request arrange for all memory allocation shuffling code to
> be enabled by default.
> 
> The page_alloc.shuffle command line parameter can still be used to disable
> shuffling at boot, but the kernel will default enable the shuffling if the
> command line option is not specified.
> 
> Link: http://lkml.kernel.org/r/154943713572.3858443.11206307988382889377.st...@dwillia2-desk3.amr.corp.intel.com
> Signed-off-by: Dan Williams 
> Cc: Kees Cook 
> Cc: Michal Hocko 
> Cc: Dave Hansen 
> Cc: Keith Busch 
> 
> Signed-off-by: Andrew Morton 
> ---
> 
>  init/Kconfig |4 ++--
>  mm/shuffle.c |4 ++--
>  mm/shuffle.h |2 +-
>  3 files changed, 5 insertions(+), 5 deletions(-)
> 
> --- a/init/Kconfig~mm-shuffle-default-enable-all-shuffling
> +++ a/init/Kconfig
> @@ -1709,7 +1709,7 @@ config SLAB_MERGE_DEFAULT
> command line.
>  
>  config SLAB_FREELIST_RANDOM
> - default n
> + default y
>   depends on SLAB || SLUB
>   bool "SLAB freelist randomization"
>   help
> @@ -1728,7 +1728,7 @@ config SLAB_FREELIST_HARDENED
>  
>  config SHUFFLE_PAGE_ALLOCATOR
>   bool "Page allocator randomization"
> - default SLAB_FREELIST_RANDOM &

Re: next/master boot bisection: next-20190215 on beaglebone-black

2019-03-01 Thread Andrew Morton
On Fri, 1 Mar 2019 09:25:24 +0100 Guillaume Tucker wrote:

> >>> Michal had asked if the free space accounting fix up addressed this
> >>> boot regression? I was awaiting word on that.
> >>
> >> hm, does b...@kernelci.org actually read emails?  Let's try info@ as well..
> 
> b...@kernelci.org is not a person, it's a send-only account for
> automated reports.  So no, it doesn't read emails.
> 
> I guess the tricky point here is that the authors of the commits
> found by bisections may not always have the hardware needed to
> reproduce the problem.  So it needs to be dealt with on a
> case-by-case basis: sometimes they do have the hardware,
> sometimes someone else on the list or on CC does, and sometimes
> it's better for the people who have access to the test lab which
> ran the KernelCI test to deal with it.
> 
> This case seems to fall into the last category.  As I have access
> to the Collabora lab, I can do some quick checks to confirm
> whether the proposed patch does fix the issue.  I hadn't realised
> that someone was waiting for this to happen, especially as the
> BeagleBone Black is a very common platform.  Sorry about that,
> I'll take a look today.
> 
> It may be a nice feature to be able to give access to the
> KernelCI test infrastructure to anyone who wants to debug an
> issue reported by KernelCI or verify a fix, so they won't need to
> have the hardware locally.  Something to think about for the
> future.

Thanks, that all sounds good.

> >> Is it possible to determine whether this regression is still present in
> >> current linux-next?
> 
> I'll try to re-apply the patch that caused the issue, then see if
> the suggested change fixes it.  As far as the current linux-next
> master branch is concerned, KernelCI boot tests are passing fine
> on that platform.

They would, because I dropped
mm-shuffle-default-enable-all-shuffling.patch, so your tests presumably
now have shuffling disabled.

Is it possible to add the below to linux-next and try again?

Or I can re-add this to linux-next.  Where should we go to determine
the results of such a change?  There are a heck of a lot of results on
https://kernelci.org/boot/ and entering "beaglebone-black" doesn't get
me anything.

Thanks.



From: Dan Williams 
Subject: mm/shuffle: default enable all shuffling

Per Andrew's request arrange for all memory allocation shuffling code to
be enabled by default.

The page_alloc.shuffle command line parameter can still be used to disable
shuffling at boot, but the kernel will default enable the shuffling if the
command line option is not specified.

Link: http://lkml.kernel.org/r/154943713572.3858443.11206307988382889377.st...@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams 
Cc: Kees Cook 
Cc: Michal Hocko 
Cc: Dave Hansen 
Cc: Keith Busch 

Signed-off-by: Andrew Morton 
---

 init/Kconfig |4 ++--
 mm/shuffle.c |4 ++--
 mm/shuffle.h |2 +-
 3 files changed, 5 insertions(+), 5 deletions(-)

--- a/init/Kconfig~mm-shuffle-default-enable-all-shuffling
+++ a/init/Kconfig
@@ -1709,7 +1709,7 @@ config SLAB_MERGE_DEFAULT
  command line.
 
 config SLAB_FREELIST_RANDOM
-   default n
+   default y
depends on SLAB || SLUB
bool "SLAB freelist randomization"
help
@@ -1728,7 +1728,7 @@ config SLAB_FREELIST_HARDENED
 
 config SHUFFLE_PAGE_ALLOCATOR
bool "Page allocator randomization"
-   default SLAB_FREELIST_RANDOM && ACPI_NUMA
+   default y
help
  Randomization of the page allocator improves the average
  utilization of a direct-mapped memory-side-cache. See section
--- a/mm/shuffle.c~mm-shuffle-default-enable-all-shuffling
+++ a/mm/shuffle.c
@@ -9,8 +9,8 @@
 #include "internal.h"
 #include "shuffle.h"
 
-DEFINE_STATIC_KEY_FALSE(page_alloc_shuffle_key);
-static unsigned long shuffle_state __ro_after_init;
+DEFINE_STATIC_KEY_TRUE(page_alloc_shuffle_key);
+static unsigned long shuffle_state __ro_after_init = 1 << SHUFFLE_ENABLE;
 
 /*
  * Depending on the architecture, module parameter parsing may run
--- a/mm/shuffle.h~mm-shuffle-default-enable-all-shuffling
+++ a/mm/shuffle.h
@@ -19,7 +19,7 @@ enum mm_shuffle_ctl {
 #define SHUFFLE_ORDER (MAX_ORDER-1)
 
 #ifdef CONFIG_SHUFFLE_PAGE_ALLOCATOR
-DECLARE_STATIC_KEY_FALSE(page_alloc_shuffle_key);
+DECLARE_STATIC_KEY_TRUE(page_alloc_shuffle_key);
 extern void page_alloc_shuffle(enum mm_shuffle_ctl ctl);
 extern void __shuffle_free_memory(pg_data_t *pgdat);
 static inline void shuffle_free_memory(pg_data_t *pgdat)
_
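The mm/shuffle.c comment above notes that, depending on the architecture, module parameter parsing may run at different points in boot. If it runs before jump_label_init(), a parameter handler that flips a static key directly is unsafe, because static keys must not be changed before jump labels are initialised. A minimal sketch of one way to stay safe under either ordering is to have the handler only record the request in a state word and apply it to the key later. Everything here is a hypothetical userspace model (`toy_static_key`, `toy_jump_label_init()`, `toy_shuffle_param()`, `toy_shuffle_apply()`); only the SHUFFLE_ENABLE name is borrowed from the patch, and the force-disable bit is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy static key; changing it before toy_jump_label_init() is a bug. */
struct toy_static_key {
	bool enabled;
};

static bool toy_jump_label_initialized;

static void toy_jump_label_init(void)
{
	toy_jump_label_initialized = true;
}

static void toy_static_branch_set(struct toy_static_key *key, bool on)
{
	/* Models the rule that keys may only change after jump label init. */
	assert(toy_jump_label_initialized);
	key->enabled = on;
}

enum { TOY_SHUFFLE_ENABLE, TOY_SHUFFLE_FORCE_DISABLE };

/* Defaults mirroring "default enable all shuffling": key and state on. */
static struct toy_static_key toy_shuffle_key = { .enabled = true };
static unsigned long toy_shuffle_state = 1UL << TOY_SHUFFLE_ENABLE;

/* Parameter handler: records the request only, never touches the key,
 * so it is safe whether it runs before or after toy_jump_label_init(). */
static void toy_shuffle_param(bool on)
{
	if (on)
		toy_shuffle_state = 1UL << TOY_SHUFFLE_ENABLE;
	else
		toy_shuffle_state = 1UL << TOY_SHUFFLE_FORCE_DISABLE;
}

/* Later init hook: apply the recorded state once keys are usable. */
static void toy_shuffle_apply(void)
{
	bool on = !(toy_shuffle_state & (1UL << TOY_SHUFFLE_FORCE_DISABLE));

	toy_static_branch_set(&toy_shuffle_key, on);
}
```

For example, parsing the equivalent of page_alloc.shuffle=0 early, then calling toy_jump_label_init() and toy_shuffle_apply(), leaves the key disabled without ever touching it too early.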



Re: next/master boot bisection: next-20190215 on beaglebone-black

2019-03-01 Thread Mark Brown
On Fri, Mar 01, 2019 at 12:40:11PM +0200, Mike Rapoport wrote:

> Another thing to consider is adding "earlyprintk debug" to the kernel
> command line for the boot tests.

We probably don't want to do that on all the tests since it does
occasionally change timing enough to "fix" things but doing a final boot
with the failing commit and earlyprintk turned on is definitely a good
idea.




Re: next/master boot bisection: next-20190215 on beaglebone-black

2019-03-01 Thread Mark Brown
On Thu, Feb 28, 2019 at 03:14:38PM -0800, Andrew Morton wrote:

> hm, does b...@kernelci.org actually read emails?  Let's try info@ as well..

bot@ isn't reading mails but it copies people who can look at stuff on
what it sends out.




Re: next/master boot bisection: next-20190215 on beaglebone-black

2019-03-01 Thread Mike Rapoport
On Fri, Mar 01, 2019 at 09:25:24AM +0100, Guillaume Tucker wrote:
> On 01/03/2019 00:55, Dan Williams wrote:
> > On Thu, Feb 28, 2019 at 3:14 PM Andrew Morton wrote:
> >>
> >> On Tue, 26 Feb 2019 16:04:04 -0800 Dan Williams wrote:
> >>
> >>> On Tue, Feb 26, 2019 at 4:00 PM Andrew Morton wrote:
> 
>  On Fri, 15 Feb 2019 18:51:51 + Mark Brown  wrote:
> 
> > On Fri, Feb 15, 2019 at 10:43:25AM -0800, Andrew Morton wrote:
> >> On Fri, 15 Feb 2019 10:20:10 -0800 (PST) "kernelci.org bot" wrote:
> >
> >>>   Details:   https://kernelci.org/boot/id/5c666ea959b514b017fe6017
> >>>   Plain log: https://storage.kernelci.org//next/master/next-20190215/arm/multi_v7_defconfig+CONFIG_SMP=n/gcc-7/lab-collabora/boot-am335x-boneblack.txt
> >>>   HTML log:  https://storage.kernelci.org//next/master/next-20190215/arm/multi_v7_defconfig+CONFIG_SMP=n/gcc-7/lab-collabora/boot-am335x-boneblack.html
> >
> >> Thanks.
> >
> >> But what actually went wrong?  Kernel doesn't boot?
> >
> > The linked logs show the kernel dying early in boot before the console
> > comes up so yeah.  There should be kernel output at the bottom of the
> > logs.
> 
>  I assume Dan is distracted - I'll keep this patchset on hold until we
>  can get to the bottom of this.
> >>>
> >>> Michal had asked if the free space accounting fix up addressed this
> >>> boot regression? I was awaiting word on that.
> >>
> >> hm, does b...@kernelci.org actually read emails?  Let's try info@ as well..
> 
> b...@kernelci.org is not a person, it's a send-only account for
> automated reports.  So no, it doesn't read emails.
> 
> I guess the tricky point here is that the authors of the commits
> found by bisections may not always have the hardware needed to
> reproduce the problem.  So it needs to be dealt with on a
> case-by-case basis: sometimes they do have the hardware,
> sometimes someone else on the list or on CC does, and sometimes
> it's better for the people who have access to the test lab which
> ran the KernelCI test to deal with it.
> 
> This case seems to fall into the last category.  As I have access
> to the Collabora lab, I can do some quick checks to confirm
> whether the proposed patch does fix the issue.  I hadn't realised
> that someone was waiting for this to happen, especially as the
> BeagleBone Black is a very common platform.  Sorry about that,
> I'll take a look today.
> 
> It may be a nice feature to be able to give access to the
> KernelCI test infrastructure to anyone who wants to debug an
> issue reported by KernelCI or verify a fix, so they won't need to
> have the hardware locally.  Something to think about for the
> future.

Another thing to consider is adding "earlyprintk debug" to the kernel
command line for the boot tests.
 
> >> Is it possible to determine whether this regression is still present in
> >> current linux-next?
> 
> I'll try to re-apply the patch that caused the issue, then see if
> the suggested change fixes it.  As far as the current linux-next
> master branch is concerned, KernelCI boot tests are passing fine
> on that platform.
> 
> Guillaume
> 

-- 
Sincerely yours,
Mike.



Re: next/master boot bisection: next-20190215 on beaglebone-black

2019-03-01 Thread Vlastimil Babka
On 2/27/19 1:04 AM, Dan Williams wrote:
> On Tue, Feb 26, 2019 at 4:00 PM Andrew Morton wrote:
>>
>> On Fri, 15 Feb 2019 18:51:51 + Mark Brown  wrote:
>>
>>> On Fri, Feb 15, 2019 at 10:43:25AM -0800, Andrew Morton wrote:
 On Fri, 15 Feb 2019 10:20:10 -0800 (PST) "kernelci.org bot" wrote:
>>>
>   Details:   https://kernelci.org/boot/id/5c666ea959b514b017fe6017
>   Plain log: https://storage.kernelci.org//next/master/next-20190215/arm/multi_v7_defconfig+CONFIG_SMP=n/gcc-7/lab-collabora/boot-am335x-boneblack.txt
>   HTML log:  https://storage.kernelci.org//next/master/next-20190215/arm/multi_v7_defconfig+CONFIG_SMP=n/gcc-7/lab-collabora/boot-am335x-boneblack.html
>>>
 Thanks.
>>>
 But what actually went wrong?  Kernel doesn't boot?
>>>
>>> The linked logs show the kernel dying early in boot before the console
>>> comes up so yeah.  There should be kernel output at the bottom of the
>>> logs.
>>
>> I assume Dan is distracted - I'll keep this patchset on hold until we
>> can get to the bottom of this.
> 
> Michal had asked if the free space accounting fix up addressed this
> boot regression? I was awaiting word on that.

I'm afraid it couldn't have. Bisection identified the "enable all
shuffling" patch, but the free area mis-accounting happened regardless
of shuffling being enabled. And if dropping the "enable all shuffling"
patch stopped the problem even before the misaccounting fix was merged,
that's another confirmation.

Is it possible that the platform silently depends on large contiguous
areas without a proper CMA reservation, and the shuffling fragments
them? Or maybe the CMA reservation happens too late?
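To make this hypothesis concrete: per the SHUFFLE_ORDER definition in mm/shuffle.h quoted earlier, the shuffle works on (MAX_ORDER-1) blocks, so it swaps whole blocks rather than splitting them, but it does change which block is handed out next. The toy sketch here is a userspace model only, with hypothetical names and a small xorshift-style PRNG standing in for get_random_long(). It shows how code that silently assumes consecutive allocations come back physically adjacent would lose that property once the free list is shuffled.

```c
#include <assert.h>
#include <stddef.h>

/* Small deterministic xorshift-style PRNG so runs are reproducible. */
static unsigned int toy_rand(unsigned int *s)
{
	unsigned int x = *s;

	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	*s = x;
	return x;
}

/* Fill free_list with block indices 0..n-1 in physical order, then
 * optionally apply a Fisher-Yates shuffle (seed must be non-zero). */
static void toy_build_free_list(unsigned int *free_list, size_t n,
				int shuffle, unsigned int seed)
{
	size_t i;

	for (i = 0; i < n; i++)
		free_list[i] = (unsigned int)i;
	if (!shuffle || n < 2)
		return;
	assert(seed != 0);
	for (i = n - 1; i > 0; i--) {
		size_t j = toy_rand(&seed) % (i + 1);
		unsigned int tmp = free_list[i];

		free_list[i] = free_list[j];
		free_list[j] = tmp;
	}
}

/* "Allocate" the first k blocks from the list head and return the longest
 * run of physically adjacent blocks among them, in allocation order. */
static size_t toy_longest_adjacent_run(const unsigned int *free_list, size_t k)
{
	size_t best = k ? 1 : 0, run = 1, i;

	for (i = 1; i < k; i++) {
		run = (free_list[i] == free_list[i - 1] + 1) ? run + 1 : 1;
		if (run > best)
			best = run;
	}
	return best;
}
```

With an unshuffled list, the first 16 allocations form one adjacent run of 16 blocks; after shuffling, the run is typically much shorter, which is the kind of change a platform with an unstated contiguity assumption might trip over.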

> I assume you're not willing to entertain a "depends
> NOT_THIS_ARM_BOARD" hack in the meantime?
> 



Re: next/master boot bisection: next-20190215 on beaglebone-black

2019-03-01 Thread Guillaume Tucker
On 01/03/2019 00:55, Dan Williams wrote:
> On Thu, Feb 28, 2019 at 3:14 PM Andrew Morton wrote:
>>
>> On Tue, 26 Feb 2019 16:04:04 -0800 Dan Williams wrote:
>>
>>> On Tue, Feb 26, 2019 at 4:00 PM Andrew Morton wrote:

 On Fri, 15 Feb 2019 18:51:51 + Mark Brown  wrote:

> On Fri, Feb 15, 2019 at 10:43:25AM -0800, Andrew Morton wrote:
>> On Fri, 15 Feb 2019 10:20:10 -0800 (PST) "kernelci.org bot" wrote:
>
>>>   Details:   https://kernelci.org/boot/id/5c666ea959b514b017fe6017
>>>   Plain log: https://storage.kernelci.org//next/master/next-20190215/arm/multi_v7_defconfig+CONFIG_SMP=n/gcc-7/lab-collabora/boot-am335x-boneblack.txt
>>>   HTML log:  https://storage.kernelci.org//next/master/next-20190215/arm/multi_v7_defconfig+CONFIG_SMP=n/gcc-7/lab-collabora/boot-am335x-boneblack.html
>
>> Thanks.
>
>> But what actually went wrong?  Kernel doesn't boot?
>
> The linked logs show the kernel dying early in boot before the console
> comes up so yeah.  There should be kernel output at the bottom of the
> logs.

 I assume Dan is distracted - I'll keep this patchset on hold until we
 can get to the bottom of this.
>>>
>>> Michal had asked if the free space accounting fix up addressed this
>>> boot regression? I was awaiting word on that.
>>
>> hm, does b...@kernelci.org actually read emails?  Let's try info@ as well..

b...@kernelci.org is not a person, it's a send-only account for
automated reports.  So no, it doesn't read emails.

I guess the tricky point here is that the authors of the commits
found by bisections may not always have the hardware needed to
reproduce the problem.  So it needs to be dealt with on a
case-by-case basis: sometimes they do have the hardware,
sometimes someone else on the list or on CC does, and sometimes
it's better for the people who have access to the test lab which
ran the KernelCI test to deal with it.

This case seems to fall into the last category.  As I have access
to the Collabora lab, I can do some quick checks to confirm
whether the proposed patch does fix the issue.  I hadn't realised
that someone was waiting for this to happen, especially as the
BeagleBone Black is a very common platform.  Sorry about that,
I'll take a look today.

It may be a nice feature to be able to give access to the
KernelCI test infrastructure to anyone who wants to debug an
issue reported by KernelCI or verify a fix, so they won't need to
have the hardware locally.  Something to think about for the
future.

>> Is it possible to determine whether this regression is still present in
>> current linux-next?

I'll try to re-apply the patch that caused the issue, then see if
the suggested change fixes it.  As far as the current linux-next
master branch is concerned, KernelCI boot tests are passing fine
on that platform.

Guillaume


Re: next/master boot bisection: next-20190215 on beaglebone-black

2019-02-28 Thread Dan Williams
On Thu, Feb 28, 2019 at 3:14 PM Andrew Morton  wrote:
>
> On Tue, 26 Feb 2019 16:04:04 -0800 Dan Williams wrote:
>
> > On Tue, Feb 26, 2019 at 4:00 PM Andrew Morton wrote:
> > >
> > > On Fri, 15 Feb 2019 18:51:51 + Mark Brown  wrote:
> > >
> > > > On Fri, Feb 15, 2019 at 10:43:25AM -0800, Andrew Morton wrote:
> > > > > On Fri, 15 Feb 2019 10:20:10 -0800 (PST) "kernelci.org bot" wrote:
> > > >
> > > > > >   Details:   https://kernelci.org/boot/id/5c666ea959b514b017fe6017
> > > > > >   Plain log: https://storage.kernelci.org//next/master/next-20190215/arm/multi_v7_defconfig+CONFIG_SMP=n/gcc-7/lab-collabora/boot-am335x-boneblack.txt
> > > > > >   HTML log:  https://storage.kernelci.org//next/master/next-20190215/arm/multi_v7_defconfig+CONFIG_SMP=n/gcc-7/lab-collabora/boot-am335x-boneblack.html
> > > >
> > > > > Thanks.
> > > >
> > > > > But what actually went wrong?  Kernel doesn't boot?
> > > >
> > > > The linked logs show the kernel dying early in boot before the console
> > > > comes up so yeah.  There should be kernel output at the bottom of the
> > > > logs.
> > >
> > > I assume Dan is distracted - I'll keep this patchset on hold until we
> > > can get to the bottom of this.
> >
> > Michal had asked if the free space accounting fix up addressed this
> > boot regression? I was awaiting word on that.
>
> hm, does b...@kernelci.org actually read emails?  Let's try info@ as well..

Thanks, yes. The logs don't give much to go on, so I can only iterate
on this as fast as I can drum up feedback.

>
> Is it possible to determine whether this regression is still present in
> current linux-next?
>
> > I assume you're not willing to entertain a "depends
> > NOT_THIS_ARM_BOARD" hack in the meantime?
>
> We'd probably never be able to remove it.  And we don't know whether
> other systems might be affected.

Right, and agree. I was just grasping at straws because I know of
users that want to take advantage of this and was lamenting the
upcoming apology tour saying, "sorry, maybe v5.2". I had always
expected that platforms outside of x86-servers would need to do their
own validation / evaluation before recommending this, and the
regression concern is why it defaulted to disabled... but boot
regressions are boot regressions.


Re: next/master boot bisection: next-20190215 on beaglebone-black

2019-02-28 Thread Andrew Morton
On Tue, 26 Feb 2019 16:04:04 -0800 Dan Williams wrote:

> On Tue, Feb 26, 2019 at 4:00 PM Andrew Morton wrote:
> >
> > On Fri, 15 Feb 2019 18:51:51 + Mark Brown  wrote:
> >
> > > On Fri, Feb 15, 2019 at 10:43:25AM -0800, Andrew Morton wrote:
> > > > On Fri, 15 Feb 2019 10:20:10 -0800 (PST) "kernelci.org bot" wrote:
> > >
> > > > >   Details:https://kernelci.org/boot/id/5c666ea959b514b017fe6017
> > > > >   Plain log:  
> > > > > https://storage.kernelci.org//next/master/next-20190215/arm/multi_v7_defconfig+CONFIG_SMP=n/gcc-7/lab-collabora/boot-am335x-boneblack.txt
> > > > >   HTML log:   
> > > > > https://storage.kernelci.org//next/master/next-20190215/arm/multi_v7_defconfig+CONFIG_SMP=n/gcc-7/lab-collabora/boot-am335x-boneblack.html
> > >
> > > > Thanks.
> > >
> > > > But what actually went wrong?  Kernel doesn't boot?
> > >
> > > The linked logs show the kernel dying early in boot before the console
> > > comes up so yeah.  There should be kernel output at the bottom of the
> > > logs.
> >
> > I assume Dan is distracted - I'll keep this patchset on hold until we
> > can get to the bottom of this.
> 
> Michal had asked if the free space accounting fix up addressed this
> boot regression? I was awaiting word on that.

hm, does b...@kernelci.org actually read emails?  Let's try info@ as well.

Is it possible to determine whether this regression is still present in
current linux-next?

> I assume you're not willing to entertain a "depends
> NOT_THIS_ARM_BOARD" hack in the meantime?

We'd probably never be able to remove it.  And we don't know whether
other systems might be affected.


Re: next/master boot bisection: next-20190215 on beaglebone-black

2019-02-26 Thread Dan Williams
On Tue, Feb 26, 2019 at 4:00 PM Andrew Morton  wrote:
>
> On Fri, 15 Feb 2019 18:51:51 + Mark Brown  wrote:
>
> > On Fri, Feb 15, 2019 at 10:43:25AM -0800, Andrew Morton wrote:
> > > On Fri, 15 Feb 2019 10:20:10 -0800 (PST) "kernelci.org bot" wrote:
> >
> > > >   Details:https://kernelci.org/boot/id/5c666ea959b514b017fe6017
> > > >   Plain log:  
> > > > https://storage.kernelci.org//next/master/next-20190215/arm/multi_v7_defconfig+CONFIG_SMP=n/gcc-7/lab-collabora/boot-am335x-boneblack.txt
> > > >   HTML log:   
> > > > https://storage.kernelci.org//next/master/next-20190215/arm/multi_v7_defconfig+CONFIG_SMP=n/gcc-7/lab-collabora/boot-am335x-boneblack.html
> >
> > > Thanks.
> >
> > > But what actually went wrong?  Kernel doesn't boot?
> >
> > The linked logs show the kernel dying early in boot before the console
> > comes up so yeah.  There should be kernel output at the bottom of the
> > logs.
>
> I assume Dan is distracted - I'll keep this patchset on hold until we
> can get to the bottom of this.

Michal had asked if the free space accounting fix up addressed this
boot regression? I was awaiting word on that.

I assume you're not willing to entertain a "depends
NOT_THIS_ARM_BOARD" hack in the meantime?


Re: next/master boot bisection: next-20190215 on beaglebone-black

2019-02-26 Thread Andrew Morton
On Fri, 15 Feb 2019 18:51:51 + Mark Brown  wrote:

> On Fri, Feb 15, 2019 at 10:43:25AM -0800, Andrew Morton wrote:
> > On Fri, 15 Feb 2019 10:20:10 -0800 (PST) "kernelci.org bot" wrote:
> 
> > >   Details:https://kernelci.org/boot/id/5c666ea959b514b017fe6017
> > >   Plain log:  
> > > https://storage.kernelci.org//next/master/next-20190215/arm/multi_v7_defconfig+CONFIG_SMP=n/gcc-7/lab-collabora/boot-am335x-boneblack.txt
> > >   HTML log:   
> > > https://storage.kernelci.org//next/master/next-20190215/arm/multi_v7_defconfig+CONFIG_SMP=n/gcc-7/lab-collabora/boot-am335x-boneblack.html
> 
> > Thanks.
> 
> > But what actually went wrong?  Kernel doesn't boot?
> 
> The linked logs show the kernel dying early in boot before the console
> comes up so yeah.  There should be kernel output at the bottom of the
> logs.

I assume Dan is distracted - I'll keep this patchset on hold until we
can get to the bottom of this.



Re: next/master boot bisection: next-20190215 on beaglebone-black

2019-02-18 Thread Michal Hocko
On Fri 15-02-19 10:20:10, kernelci.org bot wrote:
> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
> * This automated bisection report was sent to you on the basis  *
> * that you may be involved with the breaking commit it has  *
> * found.  No manual investigation has been done to verify it,   *
> * and the root cause of the problem may be somewhere else.  *
> * Hope this helps!  *
> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
> 
> next/master boot bisection: next-20190215 on beaglebone-black
> 
> Summary:
>   Start:  7a92eb7cc1dc Add linux-next specific files for 20190215
>   Details:https://kernelci.org/boot/id/5c666ea959b514b017fe6017
>   Plain log:  
> https://storage.kernelci.org//next/master/next-20190215/arm/multi_v7_defconfig+CONFIG_SMP=n/gcc-7/lab-collabora/boot-am335x-boneblack.txt
>   HTML log:   
> https://storage.kernelci.org//next/master/next-20190215/arm/multi_v7_defconfig+CONFIG_SMP=n/gcc-7/lab-collabora/boot-am335x-boneblack.html
>   Result: 8dd037cc97d9 mm/shuffle: default enable all shuffling

Does
http://lkml.kernel.org/r/155033679702.1773410.13041474192173212653.st...@dwillia2-desk3.amr.corp.intel.com
make any difference?
-- 
Michal Hocko
SUSE Labs


Re: next/master boot bisection: next-20190215 on beaglebone-black

2019-02-15 Thread Stephen Rothwell
Hi Andrew,

On Fri, 15 Feb 2019 11:00:24 -0800 Andrew Morton wrote:
>
> On Fri, 15 Feb 2019 18:51:51 + Mark Brown  wrote:
> 
> > On Fri, Feb 15, 2019 at 10:43:25AM -0800, Andrew Morton wrote:  
> > > On Fri, 15 Feb 2019 10:20:10 -0800 (PST) "kernelci.org bot" wrote:
> >   
> > > >   Details:https://kernelci.org/boot/id/5c666ea959b514b017fe6017
> > > >   Plain log:  
> > > > https://storage.kernelci.org//next/master/next-20190215/arm/multi_v7_defconfig+CONFIG_SMP=n/gcc-7/lab-collabora/boot-am335x-boneblack.txt
> > > >   HTML log:   
> > > > https://storage.kernelci.org//next/master/next-20190215/arm/multi_v7_defconfig+CONFIG_SMP=n/gcc-7/lab-collabora/boot-am335x-boneblack.html
> > > >   
> >   
> > > Thanks.  
> >   
> > > But what actually went wrong?  Kernel doesn't boot?  
> > 
> > The linked logs show the kernel dying early in boot before the console
> > comes up so yeah.  There should be kernel output at the bottom of the
> > logs.  
> 
> OK, thanks.
> 
> Well, we have a result.  Stephen, can we please drop
> mm-shuffle-default-enable-all-shuffling.patch for now?

Dropped.

-- 
Cheers,
Stephen Rothwell




Re: next/master boot bisection: next-20190215 on beaglebone-black

2019-02-15 Thread Mark Brown
On Fri, Feb 15, 2019 at 10:43:25AM -0800, Andrew Morton wrote:
> On Fri, 15 Feb 2019 10:20:10 -0800 (PST) "kernelci.org bot" wrote:

> >   Details:https://kernelci.org/boot/id/5c666ea959b514b017fe6017
> >   Plain log:  
> > https://storage.kernelci.org//next/master/next-20190215/arm/multi_v7_defconfig+CONFIG_SMP=n/gcc-7/lab-collabora/boot-am335x-boneblack.txt
> >   HTML log:   
> > https://storage.kernelci.org//next/master/next-20190215/arm/multi_v7_defconfig+CONFIG_SMP=n/gcc-7/lab-collabora/boot-am335x-boneblack.html

> Thanks.

> But what actually went wrong?  Kernel doesn't boot?

The linked logs show the kernel dying early in boot before the console
comes up so yeah.  There should be kernel output at the bottom of the
logs.




Re: next/master boot bisection: next-20190215 on beaglebone-black

2019-02-15 Thread Andrew Morton
On Fri, 15 Feb 2019 10:20:10 -0800 (PST) "kernelci.org bot" wrote:

> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
> * This automated bisection report was sent to you on the basis  *
> * that you may be involved with the breaking commit it has  *
> * found.  No manual investigation has been done to verify it,   *
> * and the root cause of the problem may be somewhere else.  *
> * Hope this helps!  *
> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
> 
> next/master boot bisection: next-20190215 on beaglebone-black
> 
> Summary:
>   Start:  7a92eb7cc1dc Add linux-next specific files for 20190215
>   Details:https://kernelci.org/boot/id/5c666ea959b514b017fe6017
>   Plain log:  
> https://storage.kernelci.org//next/master/next-20190215/arm/multi_v7_defconfig+CONFIG_SMP=n/gcc-7/lab-collabora/boot-am335x-boneblack.txt
>   HTML log:   
> https://storage.kernelci.org//next/master/next-20190215/arm/multi_v7_defconfig+CONFIG_SMP=n/gcc-7/lab-collabora/boot-am335x-boneblack.html
>   Result: 8dd037cc97d9 mm/shuffle: default enable all shuffling
> 
> Checks:
>   revert: PASS
>   verify: PASS
> 
> Parameters:
>   Tree:   next
>   URL:git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
>   Branch: master
>   Target: beaglebone-black
>   CPU arch:   arm
>   Lab:lab-collabora
>   Compiler:   gcc-7
>   Config: multi_v7_defconfig+CONFIG_SMP=n
>   Test suite: boot
> 
> Breaking commit found:
> 
> ---
> commit 8dd037cc97d9226c97c2ee1abb4e97eff71e0c8d
> Author: Dan Williams 
> Date:   Fri Feb 15 11:28:30 2019 +1100
> 
> mm/shuffle: default enable all shuffling

Thanks.

But what actually went wrong?  Kernel doesn't boot?




Re: next/master boot bisection: next-20190215 on beaglebone-black

2019-02-15 Thread Andrew Morton
On Fri, 15 Feb 2019 18:51:51 + Mark Brown  wrote:

> On Fri, Feb 15, 2019 at 10:43:25AM -0800, Andrew Morton wrote:
> > On Fri, 15 Feb 2019 10:20:10 -0800 (PST) "kernelci.org bot" wrote:
> 
> > >   Details:https://kernelci.org/boot/id/5c666ea959b514b017fe6017
> > >   Plain log:  
> > > https://storage.kernelci.org//next/master/next-20190215/arm/multi_v7_defconfig+CONFIG_SMP=n/gcc-7/lab-collabora/boot-am335x-boneblack.txt
> > >   HTML log:   
> > > https://storage.kernelci.org//next/master/next-20190215/arm/multi_v7_defconfig+CONFIG_SMP=n/gcc-7/lab-collabora/boot-am335x-boneblack.html
> 
> > Thanks.
> 
> > But what actually went wrong?  Kernel doesn't boot?
> 
> The linked logs show the kernel dying early in boot before the console
> comes up so yeah.  There should be kernel output at the bottom of the
> logs.

OK, thanks.

Well, we have a result.  Stephen, can we please drop
mm-shuffle-default-enable-all-shuffling.patch for now?


next/master boot bisection: next-20190215 on beaglebone-black

2019-02-15 Thread kernelci.org bot
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* This automated bisection report was sent to you on the basis  *
* that you may be involved with the breaking commit it has  *
* found.  No manual investigation has been done to verify it,   *
* and the root cause of the problem may be somewhere else.  *
* Hope this helps!  *
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *

next/master boot bisection: next-20190215 on beaglebone-black

Summary:
  Start:  7a92eb7cc1dc Add linux-next specific files for 20190215
  Details:https://kernelci.org/boot/id/5c666ea959b514b017fe6017
  Plain log:  
https://storage.kernelci.org//next/master/next-20190215/arm/multi_v7_defconfig+CONFIG_SMP=n/gcc-7/lab-collabora/boot-am335x-boneblack.txt
  HTML log:   
https://storage.kernelci.org//next/master/next-20190215/arm/multi_v7_defconfig+CONFIG_SMP=n/gcc-7/lab-collabora/boot-am335x-boneblack.html
  Result: 8dd037cc97d9 mm/shuffle: default enable all shuffling

Checks:
  revert: PASS
  verify: PASS

Parameters:
  Tree:   next
  URL:git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
  Branch: master
  Target: beaglebone-black
  CPU arch:   arm
  Lab:lab-collabora
  Compiler:   gcc-7
  Config: multi_v7_defconfig+CONFIG_SMP=n
  Test suite: boot

Breaking commit found:

---
commit 8dd037cc97d9226c97c2ee1abb4e97eff71e0c8d
Author: Dan Williams 
Date:   Fri Feb 15 11:28:30 2019 +1100

mm/shuffle: default enable all shuffling

Per Andrew's request, arrange for all memory allocation shuffling code to
be enabled by default.

The page_alloc.shuffle command line parameter can still be used to disable
shuffling at boot, but the kernel will enable shuffling by default if the
command line option is not specified.

Link: 
http://lkml.kernel.org/r/154943713572.3858443.11206307988382889377.st...@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams 
Cc: Kees Cook 
Cc: Michal Hocko 
Cc: Dave Hansen 
Cc: Keith Busch 

Signed-off-by: Andrew Morton 
Signed-off-by: Stephen Rothwell 

diff --git a/init/Kconfig b/init/Kconfig
index 4531a97092c7..9d4b05e79a2d 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1709,7 +1709,7 @@ config SLAB_MERGE_DEFAULT
  command line.
 
 config SLAB_FREELIST_RANDOM
-   default n
+   default y
depends on SLAB || SLUB
bool "SLAB freelist randomization"
help
@@ -1728,7 +1728,7 @@ config SLAB_FREELIST_HARDENED
 
 config SHUFFLE_PAGE_ALLOCATOR
bool "Page allocator randomization"
-   default SLAB_FREELIST_RANDOM && ACPI_NUMA
+   default y
help
  Randomization of the page allocator improves the average
  utilization of a direct-mapped memory-side-cache. See section
diff --git a/mm/shuffle.c b/mm/shuffle.c
index 3ce12481b1dc..a979b48be469 100644
--- a/mm/shuffle.c
+++ b/mm/shuffle.c
@@ -9,8 +9,8 @@
 #include "internal.h"
 #include "shuffle.h"
 
-DEFINE_STATIC_KEY_FALSE(page_alloc_shuffle_key);
-static unsigned long shuffle_state __ro_after_init;
+DEFINE_STATIC_KEY_TRUE(page_alloc_shuffle_key);
+static unsigned long shuffle_state __ro_after_init = 1 << SHUFFLE_ENABLE;
 
 /*
  * Depending on the architecture, module parameter parsing may run
diff --git a/mm/shuffle.h b/mm/shuffle.h
index 777a257a0d2f..c1e91ec118be 100644
--- a/mm/shuffle.h
+++ b/mm/shuffle.h
@@ -19,7 +19,7 @@ enum mm_shuffle_ctl {
 #define SHUFFLE_ORDER (MAX_ORDER-1)
 
 #ifdef CONFIG_SHUFFLE_PAGE_ALLOCATOR
-DECLARE_STATIC_KEY_FALSE(page_alloc_shuffle_key);
+DECLARE_STATIC_KEY_TRUE(page_alloc_shuffle_key);
 extern void page_alloc_shuffle(enum mm_shuffle_ctl ctl);
 extern void __shuffle_free_memory(pg_data_t *pgdat);
 static inline void shuffle_free_memory(pg_data_t *pgdat)
---


Git bisection log:

---
git bisect start
# good: [23e93c9b2cde73f9912d0d8534adbddd3dcc48f4] Revert "gfs2: read journal in large chunks to locate the head"
git bisect good 23e93c9b2cde73f9912d0d8534adbddd3dcc48f4
# bad: [7a92eb7cc1dc4c63e3a2fa9ab8e3c1049f199249] Add linux-next specific files for 20190215
git bisect bad 7a92eb7cc1dc4c63e3a2fa9ab8e3c1049f199249
# good: [3811b833d598702c05fd25e36a60f134dd5413b3] Merge remote-tracking branch 'crypto/master'
git bisect good 3811b833d598702c05fd25e36a60f134dd5413b3
# good: [c6cd1b643783f81eaa8e0d777ab0f887df905a45] Merge remote-tracking branch 'spi/for-next'
git bisect good c6cd1b643783f81eaa8e0d777ab0f887df905a45
# good: [36514d08b01218e91810d4007820e0f7d69851fa] Merge remote-tracking branc