Re: [PATCH v2] HID: usbhid: do not sleep when opening device

2020-08-18 Thread Johannes Hirte
On 2020 Jun 09, Dmitry Torokhov wrote:
> usbhid tries to give the device 50 milliseconds to drain its queues when
> opening the device, but does it naively by simply sleeping in the open handler,
> which slows down device probing (and thus may affect overall boot time).
> 
> However we do not need to sleep as we can instead mark a point of time in
> the future when we should start processing the events.
> 
> Reported-by: Nicolas Boichat 
> Signed-off-by: Dmitry Torokhov 
> ---
> 

This change breaks various Logitech devices: 
https://bugzilla.kernel.org/show_bug.cgi?id=208935

-- 
Regards,
  Johannes Hirte
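
The idea in the quoted commit message (record a point in the future instead of sleeping in open) can be illustrated with a minimal userspace sketch. All names here (demo_dev, demo_open, demo_should_process) are hypothetical and not taken from usbhid; only the 50 ms grace period comes from the commit message. Time is a plain millisecond counter, which is an assumption made for brevity, whereas kernel code would work with jiffies and wraparound-safe comparisons. Since the report above says the real change broke some devices, this shows the technique only and is not a recommendation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Grace period named in the commit message: 50 milliseconds. */
#define DEMO_DRAIN_GRACE_MS 50u

/* Hypothetical device state: events before ignore_until_ms are dropped. */
struct demo_dev {
	uint64_t ignore_until_ms;
};

/* Open handler: record a deadline and return at once, without sleeping. */
void demo_open(struct demo_dev *dev, uint64_t now_ms)
{
	assert(dev);
	dev->ignore_until_ms = now_ms + DEMO_DRAIN_GRACE_MS;
}

/* Event path: true once the grace period has passed. */
bool demo_should_process(const struct demo_dev *dev, uint64_t now_ms)
{
	assert(dev);
	return now_ms >= dev->ignore_until_ms;
}
```

A caller would invoke demo_open() once with the current time and then consult demo_should_process() for every incoming event, discarding events while it returns false.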



Re: [PATCH] iwlwifi: Don't IWL_WARN on FW reconfiguration

2020-07-13 Thread Johannes Hirte
On 2020 Jul 13, Chris Down wrote:
> Just to check in again since this is still happening: is this expected?
> 
> I expect that if this is IWL_WARN, it should indicate some unexpected or 
> non-ideal state, but the card seems to operate just fine afterwards.

I'm confused too, cause I'm seeing this on an AC 8260 now, whereas in the
past there wasn't such a message. Is this something the user should be
aware of? If not, I'm with Chris that this should be silenced. 

-- 
Regards,
  Johannes Hirte



Re: [PATCH v2 1/5] perf/x86/rapl: move RAPL support to common x86 code

2020-06-04 Thread Johannes Hirte
On 2020 Jun 01, Stephane Eranian wrote:
> On Mon, Jun 1, 2020 at 5:39 AM Johannes Hirte
>  wrote:
> >
> > On 2020 May 27, Stephane Eranian wrote:
> >
> > ...
> > > diff --git a/arch/x86/events/Makefile b/arch/x86/events/Makefile
> > > index 6f1d1fde8b2de..12c42eba77ec3 100644
> > > --- a/arch/x86/events/Makefile
> > > +++ b/arch/x86/events/Makefile
> > > @@ -1,5 +1,6 @@
> > >  # SPDX-License-Identifier: GPL-2.0-only
> > >  obj-y+= core.o probe.o
> > > +obj-$(PERF_EVENTS_INTEL_RAPL)+= rapl.o
> > >  obj-y+= amd/
> > >  obj-$(CONFIG_X86_LOCAL_APIC)+= msr.o
> > >  obj-$(CONFIG_CPU_SUP_INTEL)  += intel/
> >
> > With this change, rapl won't be built. Must be:
> >
> > obj-$(CONFIG_PERF_EVENTS_INTEL_RAPL)+= rapl.o
> >
> Correct. I posted a patch last week to fix that.
> Thanks.

Yes, it just wasn't in tip when I tested. Sorry for the noise.

-- 
Regards,
  Johannes Hirte



Re: [PATCH v2 1/5] perf/x86/rapl: move RAPL support to common x86 code

2020-06-01 Thread Johannes Hirte
On 2020 May 27, Stephane Eranian wrote:

...
> diff --git a/arch/x86/events/Makefile b/arch/x86/events/Makefile
> index 6f1d1fde8b2de..12c42eba77ec3 100644
> --- a/arch/x86/events/Makefile
> +++ b/arch/x86/events/Makefile
> @@ -1,5 +1,6 @@
>  # SPDX-License-Identifier: GPL-2.0-only
>  obj-y+= core.o probe.o
> +obj-$(PERF_EVENTS_INTEL_RAPL)+= rapl.o
>  obj-y+= amd/
>  obj-$(CONFIG_X86_LOCAL_APIC)+= msr.o
>  obj-$(CONFIG_CPU_SUP_INTEL)  += intel/

With this change, rapl won't be built. Must be:

obj-$(CONFIG_PERF_EVENTS_INTEL_RAPL)    += rapl.o

-- 
Regards,
  Johannes Hirte



Re: [PATCH] x86/build: Move _etext to actual end of .text

2019-06-09 Thread Johannes Hirte
On 2019 Jun 09, Klaus Kusche wrote:
> 
> Hello,
> 
> Same problem for linux 5.1.7: 
> Kernel building fails with the same relocation error.
> 
> 5.1.5 does not have the problem, builds fine for me.
> 
> Is there anything I can do to investigate the problem?
> 

Please try linux 5.1.8. The problematic patch was reverted there.

-- 
Regards,
  Johannes



Re: [PATCH] x86/build: Move _etext to actual end of .text

2019-05-16 Thread Johannes Hirte
On 2019 May 15, Kees Cook wrote:
> On Tue, May 14, 2019 at 06:10:55PM +0200, Johannes Hirte wrote:
> > On 2019 May 14, Kees Cook wrote:
> > > On Tue, May 14, 2019 at 02:04:21PM +0200, Johannes Hirte wrote:
> > > > This breaks the build on my system:
> > > > 
> > > >   RELOCS  arch/x86/boot/compressed/vmlinux.relocs
> > > >   CC  arch/x86/boot/compressed/early_serial_console.o
> > > >   CC  arch/x86/boot/compressed/kaslr.o
> > > >   AS  arch/x86/boot/compressed/mem_encrypt.o
> > > >   CC  arch/x86/boot/compressed/kaslr_64.o
> > > > Invalid absolute R_X86_64_32S relocation: _etext
> > > > make[2]: *** [arch/x86/boot/compressed/Makefile:130: 
> > > > arch/x86/boot/compressed/vmlinux.relocs] Error 1
> > > > make[2]: *** Deleting file 'arch/x86/boot/compressed/vmlinux.relocs'
> > > > make[2]: *** Waiting for unfinished jobs
> > > > make[1]: *** [arch/x86/boot/Makefile:112: 
> > > > arch/x86/boot/compressed/vmlinux] Error 2
> > > > make: *** [arch/x86/Makefile:283: bzImage] Error 2
> > > 
> > > Interesting! Can you send along your .config and compiler details?
> > 
> > Tested with gcc-8.3 and gcc-9.1, both the same result.
> > [...]
> > gcc version 8.3.0 (Gentoo 8.3.0-r1 p1.1)
> 
> Hm, I'm not able to reproduce this with any of the compilers I have
> access to. The most recent I have is:
> 
> gcc (Ubuntu 20180425-1ubuntu1) 9.0.0 20180425 (experimental) [trunk revision 
> 259645]
> 
> Various stupid questions: did you wipe the whole build tree and start
> clean? 

No I didn't. And this fixed it now. After a distclean I'm unable to
reproduce it. So sorry for the noise.

-- 
Regards,
  Johannes



Re: [PATCH] x86/build: Move _etext to actual end of .text

2019-05-14 Thread Johannes Hirte
On 2019 May 14, Kees Cook wrote:
> On Tue, May 14, 2019 at 02:04:21PM +0200, Johannes Hirte wrote:
> > On 2019 Apr 23, Kees Cook wrote:
> > > When building x86 with Clang LTO and CFI, CFI jump regions are
> > > automatically added to the end of the .text section late in linking. As a
> > > result, the _etext position was being labelled before the appended jump
> > > regions, causing confusion about where the boundaries of the executable
> > > region actually are in the running kernel, and broke at least the fault
> > > injection code. This moves the _etext mark to outside (and immediately
> > > after) the .text area, as is already the case on other architectures
> > > (e.g. arm64, arm).
> > > 
> > > Reported-and-tested-by: Sami Tolvanen 
> > > Signed-off-by: Kees Cook 
> > > ---
> > >  arch/x86/kernel/vmlinux.lds.S | 6 +++---
> > >  1 file changed, 3 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
> > > index bad8c51fee6e..de94da2366e7 100644
> > > --- a/arch/x86/kernel/vmlinux.lds.S
> > > +++ b/arch/x86/kernel/vmlinux.lds.S
> > > @@ -141,11 +141,11 @@ SECTIONS
> > >   *(.text.__x86.indirect_thunk)
> > >   __indirect_thunk_end = .;
> > >  #endif
> > > -
> > > - /* End of text section */
> > > - _etext = .;
> > >   } :text = 0x9090
> > >  
> > > + /* End of text section */
> > > + _etext = .;
> > > +
> > >   NOTES :text :note
> > >  
> > >   EXCEPTION_TABLE(16) :text = 0x9090
> > > -- 
> > > 2.17.1
> > 
> > This breaks the build on my system:
> > 
> >   RELOCS  arch/x86/boot/compressed/vmlinux.relocs
> >   CC  arch/x86/boot/compressed/early_serial_console.o
> >   CC  arch/x86/boot/compressed/kaslr.o
> >   AS  arch/x86/boot/compressed/mem_encrypt.o
> >   CC  arch/x86/boot/compressed/kaslr_64.o
> > Invalid absolute R_X86_64_32S relocation: _etext
> > make[2]: *** [arch/x86/boot/compressed/Makefile:130: 
> > arch/x86/boot/compressed/vmlinux.relocs] Error 1
> > make[2]: *** Deleting file 'arch/x86/boot/compressed/vmlinux.relocs'
> > make[2]: *** Waiting for unfinished jobs
> > make[1]: *** [arch/x86/boot/Makefile:112: arch/x86/boot/compressed/vmlinux] 
> > Error 2
> > make: *** [arch/x86/Makefile:283: bzImage] Error 2
> 
> Interesting! Can you send along your .config and compiler details?

Tested with gcc-8.3 and gcc-9.1, both the same result.

Using built-in specs.
COLLECT_GCC=gcc-8.3.0
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-pc-linux-gnu/8.3.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: 
/var/tmp/portage/sys-devel/gcc-8.3.0-r1/work/gcc-8.3.0/configure 
--host=x86_64-pc-linux-gnu --build=x86_64-pc-linux-gnu --prefix=/usr 
--bindir=/usr/x86_64-pc-linux-gnu/gcc-bin/8.3.0 
--includedir=/usr/lib/gcc/x86_64-pc-linux-gnu/8.3.0/include 
--datadir=/usr/share/gcc-data/x86_64-pc-linux-gnu/8.3.0 
--mandir=/usr/share/gcc-data/x86_64-pc-linux-gnu/8.3.0/man 
--infodir=/usr/share/gcc-data/x86_64-pc-linux-gnu/8.3.0/info 
--with-gxx-include-dir=/usr/lib/gcc/x86_64-pc-linux-gnu/8.3.0/include/g++-v8 
--with-python-dir=/share/gcc-data/x86_64-pc-linux-gnu/8.3.0/python 
--enable-languages=c,c++,fortran --enable-obsolete --enable-secureplt 
--disable-werror --with-system-zlib --enable-nls --without-included-gettext 
--enable-checking=release --with-bugurl=https://bugs.gentoo.org/ 
--with-pkgversion='Gentoo 8.3.0-r1 p1.1' --disable-esp --enable-libstdcxx-time 
--enable-shared --enable-threads=posix --enable-__cxa_atexit 
--enable-clocale=gnu --enable-multilib --with-multilib-list=m32,m64 
--disable-altivec --disable-fixed-point --enable-targets=all --enable-libgomp 
--disable-libmudflap --disable-libssp --disable-libmpx --disable-systemtap 
--enable-vtable-verify --enable-lto --without-isl --enable-default-pie 
--enable-default-ssp
Thread model: posix
gcc version 8.3.0 (Gentoo 8.3.0-r1 p1.1)

Using built-in specs.
COLLECT_GCC=gcc-9.1.0
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-pc-linux-gnu/9.1.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: /var/tmp/portage/sys-devel/gcc-9.1.0/work/gcc-9.1.0/configure 
--host=x86_64-pc-linux-gnu --build=x86_64-pc-linux-gnu --prefix=/usr 
--bindir=/usr/x86_64-pc-linux-gnu/gcc-bin/9.1.0 
--includedir=/usr/lib/gcc/x86_64-pc-linux-gnu/9.1.0/include 
--datadir=/usr/share/gcc-data/x86_64-pc-linux-gnu/9.1.0 
--mandir=/usr/share/gcc-data/x86_64-pc-linux-gnu/9.1.0/man 
--infodir=/usr/share/gcc-data/x86_64-pc-linux-gnu/9.1.0/info 
--with-gxx-include-dir=/usr/lib/gcc/x86_64-pc-linux-gnu/9.1.0/include/g++-v9 
--wi

Re: [PATCH] x86/build: Move _etext to actual end of .text

2019-05-14 Thread Johannes Hirte
On 2019 Apr 23, Kees Cook wrote:
> When building x86 with Clang LTO and CFI, CFI jump regions are
> automatically added to the end of the .text section late in linking. As a
> result, the _etext position was being labelled before the appended jump
> regions, causing confusion about where the boundaries of the executable
> region actually are in the running kernel, and broke at least the fault
> injection code. This moves the _etext mark to outside (and immediately
> after) the .text area, as is already the case on other architectures
> (e.g. arm64, arm).
> 
> Reported-and-tested-by: Sami Tolvanen 
> Signed-off-by: Kees Cook 
> ---
>  arch/x86/kernel/vmlinux.lds.S | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
> index bad8c51fee6e..de94da2366e7 100644
> --- a/arch/x86/kernel/vmlinux.lds.S
> +++ b/arch/x86/kernel/vmlinux.lds.S
> @@ -141,11 +141,11 @@ SECTIONS
>   *(.text.__x86.indirect_thunk)
>   __indirect_thunk_end = .;
>  #endif
> -
> - /* End of text section */
> - _etext = .;
>   } :text = 0x9090
>  
> + /* End of text section */
> + _etext = .;
> +
>   NOTES :text :note
>  
>   EXCEPTION_TABLE(16) :text = 0x9090
> -- 
> 2.17.1

This breaks the build on my system:

  RELOCS  arch/x86/boot/compressed/vmlinux.relocs
  CC  arch/x86/boot/compressed/early_serial_console.o
  CC  arch/x86/boot/compressed/kaslr.o
  AS  arch/x86/boot/compressed/mem_encrypt.o
  CC  arch/x86/boot/compressed/kaslr_64.o
Invalid absolute R_X86_64_32S relocation: _etext
make[2]: *** [arch/x86/boot/compressed/Makefile:130: 
arch/x86/boot/compressed/vmlinux.relocs] Error 1
make[2]: *** Deleting file 'arch/x86/boot/compressed/vmlinux.relocs'
make[2]: *** Waiting for unfinished jobs
make[1]: *** [arch/x86/boot/Makefile:112: arch/x86/boot/compressed/vmlinux] 
Error 2
make: *** [arch/x86/Makefile:283: bzImage] Error 2



-- 
Regards,
  Johannes
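
Why the position of the _etext mark matters, as described in the quoted commit message, can be sketched with a small range check. This is a simplified illustration under stated assumptions: the struct, the function name and every address are made up, and the check merely follows the common "start <= addr < end" pattern. It is not the kernel's own helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout; the addresses used with it are invented. */
struct demo_layout {
	uintptr_t stext;	/* start of the executable region */
	uintptr_t etext;	/* end marker, exclusive */
};

/* Range check in the style "stext <= addr < etext". */
bool demo_is_text(const struct demo_layout *l, uintptr_t addr)
{
	assert(l && l->stext <= l->etext);
	return addr >= l->stext && addr < l->etext;
}
```

With an end marker placed before code that the linker appends later, an address inside the appended region fails the check even though it is executable. Moving the marker behind the whole output section makes the same address pass.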



Re: Suggestion: „spectre_v2=off“ and „nopti“ per default in "Intel Atom N270" case?

2018-05-23 Thread Johannes Hirte
On 2018 May 23, Pavel Machek wrote:
> On Sat 2018-05-19 21:53:02, Christian Krüger wrote:
> > Hi,
> > 
> > Since the old "in-order-execution" Intel CPUs like the Intel Atom N270
> > (known for being installed in many Netbooks and Nettops) are not sensitive
> > for "Meltdown" & "Spectre" , wouldn't it be a good idea to exclude these
> > anyway "weak" CPUs from the costly patches by default?
> > 
> > Browsing the web, I can "feel the difference" if the matching kernel options
> > are applied on such a device.
> 
> Can you also measure the difference? Placebo effect is hard to avoid.
> 
> But yes, we do not need to do workarounds on non-buggy machines...
> 
>   Pavel

On my Atom N270 there doesn't seem to be any workaround active with
kernel 4.14.42:

localhost ~ # cat /sys/devices/system/cpu/vulnerabilities/*
Not affected
Not affected
Not affected

Christian, did you verify the mitigations are active on your system?
What kernels are affected? 

-- 
Regards,
  Johannes



Re: [PATCH 3/3] x86/MCE/AMD: Get address from already initialized block

2018-05-17 Thread Johannes Hirte
On 2018 May 17, Borislav Petkov wrote:
> On Thu, May 17, 2018 at 08:49:31AM +0200, Johannes Hirte wrote:
> > Maybe I'm missing something, but those RDMSR IPIs don't happen on
> > pre-SMCA systems, right? So the caching should be avoided here, cause
> > the whole lookup looks more expensive to me than the simple switch-block
> > in get_block_address.
> 
> Yeah, and we should simply cache all those MSR values as I suggested then.
> 
> The patch at the end should fix your issue.
> 

Works as expected on my Carrizo.

-- 
Regards,
  Johannes



Re: [PATCH 3/3] x86/MCE/AMD: Get address from already initialized block

2018-05-16 Thread Johannes Hirte
On 2018 May 17, Borislav Petkov wrote:
> On Tue, May 15, 2018 at 11:39:54AM +0200, Johannes Hirte wrote:
> > The out-of-bound access happens in get_block_address:
> > 
> > if (bankp && bankp->blocks) {
> > struct threshold_block *blockp = &bankp->blocks[block];
> > 
> > with block=1. This doesn't exist. I don't even find any array here.
> > There is a linked list, created in allocate_threshold_blocks. On my
> > system I get 17 lists with one element each.
> 
> Yes, what a mess this is. ;-\
> 
> There's no such thing as ->blocks[block] array. We assign simply the
> threshold_block to it in allocate_threshold_blocks:
> 
>   per_cpu(threshold_banks, cpu)[bank]->blocks = b;
> 
> And I can't say the design of this thing is really friendly but it is
> still no excuse that I missed that during review. Grrr.
> 
> So, Yazen, what really needs to happen here is to iterate the
> bank->blocks->miscj list to find the block you're looking for and return
> its address, the opposite to this here:
> 
> if (per_cpu(threshold_banks, cpu)[bank]->blocks) {
> list_add(&b->miscj,
>  &per_cpu(threshold_banks, cpu)[bank]->blocks->miscj);
> } else {
> per_cpu(threshold_banks, cpu)[bank]->blocks = b;
> }
> 
> and don't forget to look at ->blocks itself.
> 
> And then you need to make sure that searching for block addresses still
> works when resuming from suspend so that you can avoid the RDMSR IPIs.
> 

Maybe I'm missing something, but those RDMSR IPIs don't happen on
pre-SMCA systems, right? So the caching should be avoided here, cause
the whole lookup looks more expensive to me than the simple switch-block
in get_block_address.

-- 
Regards,
  Johannes
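
The lookup suggested in the quoted mail (check ->blocks itself, then walk its miscj list, instead of indexing ->blocks like an array) can be sketched in self-contained userspace C. Everything prefixed demo_ is hypothetical: demo_list is a minimal stand-in for the kernel's circular list_head, demo_block is a reduced stand-in for the threshold block, and the addresses in any usage are invented. Returning 0 for "not found" is also an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal stand-in for a circular, doubly linked list node. */
struct demo_list {
	struct demo_list *next, *prev;
};

/* Hypothetical, reduced threshold block. */
struct demo_block {
	unsigned int block;	/* block index within the bank */
	uint32_t address;	/* saved block address */
	struct demo_list miscj;	/* links the other blocks of the bank */
};

/* Set up a block whose list contains only itself. */
void demo_block_init(struct demo_block *b, unsigned int block, uint32_t address)
{
	assert(b);
	b->block = block;
	b->address = address;
	b->miscj.next = b->miscj.prev = &b->miscj;
}

/* Link b directly behind head, the way a list_add() would. */
void demo_block_add(struct demo_block *b, struct demo_block *head)
{
	assert(b && head);
	b->miscj.next = head->miscj.next;
	b->miscj.prev = &head->miscj;
	head->miscj.next->prev = &b->miscj;
	head->miscj.next = &b->miscj;
}

/* Check the head block first, then walk miscj; 0 means "not found". */
uint32_t demo_find_address(const struct demo_block *head, unsigned int block)
{
	const struct demo_list *pos;

	if (!head)
		return 0;
	if (head->block == block)
		return head->address;

	for (pos = head->miscj.next; pos != &head->miscj; pos = pos->next) {
		/* Recover the containing block from its embedded list node. */
		const struct demo_block *b = (const struct demo_block *)
			((const char *)pos - offsetof(struct demo_block, miscj));

		if (b->block == block)
			return b->address;
	}
	return 0;
}
```

On a bank with a single block, as in the report above, asking for block 1 yields "not found" here instead of reading past the end of the allocation.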



Re: [PATCH 3/3] x86/MCE/AMD: Get address from already initialized block

2018-05-15 Thread Johannes Hirte
On 2018 Apr 17, Ghannam, Yazen wrote:
> > -Original Message-
> > From: linux-edac-ow...@vger.kernel.org On Behalf Of Johannes Hirte
> > Sent: Monday, April 16, 2018 7:56 AM
> > To: Ghannam, Yazen 
> > Cc: linux-e...@vger.kernel.org; linux-kernel@vger.kernel.org; b...@suse.de;
> > tony.l...@intel.com; x...@kernel.org
> > Subject: Re: [PATCH 3/3] x86/MCE/AMD: Get address from already initialized
> > block
> > 
> > On 2018 Apr 14, Johannes Hirte wrote:
> > > On 2018 Feb 01, Yazen Ghannam wrote:
> > > > From: Yazen Ghannam 
> > > >
> > > > The block address is saved after the block is initialized when
> > > > threshold_init_device() is called.
> > > >
> > > > Use the saved block address, if available, rather than trying to
> > > > rediscover it.
> > > >
> > > > We can avoid some *on_cpu() calls in the init path that will cause a
> > > > call trace when resuming from suspend.
> > > >
> > > > Cc:  # 4.14.x
> > > > Signed-off-by: Yazen Ghannam 
> > > > ---
> > > >  arch/x86/kernel/cpu/mcheck/mce_amd.c | 15 +++
> > > >  1 file changed, 15 insertions(+)
> > > >
> > > > diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd.c
> > b/arch/x86/kernel/cpu/mcheck/mce_amd.c
> > > > index bf53b4549a17..8c4f8f30c779 100644
> > > > --- a/arch/x86/kernel/cpu/mcheck/mce_amd.c
> > > > +++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c
> > > > @@ -436,6 +436,21 @@ static u32 get_block_address(unsigned int cpu,
> > u32 current_addr, u32 low, u32 hi
> > > >  {
> > > > u32 addr = 0, offset = 0;
> > > >
> > > > +   if ((bank >= mca_cfg.banks) || (block >= NR_BLOCKS))
> > > > +   return addr;
> > > > +
> > > > +   /* Get address from already initialized block. */
> > > > +   if (per_cpu(threshold_banks, cpu)) {
> > > > +   struct threshold_bank *bankp = per_cpu(threshold_banks,
> > cpu)[bank];
> > > > +
> > > > +   if (bankp && bankp->blocks) {
> > > > +   struct threshold_block *blockp = &bankp-
> > >blocks[block];
> > > > +
> > > > +   if (blockp)
> > > > +   return blockp->address;
> > > > +   }
> > > > +   }
> > > > +
> > > > if (mce_flags.smca) {
> > > > if (smca_get_bank_type(bank) == SMCA_RESERVED)
> > > > return addr;
> > > > --
> > > > 2.14.1
> > >
> > > I have a KASAN: slab-out-of-bounds, and git bisect points me to this
> > > change:
> > >
> > > Apr 13 00:40:32 probook kernel:
> > 
> > ==
> > > Apr 13 00:40:32 probook kernel: BUG: KASAN: slab-out-of-bounds in
> > get_block_address.isra.3+0x1e9/0x520
> > > Apr 13 00:40:32 probook kernel: Read of size 4 at addr 8803f165ddf4 by
> > task swapper/0/1
> > > Apr 13 00:40:32 probook kernel:
> > > Apr 13 00:40:32 probook kernel: CPU: 1 PID: 1 Comm: swapper/0 Not
> > tainted 4.16.0-10757-g4ca8ba4ccff9 #532
> > > Apr 13 00:40:32 probook kernel: Hardware name: HP HP ProBook 645
> > G2/80FE, BIOS N77 Ver. 01.12 12/19/2017
> > > Apr 13 00:40:32 probook kernel: Call Trace:
> > > Apr 13 00:40:32 probook kernel:  dump_stack+0x5b/0x8b
> > > Apr 13 00:40:32 probook kernel:  ? get_block_address.isra.3+0x1e9/0x520
> > > Apr 13 00:40:32 probook kernel:  print_address_description+0x65/0x270
> > > Apr 13 00:40:32 probook kernel:  ? get_block_address.isra.3+0x1e9/0x520
> > > Apr 13 00:40:32 probook kernel:  kasan_report+0x232/0x350
> > > Apr 13 00:40:32 probook kernel:  get_block_address.isra.3+0x1e9/0x520
> > > Apr 13 00:40:32 probook kernel:  ? kobject_init_and_add+0xde/0x130
> > > Apr 13 00:40:32 probook kernel:  ? get_name+0x390/0x390
> > > Apr 13 00:40:32 probook kernel:  ? kasan_unpoison_shadow+0x30/0x40
> > > Apr 13 00:40:32 probook kernel:  ? kasan_kmalloc+0xa0/0xd0
> > > Apr 13 00:40:32 probook kernel:  allocate_threshold_blocks+0x12c/0xc60
> > > Apr 13 00:40:32 probook kernel:  ? kobject_add_internal+0x800/0x800
> > > Apr 13 00:40:32 probook kernel:  ? get_block_address.isra.3+0x520/0x520
> > > 

Re: [PATCH 3/3] x86/MCE/AMD: Get address from already initialized block

2018-04-16 Thread Johannes Hirte
On 2018 Apr 14, Johannes Hirte wrote:
> On 2018 Feb 01, Yazen Ghannam wrote:
> > From: Yazen Ghannam 
> > 
> > The block address is saved after the block is initialized when
> > threshold_init_device() is called.
> > 
> > Use the saved block address, if available, rather than trying to
> > rediscover it.
> > 
> > We can avoid some *on_cpu() calls in the init path that will cause a
> > call trace when resuming from suspend.
> > 
> > Cc:  # 4.14.x
> > Signed-off-by: Yazen Ghannam 
> > ---
> >  arch/x86/kernel/cpu/mcheck/mce_amd.c | 15 +++
> >  1 file changed, 15 insertions(+)
> > 
> > diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd.c 
> > b/arch/x86/kernel/cpu/mcheck/mce_amd.c
> > index bf53b4549a17..8c4f8f30c779 100644
> > --- a/arch/x86/kernel/cpu/mcheck/mce_amd.c
> > +++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c
> > @@ -436,6 +436,21 @@ static u32 get_block_address(unsigned int cpu, u32 
> > current_addr, u32 low, u32 hi
> >  {
> > u32 addr = 0, offset = 0;
> >  
> > +   if ((bank >= mca_cfg.banks) || (block >= NR_BLOCKS))
> > +   return addr;
> > +
> > +   /* Get address from already initialized block. */
> > +   if (per_cpu(threshold_banks, cpu)) {
> > +   struct threshold_bank *bankp = per_cpu(threshold_banks, 
> > cpu)[bank];
> > +
> > +   if (bankp && bankp->blocks) {
> > +   struct threshold_block *blockp = &bankp->blocks[block];
> > +
> > +   if (blockp)
> > +   return blockp->address;
> > +   }
> > +   }
> > +
> > if (mce_flags.smca) {
> > if (smca_get_bank_type(bank) == SMCA_RESERVED)
> > return addr;
> > -- 
> > 2.14.1
> 
> I have a KASAN: slab-out-of-bounds, and git bisect points me to this
> change:
> 
> Apr 13 00:40:32 probook kernel: 
> ==
> Apr 13 00:40:32 probook kernel: BUG: KASAN: slab-out-of-bounds in 
> get_block_address.isra.3+0x1e9/0x520
> Apr 13 00:40:32 probook kernel: Read of size 4 at addr 8803f165ddf4 by 
> task swapper/0/1
> Apr 13 00:40:32 probook kernel: 
> Apr 13 00:40:32 probook kernel: CPU: 1 PID: 1 Comm: swapper/0 Not tainted 
> 4.16.0-10757-g4ca8ba4ccff9 #532
> Apr 13 00:40:32 probook kernel: Hardware name: HP HP ProBook 645 G2/80FE, 
> BIOS N77 Ver. 01.12 12/19/2017
> Apr 13 00:40:32 probook kernel: Call Trace:
> Apr 13 00:40:32 probook kernel:  dump_stack+0x5b/0x8b
> Apr 13 00:40:32 probook kernel:  ? get_block_address.isra.3+0x1e9/0x520
> Apr 13 00:40:32 probook kernel:  print_address_description+0x65/0x270
> Apr 13 00:40:32 probook kernel:  ? get_block_address.isra.3+0x1e9/0x520
> Apr 13 00:40:32 probook kernel:  kasan_report+0x232/0x350
> Apr 13 00:40:32 probook kernel:  get_block_address.isra.3+0x1e9/0x520
> Apr 13 00:40:32 probook kernel:  ? kobject_init_and_add+0xde/0x130
> Apr 13 00:40:32 probook kernel:  ? get_name+0x390/0x390
> Apr 13 00:40:32 probook kernel:  ? kasan_unpoison_shadow+0x30/0x40
> Apr 13 00:40:32 probook kernel:  ? kasan_kmalloc+0xa0/0xd0
> Apr 13 00:40:32 probook kernel:  allocate_threshold_blocks+0x12c/0xc60
> Apr 13 00:40:32 probook kernel:  ? kobject_add_internal+0x800/0x800
> Apr 13 00:40:32 probook kernel:  ? get_block_address.isra.3+0x520/0x520
> Apr 13 00:40:32 probook kernel:  ? kasan_kmalloc+0xa0/0xd0
> Apr 13 00:40:32 probook kernel:  mce_threshold_create_device+0x35b/0x990
> Apr 13 00:40:32 probook kernel:  ? init_special_inode+0x1d0/0x230
> Apr 13 00:40:32 probook kernel:  threshold_init_device+0x98/0xa7
> Apr 13 00:40:32 probook kernel:  ? mcheck_vendor_init_severity+0x43/0x43
> Apr 13 00:40:32 probook kernel:  do_one_initcall+0x76/0x30c
> Apr 13 00:40:32 probook kernel:  ? 
> trace_event_raw_event_initcall_finish+0x190/0x190
> Apr 13 00:40:32 probook kernel:  ? kasan_unpoison_shadow+0xb/0x40
> Apr 13 00:40:32 probook kernel:  ? kasan_unpoison_shadow+0x30/0x40
> Apr 13 00:40:32 probook kernel:  kernel_init_freeable+0x3d6/0x471
> Apr 13 00:40:32 probook kernel:  ? rest_init+0xf0/0xf0
> Apr 13 00:40:32 probook kernel:  kernel_init+0xa/0x120
> Apr 13 00:40:32 probook kernel:  ? rest_init+0xf0/0xf0
> Apr 13 00:40:32 probook kernel:  ret_from_fork+0x22/0x40
> Apr 13 00:40:32 probook kernel:
> Apr 13 00:40:32 probook kernel: Allocated by task 1:
> Apr 13 00:40:32 probook kernel:  kasan_kmalloc+0xa0/0xd0
> Apr 13 00:40:32 probook kernel:  kmem_cache_alloc_trace+0xf3/0x1f0
> Apr 13 00:40:32 probook kernel:  allocate_threshold_blocks+0x1bc/0xc60
> Apr 13 00:40:32 probook kerne

Re: [PATCH 3/3] x86/MCE/AMD: Get address from already initialized block

2018-04-13 Thread Johannes Hirte
On 2018 Feb 01, Yazen Ghannam wrote:
> From: Yazen Ghannam 
> 
> The block address is saved after the block is initialized when
> threshold_init_device() is called.
> 
> Use the saved block address, if available, rather than trying to
> rediscover it.
> 
> We can avoid some *on_cpu() calls in the init path that will cause a
> call trace when resuming from suspend.
> 
> Cc:  # 4.14.x
> Signed-off-by: Yazen Ghannam 
> ---
>  arch/x86/kernel/cpu/mcheck/mce_amd.c | 15 +++
>  1 file changed, 15 insertions(+)
> 
> diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd.c 
> b/arch/x86/kernel/cpu/mcheck/mce_amd.c
> index bf53b4549a17..8c4f8f30c779 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce_amd.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c
> @@ -436,6 +436,21 @@ static u32 get_block_address(unsigned int cpu, u32 
> current_addr, u32 low, u32 hi
>  {
>   u32 addr = 0, offset = 0;
>  
> + if ((bank >= mca_cfg.banks) || (block >= NR_BLOCKS))
> + return addr;
> +
> + /* Get address from already initialized block. */
> + if (per_cpu(threshold_banks, cpu)) {
> + struct threshold_bank *bankp = per_cpu(threshold_banks, 
> cpu)[bank];
> +
> + if (bankp && bankp->blocks) {
> + struct threshold_block *blockp = &bankp->blocks[block];
> +
> + if (blockp)
> + return blockp->address;
> + }
> + }
> +
>   if (mce_flags.smca) {
>   if (smca_get_bank_type(bank) == SMCA_RESERVED)
>   return addr;
> -- 
> 2.14.1

I have a KASAN: slab-out-of-bounds, and git bisect points me to this
change:

Apr 13 00:40:32 probook kernel: 
==
Apr 13 00:40:32 probook kernel: BUG: KASAN: slab-out-of-bounds in 
get_block_address.isra.3+0x1e9/0x520
Apr 13 00:40:32 probook kernel: Read of size 4 at addr 8803f165ddf4 by task 
swapper/0/1
Apr 13 00:40:32 probook kernel: 
Apr 13 00:40:32 probook kernel: CPU: 1 PID: 1 Comm: swapper/0 Not tainted 
4.16.0-10757-g4ca8ba4ccff9 #532
Apr 13 00:40:32 probook kernel: Hardware name: HP HP ProBook 645 G2/80FE, BIOS 
N77 Ver. 01.12 12/19/2017
Apr 13 00:40:32 probook kernel: Call Trace:
Apr 13 00:40:32 probook kernel:  dump_stack+0x5b/0x8b
Apr 13 00:40:32 probook kernel:  ? get_block_address.isra.3+0x1e9/0x520
Apr 13 00:40:32 probook kernel:  print_address_description+0x65/0x270
Apr 13 00:40:32 probook kernel:  ? get_block_address.isra.3+0x1e9/0x520
Apr 13 00:40:32 probook kernel:  kasan_report+0x232/0x350
Apr 13 00:40:32 probook kernel:  get_block_address.isra.3+0x1e9/0x520
Apr 13 00:40:32 probook kernel:  ? kobject_init_and_add+0xde/0x130
Apr 13 00:40:32 probook kernel:  ? get_name+0x390/0x390
Apr 13 00:40:32 probook kernel:  ? kasan_unpoison_shadow+0x30/0x40
Apr 13 00:40:32 probook kernel:  ? kasan_kmalloc+0xa0/0xd0
Apr 13 00:40:32 probook kernel:  allocate_threshold_blocks+0x12c/0xc60
Apr 13 00:40:32 probook kernel:  ? kobject_add_internal+0x800/0x800
Apr 13 00:40:32 probook kernel:  ? get_block_address.isra.3+0x520/0x520
Apr 13 00:40:32 probook kernel:  ? kasan_kmalloc+0xa0/0xd0
Apr 13 00:40:32 probook kernel:  mce_threshold_create_device+0x35b/0x990
Apr 13 00:40:32 probook kernel:  ? init_special_inode+0x1d0/0x230
Apr 13 00:40:32 probook kernel:  threshold_init_device+0x98/0xa7
Apr 13 00:40:32 probook kernel:  ? mcheck_vendor_init_severity+0x43/0x43
Apr 13 00:40:32 probook kernel:  do_one_initcall+0x76/0x30c
Apr 13 00:40:32 probook kernel:  ? 
trace_event_raw_event_initcall_finish+0x190/0x190
Apr 13 00:40:32 probook kernel:  ? kasan_unpoison_shadow+0xb/0x40
Apr 13 00:40:32 probook kernel:  ? kasan_unpoison_shadow+0x30/0x40
Apr 13 00:40:32 probook kernel:  kernel_init_freeable+0x3d6/0x471
Apr 13 00:40:32 probook kernel:  ? rest_init+0xf0/0xf0
Apr 13 00:40:32 probook kernel:  kernel_init+0xa/0x120
Apr 13 00:40:32 probook kernel:  ? rest_init+0xf0/0xf0
Apr 13 00:40:32 probook kernel:  ret_from_fork+0x22/0x40
Apr 13 00:40:32 probook kernel:
Apr 13 00:40:32 probook kernel: Allocated by task 1:
Apr 13 00:40:32 probook kernel:  kasan_kmalloc+0xa0/0xd0
Apr 13 00:40:32 probook kernel:  kmem_cache_alloc_trace+0xf3/0x1f0
Apr 13 00:40:32 probook kernel:  allocate_threshold_blocks+0x1bc/0xc60
Apr 13 00:40:32 probook kernel:  mce_threshold_create_device+0x35b/0x990
Apr 13 00:40:32 probook kernel:  threshold_init_device+0x98/0xa7
Apr 13 00:40:32 probook kernel:  do_one_initcall+0x76/0x30c
Apr 13 00:40:32 probook kernel:  kernel_init_freeable+0x3d6/0x471
Apr 13 00:40:32 probook kernel:  kernel_init+0xa/0x120
Apr 13 00:40:32 probook kernel:  ret_from_fork+0x22/0x40
Apr 13 00:40:32 probook kernel: 
Apr 13 00:40:32 probook kernel: Freed by task 0:
Apr 13 00:40:32 probook kernel: (stack is not available)
Apr 13 00:40:32 probook kernel: 
Apr 13 00:40:32 probook kernel: The buggy address belongs to the object at 
8803f165dd80
 which belongs to the cache kmalloc-128 of size 128
Apr 13 00:40:3

Re: random insta-reboots on AMD Phenom II

2017-10-06 Thread Johannes Hirte
On 2017 Sep 30, Borislav Petkov wrote:
> On Sat, Sep 30, 2017 at 02:47:11PM +0200, Markus Trippelsdorf wrote:
> > Changing the TLB code so late might not be a good idea...
> 
> The new lazy code is too risky to keep as we don't know what else will
> break. The conservative and thus safe thing to do is to revert to the
> old behavior for old machines.
>

I see the same behaviour on Carrizo. Is Excavator an old machine too?

--
Regards,
  Johannes


Re: random insta-reboots on AMD Phenom II

2017-10-06 Thread Johannes Hirte
On 2017 Oct 06, Borislav Petkov wrote:
> On Fri, Oct 06, 2017 at 08:49:33PM +0200, Johannes Hirte wrote:
> > I see the same behaviour on Carrizo. Is Excavator an old machine too?
> 
> Do
> 
> # rdmsr -a 0xc0010015
> 
> as root and paste it here.
> 
> Thx.

19001011
19001011
19001011
19001011

--
Regards,
  Johannes


"drm/core: Do not preserve framebuffer on rmfb" breaks display resume after dpms suspend

2016-06-28 Thread Johannes Hirte
On my system the display doesn't come back after a DPMS suspend with
X11. The display is powered off after the configured time, but to get it
powered on again I need to switch to the console (and back to X). Simple
keyboard or mouse input does not wake it. The brightness is also at the
lowest level afterwards, regardless of the setting before suspend.
Bisecting pointed me to this commit:

f2d580b9a8149735cbc4b59c4a8df60173658140
drm/core: Do not preserve framebuffer on rmfb, v4.

and reverting it solves the problem for me.

System is ProBook 6450b from HP with integrated Intel graphics
(Ironlake). The xf86-video-intel driver is configured with UXA
acceleration.


regards,
  Johannes



Re: [PATCH -next] slub: Replace __this_cpu_inc usage w/ SLUB_STATS

2014-04-14 Thread Johannes Hirte
On Thu, 6 Mar 2014 12:29:41 -0600
Josh Cartwright  wrote:

> On Thu, Mar 06, 2014 at 09:53:16AM -0600, Josh Cartwright wrote:
> > Booting on my Samsung Series 9 laptop gives me loads and loads of
> > BUGs triggered by __this_cpu_add(), making the system
> > completely unusable:
> > 
> > [5.808326] BUG: using __this_cpu_add() in preemptible [] code: swapper/0/1
> > [5.812331] caller is __this_cpu_preempt_check+0x2b/0x30
> > [5.815654] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.14.0-rc5-next-20140306-joshc-08290-g0ffb2fe #1
> > [5.819553] Hardware name: SAMSUNG ELECTRONICS CO., LTD. 900X3C/900X3D/900X3E/900X4C/900X4D/NP900X3E-A02US, BIOS P07ABK 04/09/2013
> > [5.823558]  8801182157c0 880118215790 81a64cec
> > [5.827177]  8801182157b0 81462360 8800c3d553e0 ea00030f5500
> > [5.830744]  8801182157e8 814623bb 635f736968745f5f 29286464615f7570
> > [5.834134] Call Trace:
> > [5.836848]  [] dump_stack+0x4e/0x7a
> > [5.839943]  [] check_preemption_disabled+0xd0/0xe0
> > [5.842997]  [] __this_cpu_preempt_check+0x2b/0x30
> > [5.846022]  [] __slab_free+0x38/0x590
> > [5.848863]  [] ? get_parent_ip+0xd/0x50
> > [5.850467] BUG: using __this_cpu_add() in preemptible [] code: khubd/36
> > [5.850472] caller is __this_cpu_preempt_check+0x2b/0x30
> > [5.859125]  [] ? preempt_count_sub+0x6b/0xf0
> > [5.862521]  [] ? _raw_spin_unlock_irqrestore+0x4a/0x80
> > [5.865599]  [] ? __debug_check_no_obj_freed+0x13e/0x240
> > [5.868738]  [] ? __this_cpu_preempt_check+0x2b/0x30
> > [5.871799]  [] kfree+0x2f7/0x300
> 
> FWIW, it looks like the magic combination of options are:
>   - CONFIG_DEBUG_PREEMPT=y
>   - CONFIG_SLUB=y
>   - CONFIG_SLUB_STATS=y
> 
> Looks like the new percpu() checks are complaining about SLUB's use of
> __this_cpu_inc() for maintaining its stat counters.  The below patch
> seems to fix it.
> 
> Although, I'm wondering how exact these statistics need to be.  Is
> making them preemption safe even a concern?
> 

Looks like there is a similar issue in touch_softlockup_watchdog too:

Apr 14 14:56:01 localhost kernel: BUG: using __this_cpu_write() in preemptible [] code: systemd-udevd/1307
Apr 14 14:56:01 localhost kernel: caller is touch_softlockup_watchdog+0x11/0x1f
Apr 14 14:56:01 localhost kernel: CPU: 0 PID: 1307 Comm: systemd-udevd Tainted: GW 3.15.0-rc1 #44
Apr 14 14:56:01 localhost kernel: Hardware name: Hewlett-Packard HP ProBook 6450b/146D, BIOS 68CDE Ver. F.23 06/13/2012
Apr 14 14:56:01 localhost kernel:  815b6385  813005a4
Apr 14 14:56:01 localhost kernel:  0032 03e8 810c63bc
Apr 14 14:56:01 localhost kernel: 81332592 8800b4ea8800  8800b686e030
Apr 14 14:56:01 localhost kernel: Call Trace:
Apr 14 14:56:01 localhost kernel: [] ? dump_stack+0x4a/0x75
Apr 14 14:56:01 localhost kernel: [] ? check_preemption_disabled+0xd6/0xe5
Apr 14 14:56:01 localhost kernel: [] ? touch_softlockup_watchdog+0x11/0x1f
Apr 14 14:56:01 localhost kernel: [] ? acpi_os_stall+0x2f/0x36
Apr 14 14:56:01 localhost kernel: [] ? acpi_ex_system_do_stall+0x34/0x37
Apr 14 14:56:01 localhost kernel: [] ? acpi_ds_exec_end_op+0xcc/0x3d5
Apr 14 14:56:01 localhost kernel: [] ? acpi_ps_parse_loop+0x50c/0x564
Apr 14 14:56:01 localhost kernel: [] ? acpi_ps_parse_aml+0x93/0x26f
Apr 14 14:56:01 localhost kernel: [] ? acpi_ps_execute_method+0x1b6/0x25f
Apr 14 14:56:01 localhost kernel: [] ? acpi_ns_evaluate+0x1ba/0x247
Apr 14 14:56:01 localhost kernel: [] ? acpi_evaluate_object+0x122/0x231
Apr 14 14:56:01 localhost kernel: [] ? lis3lv02d_acpi_init+0x1c/0x27 [hp_accel]
Apr 14 14:56:01 localhost kernel: [] ? lis3lv02d_poweron+0xe/0xca [lis3lv02d]
Apr 14 14:56:01 localhost kernel: [] ? lis3lv02d_init_device+0x22a/0x4e5 [lis3lv02d]
Apr 14 14:56:01 localhost kernel: [] ? lis3lv02d_add+0x10c/0x18a [hp_accel]
Apr 14 14:56:01 localhost kernel: [] ? acpi_device_probe+0x3d/0xeb
Apr 14 14:56:01 localhost kernel: [] ? driver_probe_device+0x97/0x1b8
Apr 14 14:56:01 localhost kernel: [] ? __driver_attach+0x58/0x78
Apr 14 14:56:01 localhost kernel: [] ? __device_attach+0x36/0x36
Apr 14 14:56:01 localhost kernel: [] ? bus_for_each_dev+0x73/0x7d
Apr 14 14:56:01 localhost kernel: [] ? bus_add_driver+0x105/0x1ce
Apr 14 14:56:01 localhost kernel: [] ? driver_register+0x88/0xc0
Apr 14 14:56:01 localhost kernel: [] ? 0xa005efff
Apr 14 14:56:01 localhost kernel: [] ? do_one_initcall+0x7d/0x101
Apr 14 14:56:01 localhost kernel: [] ? notifier_call_chain+0x37/0x57
Apr 14 14:56:01 localhost kernel: [] ? __blocking_notifier_call_chain+0x53/0x60
Apr 14 14:56:01 localhost kernel: [] ? load_module+0x19f6/0x1ba7
Apr 14 14:56:01 localhost kernel: [] ? module_flags+0x74/0x74
Apr 14 14:56:01 localhost kernel: [] ? SyS_finit_module+0x4f/0x63
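
For readers following the thread, the check that fires in both traces can
be modelled in userspace. This is only a sketch under stated assumptions:
every name below (model_preempt_count, model_unprotected_inc, and so on)
is hypothetical and not a kernel symbol. It shows the contract being
enforced: a __this_cpu_*()-style operation expects the caller to have
preemption disabled already, while a this_cpu_*()-style operation
protects itself.

```c
#include <stdbool.h>
#include <stdio.h>

#define MODEL_NR_CPUS 2

/* Hypothetical userspace stand-ins; none of these are kernel symbols. */
static int model_preempt_count;                 /* > 0 means "preemption disabled" */
static unsigned long model_stat[MODEL_NR_CPUS]; /* one counter per modelled CPU */
static int model_bug_reports;                   /* how often the check complained */

/*
 * Models a __this_cpu_inc()-style operation with the debug check enabled:
 * the caller is expected to have disabled preemption beforehand.
 */
static void model_unprotected_inc(int cpu)
{
	if (model_preempt_count == 0) {
		model_bug_reports++;
		fprintf(stderr, "BUG: per-CPU op in preemptible context (cpu %d)\n", cpu);
	}
	model_stat[cpu]++;
}

/*
 * Models a this_cpu_inc()-style operation: conceptually it guards the
 * update itself, so the check never has anything to report.
 */
static void model_protected_inc(int cpu)
{
	model_preempt_count++;  /* stands in for preempt_disable() */
	model_stat[cpu]++;
	model_preempt_count--;  /* stands in for preempt_enable() */
}
```

Calling model_unprotected_inc() with model_preempt_count at zero raises one
report, which corresponds to the situation in the logs above. Either
switching to the protected variant or bracketing the call with the
disable/enable pair keeps the model quiet.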

Re: [PATCH 5/5] x86, AMD: simplify load_microcode_amd() to fix early microcode loading to no longer access uninitialized per-cpu data

2013-08-12 Thread Johannes Hirte
On Thu, 8 Aug 2013 21:29:43 +0200
Borislav Petkov  wrote:

> On Wed, Jul 24, 2013 at 09:32:50PM +0200, Borislav Petkov wrote:
> > Btw, this patch is the one that fixes the boot issue on your box,
> > correct?
> >
> > If so, please put a minimal version of it in the next patch set
> > you're sending right after
> 
> Ok, I've wiggled it on top of the cpu_has_amd_erratum() patch, see
> below. Can you guys - Torsten and Johannes - run this branch here:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp.git amd-ucode
> 
> to check whether you can boot your boxes with the failing configs? I
> mean, I tried reproducing with your configs on my F10h box and no
> luck - box boots just fine. And it is affected by E400 so...
> 
> In any case, the two patches make sense regardless so if they fix your
> issues, I'd like to send them upwards soonish.
> 
> Thanks.

Just tested your amd-ucode branch with CONFIG_MICROCODE_AMD_EARLY
re-enabled and it works so far.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: nouveau oops in nouveau_bo_new

2013-07-29 Thread Johannes Hirte
On Mon, 29 Jul 2013 08:56:16 +0200
Johannes Hirte  wrote:

> On Thu, 25 Jul 2013 12:22:03 -0400
> Dave Jones  wrote:
> 
> > This recently started happening (since the last DRM merge, 3.10 was
> > fine).
> > 
> > [   17.751970] Oops:  [#1] PREEMPT SMP DEBUG_PAGEALLOC
> > [   17.752260] Modules linked in: nouveau(+) video mxm_wmi wmi
> > i2c_algo_bit tg3 ttm drm_kms_helper iTCO_wdt drm iTCO_vendor_support
> > ptp lpc_ich ppdev dcdbas mfd_core pps_core serio_raw i5k_amb
> > i2c_i801 pcspkr i2c_core shpchp parport_pc parport xfs libcrc32c
> > raid0 floppy [   17.753911] CPU: 3 PID: 292 Comm: systemd-udevd Not
> > tainted 3.11.0-rc2+ #13 [   17.754123] Hardware name: Dell
> > Inc. Precision WorkStation 490/0DT031, BIOS A08
> > 04/25/2008 [   17.754285] task: ed9dabc0 ti: ecdd task.ti:
> > ecdd [   17.754392] EIP: 0060:[] EFLAGS: 00010296 CPU:
> > 3 [   17.754542] EIP is at nouveau_bo_new+0x1f/0x28c [nouveau]
> > [   17.754647] EAX:  EBX: ed8610b0 ECX: 0100 EDX:
> > 4000 [   17.754753] ESI: 4000 EDI: ec8390b0 EBP: ecdd1b4c
> > ESP: ecdd1b14 [   17.754858]  DS: 007b ES: 007b FS: 00d8 GS: 00e0
> > SS: 0068 [   17.754963] CR0: 80050033 CR2:  CR3: 2ca65000
> > CR4: 07f0 [   17.755069] Stack: [   17.755167]  0003
> > f1002780 08b0 ecdd f850280f 0600 80d0 ed8610b0
> > [   17.755690] 0100 ef9f3cd0  ed8610b0 ef9f3cd0
> >  ecdd1b78 f86ef2f6 [   17.756210]  0004 
> >   ed861444 ecef8c30 ebb41520 ef9f3cd0
> > [   17.756729] Call Trace: [   17.756849] [] ?
> > drm_mode_crtc_set_gamma_size+0x23/0x43 [drm] [   17.756993]
> > [] nv04_crtc_create+0xd4/0x142 [nouveau] [   17.757138]
> > [] nv04_display_create+0xf2/0x35a [nouveau]
> > [   17.757281]  [] nouveau_display_create+0x33f/0x553
> > [nouveau] [   17.757422]  [] nouveau_drm_load+0x22f/0x5dc
> > [nouveau] [   17.757534]  [] ? device_register+0x17/0x1a
> > [   17.757648]  [] ? drm_sysfs_device_add+0x76/0xa3 [drm]
> > [   17.757764]  [] drm_get_pci_dev+0x138/0x238 [drm]
> > [   17.757902]  [] ? nouveau_device_create_+0x65/0x11b
> > [nouveau] [   17.758044]  []
> > nouveau_drm_probe+0x2d9/0x360 [nouveau] [   17.758155]
> > [] pci_device_probe+0x6c/0xb0 [   17.758261]
> > [] driver_probe_device+0x7f/0x356 [   17.758367]
> > [] __driver_attach+0x74/0x76 [   17.758473]
> > [] ? __device_attach+0x33/0x33 [   17.758579]
> > [] bus_for_each_dev+0x49/0x74 [   17.758684]
> > [] driver_attach+0x1e/0x20 [   17.758791] [] ?
> > __device_attach+0x33/0x33 [   17.758896]  []
> > bus_add_driver+0x1d0/0x27c [   17.759002]  [] ?
> > pci_pm_suspend+0x111/0x111 [   17.759109]  [] ?
> > pci_pm_suspend+0x111/0x111 [   17.759215]  []
> > driver_register+0x6a/0x123 [   17.759321]  [] ?
> > __raw_spin_lock_init+0x2d/0x4e [   17.759428]  []
> > __pci_register_driver+0x4a/0x4d [   17.760008]  []
> > drm_pci_init+0xe6/0xee [drm] [   17.760008]  [] ?
> > 0xf8751fff [   17.760008]  []
> > nouveau_drm_init+0x48/0x1000 [nouveau] [   17.760008]  []
> > do_one_initcall+0xc0/0x180 [   17.760008]  [] ?
> > 0xf8751fff [   17.760008] [] ? set_memory_nx+0x5a/0x5c
> > [   17.760008]  [] ? set_section_ro_nx+0x54/0x59
> > [   17.760008]  [] load_module+0x1ad6/0x2519
> > [   17.760008]  [] ?
> > copy_module_from_fd.isra.49+0x34/0x13b [   17.760008]  []
> > SyS_finit_module+0x73/0xac [   17.760008]  [] ?
> > up_write+0x1b/0x30 [   17.760008]  [] ?
> > vm_mmap_pgoff+0x7a/0x97 [   17.760008]  []
> > sysenter_do_call+0x12/0x32 [   17.760008] Code: c7 83 1c 01 00 00
> > ff ff ff ff eb aa 55 89 e5 57 56 53 83 ec 2c 66 66 66 66 90 89 d6
> > 89 4d e8 8b b8 ec 03 00 00 8b 87 8c 00 00 00 <8b> 00 0f b6 88 91 00
> > 00 00 b8 ff ff ff ff d3 e0 25 ff ff ff 7f [   17.760008] EIP:
> > [] nouveau_bo_new+0x1f/0x28c [nouveau] SS:ESP
> > 0068:ecdd1b14 [   17.760008] CR2: 
> 
> I've seen a similar oops. Bisect pointed me to
> 0108bc808107b97e101b15af9705729626be6447 "drm/nouveau: do not allow
> negative sizes for now" and reverting this commit fixed it for me.

Forgot CC


Re: nouveau oops in nouveau_bo_new

2013-07-28 Thread Johannes Hirte
On Thu, 25 Jul 2013 12:22:03 -0400
Dave Jones  wrote:

> This recently started happening (since the last DRM merge, 3.10 was
> fine).
> 
> [   17.751970] Oops:  [#1] PREEMPT SMP DEBUG_PAGEALLOC
> [   17.752260] Modules linked in: nouveau(+) video mxm_wmi wmi
> i2c_algo_bit tg3 ttm drm_kms_helper iTCO_wdt drm iTCO_vendor_support
> ptp lpc_ich ppdev dcdbas mfd_core pps_core serio_raw i5k_amb i2c_i801
> pcspkr i2c_core shpchp parport_pc parport xfs libcrc32c raid0 floppy
> [   17.753911] CPU: 3 PID: 292 Comm: systemd-udevd Not tainted
> 3.11.0-rc2+ #13 [   17.754123] Hardware name: Dell
> Inc. Precision WorkStation 490/0DT031, BIOS A08
> 04/25/2008 [   17.754285] task: ed9dabc0 ti: ecdd task.ti:
> ecdd [   17.754392] EIP: 0060:[] EFLAGS: 00010296 CPU:
> 3 [   17.754542] EIP is at nouveau_bo_new+0x1f/0x28c [nouveau]
> [   17.754647] EAX:  EBX: ed8610b0 ECX: 0100 EDX:
> 4000 [   17.754753] ESI: 4000 EDI: ec8390b0 EBP: ecdd1b4c
> ESP: ecdd1b14 [   17.754858]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS:
> 0068 [   17.754963] CR0: 80050033 CR2:  CR3: 2ca65000 CR4:
> 07f0 [   17.755069] Stack: [   17.755167]  0003 f1002780
> 08b0 ecdd f850280f 0600 80d0 ed8610b0 [   17.755690]
> 0100 ef9f3cd0  ed8610b0 ef9f3cd0  ecdd1b78
> f86ef2f6 [   17.756210]  0004    ed861444
> ecef8c30 ebb41520 ef9f3cd0 [   17.756729] Call Trace: [   17.756849]
> [] ? drm_mode_crtc_set_gamma_size+0x23/0x43 [drm]
> [   17.756993]  [] nv04_crtc_create+0xd4/0x142 [nouveau]
> [   17.757138]  [] nv04_display_create+0xf2/0x35a [nouveau]
> [   17.757281]  [] nouveau_display_create+0x33f/0x553
> [nouveau] [   17.757422]  [] nouveau_drm_load+0x22f/0x5dc
> [nouveau] [   17.757534]  [] ? device_register+0x17/0x1a
> [   17.757648]  [] ? drm_sysfs_device_add+0x76/0xa3 [drm]
> [   17.757764]  [] drm_get_pci_dev+0x138/0x238 [drm]
> [   17.757902]  [] ? nouveau_device_create_+0x65/0x11b
> [nouveau] [   17.758044]  [] nouveau_drm_probe+0x2d9/0x360
> [nouveau] [   17.758155]  [] pci_device_probe+0x6c/0xb0
> [   17.758261]  [] driver_probe_device+0x7f/0x356
> [   17.758367]  [] __driver_attach+0x74/0x76
> [   17.758473]  [] ? __device_attach+0x33/0x33
> [   17.758579]  [] bus_for_each_dev+0x49/0x74
> [   17.758684]  [] driver_attach+0x1e/0x20 [   17.758791]
> [] ? __device_attach+0x33/0x33 [   17.758896]  []
> bus_add_driver+0x1d0/0x27c [   17.759002]  [] ?
> pci_pm_suspend+0x111/0x111 [   17.759109]  [] ?
> pci_pm_suspend+0x111/0x111 [   17.759215]  []
> driver_register+0x6a/0x123 [   17.759321]  [] ?
> __raw_spin_lock_init+0x2d/0x4e [   17.759428]  []
> __pci_register_driver+0x4a/0x4d [   17.760008]  []
> drm_pci_init+0xe6/0xee [drm] [   17.760008]  [] ?
> 0xf8751fff [   17.760008]  [] nouveau_drm_init+0x48/0x1000
> [nouveau] [   17.760008]  [] do_one_initcall+0xc0/0x180
> [   17.760008]  [] ? 0xf8751fff [   17.760008]
> [] ? set_memory_nx+0x5a/0x5c [   17.760008]  [] ?
> set_section_ro_nx+0x54/0x59 [   17.760008]  []
> load_module+0x1ad6/0x2519 [   17.760008]  [] ?
> copy_module_from_fd.isra.49+0x34/0x13b [   17.760008]  []
> SyS_finit_module+0x73/0xac [   17.760008]  [] ?
> up_write+0x1b/0x30 [   17.760008]  [] ?
> vm_mmap_pgoff+0x7a/0x97 [   17.760008]  []
> sysenter_do_call+0x12/0x32 [   17.760008] Code: c7 83 1c 01 00 00 ff
> ff ff ff eb aa 55 89 e5 57 56 53 83 ec 2c 66 66 66 66 90 89 d6 89 4d
> e8 8b b8 ec 03 00 00 8b 87 8c 00 00 00 <8b> 00 0f b6 88 91 00 00 00
> b8 ff ff ff ff d3 e0 25 ff ff ff 7f [   17.760008] EIP: []
> nouveau_bo_new+0x1f/0x28c [nouveau] SS:ESP 0068:ecdd1b14
> [   17.760008] CR2: 

I've seen a similar oops. Bisect pointed me to
0108bc808107b97e101b15af9705729626be6447 "drm/nouveau: do not allow
negative sizes for now" and reverting this commit fixed it for me.


Re: early microcode on amd is broken when no initramfs provided

2013-07-21 Thread Johannes Hirte
On Sun, 21 Jul 2013 00:59:11 +0200
Borislav Petkov  wrote:

> On Sat, Jul 20, 2013 at 09:01:33PM +0200, Torsten Kaiser wrote:
> > On Tue, Jul 16, 2013 at 7:00 PM, Borislav Petkov 
> > wrote:
> > > On Thu, Jul 11, 2013 at 11:05:25PM +0200, Johannes Hirte wrote:
> > >> config is attached
> > >
> > > Ok, I can reproduce the hang with your config but even with:
> > >
> > > $ grep MICROCODE .config
> > > # CONFIG_MICROCODE is not set
> > > # CONFIG_MICROCODE_INTEL_EARLY is not set
> > > # CONFIG_MICROCODE_AMD_EARLY is not set
> > >
> > > which means, it cannot be microcode-related.
> > >
> > > And I'd bet if you wait a minute (yep, it should be exactly 60
> > > seconds) the boot would probably continue. And if so, this is
> > > that 60 sec delay where the kernel tries to find firmware.
> > >
> > > Hmm...
> > 
> > I have the same problem: Booting 3.11-rc1 hangs after the line:
> > ACPI: Executed 3 blocks of module-level executable AML code
> > 
> > I bisected it down to the early microcode changes:
> > 757885e94a22bcc82beb9b1445c95218cb20ceab (the new early loading
> > implementation) and 6b3389ac21b5e557b957f1497d0ff22bf733e8c3 (small
> > fixup) completely fail to boot (No output beyond "Booting kernel") ,
> > from 275bbe2e299f1820ec8faa443d689469a9e6ecc5 ("Make
> > find_ucode_in_initrd() __init") I'm seeing this hang.
> > 
> > Just turning CONFIG_MICROCODE_EARLY off solves the problem: The
> > system now successfully boots 3.11-rc1.
> 
> Ok, I need to be able to reproduce that first - I wasn't that
> successful with Johannes' setup.

Strange, I've bisected to the same commit with the config I sent you.

> So, can you please send .config and how you're loading your microcode?
> Is it in the initrd or are you doing that later, how? Grub entry
> please.
> 
> Also, is it just plain v3.11-rc1 or with patches on top?
> 
> Also, /proc/cpuinfo please.
>
> Thanks.

/proc/cpuinfo:

processor   : 0
vendor_id   : AuthenticAMD
cpu family  : 16
model   : 6
model name  : AMD Athlon(tm) II P320 Dual-Core Processor
stepping: 3
microcode   : 0x1b6
cpu MHz : 800.000
cache size  : 512 KB
physical id : 0
siblings: 2
core id : 0
cpu cores   : 2
apicid  : 0
initial apicid  : 0
fpu : yes
fpu_exception   : yes
cpuid level : 5
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext
fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nopl
nonstop_tsc extd_apicid pni monitor cx16 popcnt lahf_lm cmp_legacy svm
extapic cr8_legacy abm sse4a 3dnowprefetch osvw ibs skinit wdt
nodeid_msr hw_pstate npt lbrv svm_lock nrip_save
bogomips: 4189.33
TLB size: 1024 4K pages
clflush size: 64
cache_alignment : 64
address sizes   : 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate

processor   : 1
vendor_id   : AuthenticAMD
cpu family  : 16
model   : 6
model name  : AMD Athlon(tm) II P320 Dual-Core Processor
stepping: 3
microcode   : 0x1b6
cpu MHz : 800.000
cache size  : 512 KB
physical id : 0
siblings: 2
core id : 1
cpu cores   : 2
apicid  : 1
initial apicid  : 1
fpu : yes
fpu_exception   : yes
cpuid level : 5
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext
fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nopl
nonstop_tsc extd_apicid pni monitor cx16 popcnt lahf_lm cmp_legacy svm
extapic cr8_legacy abm sse4a 3dnowprefetch osvw ibs skinit wdt
nodeid_msr hw_pstate npt lbrv svm_lock nrip_save
bogomips: 4189.33
TLB size: 1024 4K pages
clflush size: 64
cache_alignment : 64
address sizes   : 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate



Re: early microcode on amd is broken when no initramfs provided

2013-07-11 Thread Johannes Hirte
On Wed, 10 Jul 2013 09:30:49 +0200
Borislav Petkov  wrote:

> On Tue, Jul 09, 2013 at 06:36:01PM +0200, Johannes Hirte wrote:
> > When CONFIG_MICROCODE_EARLY is enabled on AMD but no initramfs is
> > provided in the bootmanager (grub2), the system hangs here:
> 
> Hmm, I can't reproduce it here.
> 
> grub2 entry is:
> 
> menuentry 'Debian GNU/Linux, with Linux 3.10.0+' --class debian --class gnu-linux --class gnu --class os {
> 	load_video
> 	insmod gzio
> 	insmod part_msdos
> 	insmod ext2
> 	set root='(hd0,msdos1)'
> 	search --no-floppy --fs-uuid --set=root adbbd17b-6e04-4458-814f-9a2b75a4d91e
> 	echo 'Loading Linux 3.10.0+ ...'
> 	linux /boot/vmlinuz-3.10.0+ root=/dev/sda1 ro resume=/dev/sda2 ignore_loglevel
> }
> 
> Kernel is: v3.10-8587-g496322b
> 
> .config settings are:
> 
> $ zgrep -E "(INITRD|MICROCODE)" /proc/config.gz
> CONFIG_BLK_DEV_INITRD=y
> CONFIG_MICROCODE=y
> CONFIG_MICROCODE_INTEL=y
> CONFIG_MICROCODE_AMD=y
> CONFIG_MICROCODE_OLD_INTERFACE=y
> CONFIG_MICROCODE_INTEL_LIB=y
> CONFIG_MICROCODE_INTEL_EARLY=y
> CONFIG_MICROCODE_AMD_EARLY=y
> CONFIG_MICROCODE_EARLY=y
> # CONFIG_ACPI_INITRD_TABLE_OVERRIDE is not set
> 
> Can you send me your .config and your grub entry please?
> 
> Thanks.
> 

grub entry:

menuentry 'Gentoo GNU/Linux 3.10.0-09080-g19d2f8e' --class gentoo --class gnu-linux --class gnu --class os $menuentry_id_option 'gnulinux-simple-d044ac73-1dd2-4250-b864-5cb25fd67192' {
	load_video
	insmod gzio
	insmod part_msdos
	insmod btrfs
	set root='hd0,msdos3'
	if [ x$feature_platform_search_hint = xy ]; then
	  search --no-floppy --fs-uuid --set=root --hint-bios=hd0,msdos3 --hint-efi=hd0,msdos3 --hint-baremetal=ahci0,msdos3  c684c3ff-5bac-4ba8-8f63-c9036c2ad233
	else
	  search --no-floppy --fs-uuid --set=root c684c3ff-5bac-4ba8-8f63-c9036c2ad233
	fi
	echo 'Linux 3.10.0-09080-g19d2f8e wird geladen …'
	linux /vmlinuz-3.10.0-09080-g19d2f8e root=/dev/sda1 ro pcie_aspm=force radeon.dpm=1
}

config is attached


config.bz2
Description: application/bzip


early microcode on amd is broken when no initramfs provided

2013-07-09 Thread Johannes Hirte
When CONFIG_MICROCODE_EARLY is enabled on AMD but no initramfs is provided in
the bootmanager (grub2), the system hangs here:

[0.00] Initializing cgroup subsys cpuset
[0.00] Initializing cgroup subsys cpu
[0.00] Initializing cgroup subsys cpuacct
[0.00] Linux version 3.10.0-06005-gd2b4a64 (puck@acer) (gcc version 4.8.1 (Gentoo 4.8.1 p1.0, pie-0.5.6) ) #69 SMP PREEMPT Tue Jul 9 18:22:09 CEST 2013
[0.00] Command line: BOOT_IMAGE=/vmlinuz-3.10.0-06005-gd2b4a64 root=/dev/sda1 ro pcie_aspm=force radeon.dpm=1
[0.00] e820: BIOS-provided physical RAM map:
[0.00] BIOS-e820: [mem 0x-0x0009f7ff] usable
[0.00] BIOS-e820: [mem 0x0009f800-0x0009] reserved
[0.00] BIOS-e820: [mem 0x000e-0x000f] reserved
[0.00] BIOS-e820: [mem 0x0010-0xde555fff] usable
[0.00] BIOS-e820: [mem 0xde556000-0xde755fff] ACPI NVS
[0.00] BIOS-e820: [mem 0xde756000-0xdfd3efff] usable
[0.00] BIOS-e820: [mem 0xdfd3f000-0xdfdbefff] reserved
[0.00] BIOS-e820: [mem 0xdfdbf000-0xdfebefff] ACPI NVS
[0.00] BIOS-e820: [mem 0xdfebf000-0xdfef6fff] ACPI data
[0.00] BIOS-e820: [mem 0xdfef7000-0xdfef] usable
[0.00] BIOS-e820: [mem 0xdff0-0xdfff] reserved
[0.00] BIOS-e820: [mem 0xf700-0xf7ff] reserved
[0.00] BIOS-e820: [mem 0xfec0-0xfec00fff] reserved
[0.00] BIOS-e820: [mem 0xfec1-0xfec10fff] reserved
[0.00] BIOS-e820: [mem 0xfee0-0xfee00fff] reserved
[0.00] BIOS-e820: [mem 0xffe0-0x] reserved
[0.00] BIOS-e820: [mem 0x0001-0x00011fff] usable
[0.00] NX (Execute Disable) protection: active
[0.00] SMBIOS 2.6 present.
[0.00] DMI: Packard Bell EasyNote TK81/SJV52_DN, BIOS V2.14 07/27/2011
[0.00] e820: update [mem 0x-0x0fff] usable ==> reserved
[0.00] e820: remove [mem 0x000a-0x000f] usable
[0.00] No AGP bridge found
[0.00] e820: last_pfn = 0x12 max_arch_pfn = 0x4
[0.00] MTRR default type: uncachable
[0.00] MTRR fixed ranges enabled:
[0.00]   0-9 write-back
[0.00]   A-B uncachable
[0.00]   C-F write-through
[0.00] MTRR variable ranges enabled:
[0.00]   0 base  mask 8000 write-back
[0.00]   1 base 8000 mask C000 write-back
[0.00]   2 base C000 mask E000 write-back
[0.00]   3 base FFE0 mask FFE0 write-protect
[0.00]   4 base 0001 mask E000 write-back
[0.00]   5 disabled
[0.00]   6 disabled
[0.00]   7 disabled
[0.00] TOM2: 00012000 aka 4608M
[0.00] x86 PAT enabled: cpu 0, old 0x7040600070406, new 0x7010600070106
[0.00] e820: last_pfn = 0xdff00 max_arch_pfn = 0x4
[0.00] Scanning 1 areas for low memory corruption
[0.00] Base memory trampoline at [88098000] 98000 size 28672
[0.00] Using GB pages for direct mapping
[0.00] init_memory_mapping: [mem 0x-0x000f]
[0.00]  [mem 0x-0x000f] page 4k
[0.00] BRK [0x01c08000, 0x01c08fff] PGTABLE
[0.00] BRK [0x01c09000, 0x01c09fff] PGTABLE
[0.00] BRK [0x01c0a000, 0x01c0afff] PGTABLE
[0.00] init_memory_mapping: [mem 0x11fe0-0x11fff]
[0.00]  [mem 0x11fe0-0x11fff] page 2M
[0.00] BRK [0x01c0b000, 0x01c0bfff] PGTABLE
[0.00] init_memory_mapping: [mem 0x11c00-0x11fdf]
[0.00]  [mem 0x11c00-0x11fdf] page 2M
[0.00] init_memory_mapping: [mem 0x1-0x11bff]
[0.00]  [mem 0x1-0x11bff] page 2M
[0.00] init_memory_mapping: [mem 0x0010-0xde555fff]
[0.00]  [mem 0x0010-0x001f] page 4k
[0.00]  [mem 0x0020-0x3fff] page 2M
[0.00]  [mem 0x4000-0xbfff] page 1G
[0.00]  [mem 0xc000-0xde3f] page 2M
[0.00]  [mem 0xde40-0xde555fff] page 4k
[0.00] init_memory_mapping: [mem 0xde756000-0xdfd3efff]
[0.00]  [mem 0xde756000-0xde7f] page 4k
[0.00]  [mem 0xde80-0xdfbf] page 2M
[0.00]  [mem 0xdfc0-0xdfd3efff] page 4k
[0.00] BRK [0x01c0c000, 0x01c0cfff] PGTABLE
[0.00] init_memory_mapping: [

Re: [PATCH] mm,vmscan: only loop back if compaction would fail in all zones

2012-11-26 Thread Johannes Hirte
On Sun, 25 Nov 2012 23:10:41 -0500
Johannes Weiner wrote:

> On Sun, Nov 25, 2012 at 10:15:18PM -0500, Johannes Weiner wrote:
> > On Sun, Nov 25, 2012 at 07:16:45PM -0500, Rik van Riel wrote:
> > > On Sun, 25 Nov 2012 17:44:33 -0500
> > > Johannes Weiner  wrote:
> > > > On Sun, Nov 25, 2012 at 01:29:50PM -0500, Rik van Riel wrote:
> > > 
> > > > > Could you try this patch?
> > > > 
> > > > It's not quite enough because it's not reaching the conditions
> > > > you changed, see analysis in
> > > > https://lkml.org/lkml/2012/11/20/567
> > > 
> > > Johannes,
> > > 
> > > does the patch below fix your problem?
> > 
> > I can not reproduce the problem anymore with my smoke test.
> > 
> > > I suspect it would, because kswapd should only ever run into this
> > > particular problem when we have a tiny memory zone in a pgdat,
> > > and in that case we will also have a larger zone nearby, where
> > > compaction would just succeed.
> > 
> > What if there is a higher order GFP_DMA allocation when the other
> > zones in the system meet the high watermark for this order?
> > 
> > There is something else that worries me: if the preliminary zone
> > scan finds the high watermark of all zones alright, end_zone is at
> > its initialization value, 0.  The final compaction loop at `if
> > (order)' goes through all zones up to and including end_zone, which
> > was never really set to anything meaningful(?) and the only zone
> > considered is the DMA zone again.  Very unlikely, granted, but if
> > you'd ever hit that race and kswapd gets stuck, this will be fun to
> > debug...
> 
> I actually liked your first idea better: force reclaim until the
> compaction watermark is met.  The only problem was that still not
> every check in there agreed when the zone was considered balanced and
> so no actual reclaim happened.
> 
> So how about making everybody agree?  If the high watermark is met but
> not the compaction one, keep doing reclaim AND don't consider the zone
> balanced, AND don't make it contribute to balanced_pages etc.?  This
> makes sure reclaim really does not bail and that the node is never
> considered alright when it's actually not according to compaction.
> This patch fixes the problem too (at least for the smoke test so far)
> and IMO makes the code a bit more understandable.
> 
> We may be able to drop some of the relooping conditions.  We may also
> be able to reduce the pressure from the DMA zone by passing the right
> classzone_idx in there.  Needs more thought.
> 
> ---
> From: Johannes Weiner 
> Subject: [patch] mm: vmscan: fix endless loop in kswapd balancing
> 
> Kswapd does not in all places have the same criteria for when it
> considers a zone balanced.  This leads to zones being not reclaimed
> because they are considered just fine and the compaction checks to
> loop over the zonelist again because they are considered unbalanced,
> causing kswapd to run forever.
> 
> Add a function, zone_balanced(), that checks the watermark and if
> compaction has enough free memory to do its job.  Then use it
> uniformly for when kswapd needs to check if a zone is balanced.
> 
> Signed-off-by: Johannes Weiner 
> ---
>  mm/vmscan.c | 27 ++-
>  1 file changed, 18 insertions(+), 9 deletions(-)
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 48550c6..3b0aef4 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2397,6 +2397,19 @@ static void age_active_anon(struct zone *zone, struct scan_control *sc)
>  	} while (memcg);
>  }
>  
> +static bool zone_balanced(struct zone *zone, int order,
> +			  unsigned long balance_gap, int classzone_idx)
> +{
> +	if (!zone_watermark_ok_safe(zone, order, high_wmark_pages(zone) +
> +				    balance_gap, classzone_idx, 0))
> +		return false;
> +
> +	if (COMPACTION_BUILD && order && !compaction_suitable(zone, order))
> +		return false;
> +
> +	return true;
> +}
> +
>  /*
>   * pgdat_balanced is used when checking if a node is balanced for high-order
>   * allocations. Only zones that meet watermarks and are in a zone allowed
> @@ -2475,8 +2488,7 @@ static bool prepare_kswapd_sleep(pg_data_t *pgdat, int order, long remaining,
>  			continue;
>  		}
>  
> -		if (!zone_watermark_ok_safe(zone, order, high_wmark_pages(zone),
> -							i, 0))
> +		if (!zone_balanced(zone, order, 0, i))
>  			all_zones_ok = false;
>  		else
>  			balanced += zone->present_pages;
> @@ -2585,8 +2597,7 @@ static unsigned long balance_pgdat(pg_data_t *pgdat, int order,
>  				break;
>  			}
>  
> -			if (!zone_watermark_ok_safe(zone, order,
> -					high_wmark_pages(zone), 0, 0)) {
> +			if (!zone_balanced(zone, order, 0, 0)) {
>  				end_zone = i;
>  				break;
>  
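
The idea behind zone_balanced() in the quoted patch can be tried out in
userspace with a heavily simplified model. This is a sketch, not the
kernel code: struct model_zone and model_zone_balanced() are hypothetical,
the watermark test is reduced to a plain free-page comparison, and the
compaction_suitable flag merely stands in for the result of
compaction_suitable(zone, order).

```c
#include <stdbool.h>

/* Hypothetical, simplified zone; the real struct zone has many more fields. */
struct model_zone {
	unsigned long free_pages;
	unsigned long high_wmark;
	bool compaction_suitable; /* stands in for compaction_suitable(zone, order) */
};

/*
 * One shared definition of "balanced": the watermark must be met and,
 * for high-order requests, compaction must be able to do its job.
 */
static bool model_zone_balanced(const struct model_zone *z, int order,
				unsigned long balance_gap)
{
	/* Simplified stand-in for zone_watermark_ok_safe(). */
	if (z->free_pages < z->high_wmark + balance_gap)
		return false;

	/* Order-0 requests never need compaction. */
	if (order > 0 && !z->compaction_suitable)
		return false;

	return true;
}
```

With a zone that meets its high watermark but cannot be compacted, the
model reports "balanced" for order 0 and "not balanced" for any higher
order. That is the case in which, per the commit message, the different
checks in kswapd previously disagreed.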

Re: [PATCH] Revert "mm: vmscan: scale number of pages reclaimed by reclaim/compaction based on failures"

2012-11-14 Thread Johannes Hirte
On Fri, 9 Nov 2012 08:36:37 +
Mel Gorman wrote:

> On Tue, Nov 06, 2012 at 11:15:54AM +0100, Johannes Hirte wrote:
> > On Mon, 5 Nov 2012 14:24:49 +
> > Mel Gorman wrote:
> > 
> > > Jiri Slaby reported the following:
> > > 
> > >   (It's an effective revert of "mm: vmscan: scale number of
> > > pages reclaimed by reclaim/compaction based on failures".) Given
> > > kswapd had hours of runtime in ps/top output yesterday in the
> > > morning and after the revert it's now 2 minutes in sum for the
> > > last 24h, I would say, it's gone.
> > > 
> > > The intention of the patch in question was to compensate for the
> > > loss of lumpy reclaim. Part of the reason lumpy reclaim worked is
> > > because it aggressively reclaimed pages and this patch was meant
> > > to be a sane compromise.
> > > 
> > > When compaction fails, it gets deferred and both compaction and
> > > reclaim/compaction are deferred to avoid excessive reclaim. However,
> > > since commit c6543459 (mm: remove __GFP_NO_KSWAPD), kswapd is
> > > woken up each time and continues reclaiming which was not taken
> > > into account when the patch was developed.
> > > 
> > > Attempts to address the problem ended up just changing the shape
> > > of the problem instead of fixing it. The release window gets
> > > closer and while a THP allocation failing is not a major problem,
> > > kswapd chewing up a lot of CPU is. This patch reverts "mm:
> > > vmscan: scale number of pages reclaimed by reclaim/compaction
> > > based on failures" and will be revisited in the future.
> > > 
> > > Signed-off-by: Mel Gorman 
> > > ---
> > >  mm/vmscan.c |   25 -
> > >  1 file changed, 25 deletions(-)
> > > 
> > > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > > index 2624edc..e081ee8 100644
> > > --- a/mm/vmscan.c
> > > +++ b/mm/vmscan.c
> > > @@ -1760,28 +1760,6 @@ static bool in_reclaim_compaction(struct scan_control *sc)
> > >  	return false;
> > >  }
> > >  
> > > -#ifdef CONFIG_COMPACTION
> > > -/*
> > > - * If compaction is deferred for sc->order then scale the number of pages
> > > - * reclaimed based on the number of consecutive allocation failures
> > > - */
> > > -static unsigned long scale_for_compaction(unsigned long pages_for_compaction,
> > > -			struct lruvec *lruvec, struct scan_control *sc)
> > > -{
> > > -	struct zone *zone = lruvec_zone(lruvec);
> > > -
> > > -	if (zone->compact_order_failed <= sc->order)
> > > -		pages_for_compaction <<= zone->compact_defer_shift;
> > > -	return pages_for_compaction;
> > > -}
> > > -#else
> > > -static unsigned long scale_for_compaction(unsigned long pages_for_compaction,
> > > -			struct lruvec *lruvec, struct scan_control *sc)
> > > -{
> > > -	return pages_for_compaction;
> > > -}
> > > -#endif
> > > -
> > >  /*
> > >   * Reclaim/compaction is used for high-order allocation requests. It reclaims
> > >   * order-0 pages before compacting the zone. should_continue_reclaim() returns
> > > @@ -1829,9 +1807,6 @@ static inline bool should_continue_reclaim(struct lruvec *lruvec,
> > >  	 * inactive lists are large enough, continue reclaiming
> > >  	 */
> > >  	pages_for_compaction = (2UL << sc->order);
> > > -
> > > -	pages_for_compaction = scale_for_compaction(pages_for_compaction,
> > > -						    lruvec, sc);
> > >  	inactive_lru_pages = get_lru_size(lruvec, LRU_INACTIVE_FILE);
> > >  	if (nr_swap_pages > 0)
> > >  		inactive_lru_pages += get_lru_size(lruvec, LRU_INACTIVE_ANON);
> > > --
> > 
> > Even with this patch I see kswapd0 very often on top. Much more than
> > with kernel 3.6.
> 
> How severe is the CPU usage? The higher usage can be explained by "mm:
> remove __GFP_NO_KSWAPD" which allows kswapd to compact memory to
> reduce the amount of time processes spend in compaction but will
> result in the CPU cost being incurred by kswapd.
> 
> Is it really high like the bug was reporting with high usage over long
> periods of time or do you just see it using 2-6% of CPU for short
> periods?

It is really high. With compile jobs (make -j4 on a dual-core machine)
I've seen kswapd0 consuming at least 50% CPU most of the time.
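
To get a feeling for how much the reverted scaling could inflate the
reclaim target, the arithmetic of the removed scale_for_compaction() can
be replayed in userspace. This is only a sketch: the function name
model_pages_for_compaction() is hypothetical and the zone and
scan_control fields are flattened into plain parameters. The example
values used afterwards (order 9, defer shift 6, 4 KiB pages) are
assumptions for illustration, not measurements from this thread.

```c
/*
 * Mirrors the arithmetic of the removed scale_for_compaction(): start
 * from 2UL << order and, if compaction has failed at this order or
 * below, shift the target left by the current defer shift.
 */
static unsigned long model_pages_for_compaction(int order,
						int compact_order_failed,
						unsigned int compact_defer_shift)
{
	unsigned long pages = 2UL << order;

	if (compact_order_failed <= order)
		pages <<= compact_defer_shift;
	return pages;
}
```

For order 9 without any deferral the target is 1024 pages. With a defer
shift of 6 it becomes 65536 pages, which would be 256 MiB with 4 KiB
pages. Under those assumed values the target grows by a factor of 64.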


Re: [PATCH] Revert "mm: vmscan: scale number of pages reclaimed by reclaim/compaction based on failures"

2012-11-06 Thread Johannes Hirte
On Mon, 5 Nov 2012 14:24:49 +
Mel Gorman wrote:

> Jiri Slaby reported the following:
> 
>   (It's an effective revert of "mm: vmscan: scale number of
> pages reclaimed by reclaim/compaction based on failures".) Given
> kswapd had hours of runtime in ps/top output yesterday in the morning
>   and after the revert it's now 2 minutes in sum for the last
> 24h, I would say, it's gone.
> 
> The intention of the patch in question was to compensate for the loss
> of lumpy reclaim. Part of the reason lumpy reclaim worked is because
> it aggressively reclaimed pages and this patch was meant to be a sane
> compromise.
> 
> When compaction fails, it gets deferred, and both compaction and
> reclaim/compaction are deferred to avoid excessive reclaim. However,
> since commit c6543459 (mm: remove __GFP_NO_KSWAPD), kswapd is woken up
> each time and continues reclaiming, which was not taken into account
> when the patch was developed.
> 
> Attempts to address the problem ended up just changing the shape of
> the problem instead of fixing it. The release window gets closer and
> while a THP allocation failing is not a major problem, kswapd chewing
> up a lot of CPU is. This patch reverts "mm: vmscan: scale number of
> pages reclaimed by reclaim/compaction based on failures" and will be
> revisited in the future.
> 
> Signed-off-by: Mel Gorman 
> ---
>  mm/vmscan.c |   25 -
>  1 file changed, 25 deletions(-)
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 2624edc..e081ee8 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1760,28 +1760,6 @@ static bool in_reclaim_compaction(struct scan_control *sc)
>  	return false;
>  }
>  
> -#ifdef CONFIG_COMPACTION
> -/*
> - * If compaction is deferred for sc->order then scale the number of pages
> - * reclaimed based on the number of consecutive allocation failures
> - */
> -static unsigned long scale_for_compaction(unsigned long pages_for_compaction,
> -			struct lruvec *lruvec, struct scan_control *sc)
> -{
> -	struct zone *zone = lruvec_zone(lruvec);
> -
> -	if (zone->compact_order_failed <= sc->order)
> -		pages_for_compaction <<= zone->compact_defer_shift;
> -	return pages_for_compaction;
> -}
> -#else
> -static unsigned long scale_for_compaction(unsigned long pages_for_compaction,
> -			struct lruvec *lruvec, struct scan_control *sc)
> -{
> -	return pages_for_compaction;
> -}
> -#endif
> -
>  /*
>   * Reclaim/compaction is used for high-order allocation requests. It reclaims
>   * order-0 pages before compacting the zone. should_continue_reclaim() returns
> @@ -1829,9 +1807,6 @@ static inline bool should_continue_reclaim(struct lruvec *lruvec,
>  	 * inactive lists are large enough, continue reclaiming
>  	 */
>  	pages_for_compaction = (2UL << sc->order);
> -
> -	pages_for_compaction = scale_for_compaction(pages_for_compaction,
> -						lruvec, sc);
>  	inactive_lru_pages = get_lru_size(lruvec, LRU_INACTIVE_FILE);
>  	if (nr_swap_pages > 0)
>  		inactive_lru_pages += get_lru_size(lruvec, LRU_INACTIVE_ANON);
> --

Even with this patch I see kswapd0 very often on top. Much more than
with kernel 3.6.
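
To make the effect of the reverted helper concrete, here is a small
userspace sketch of the arithmetic in the quoted diff. It is not kernel
code: struct zone_sketch is a hypothetical stand-in that models only the
two fields scale_for_compaction() reads, and the order and shift values
used with it are assumptions for illustration.

```c
#include <assert.h>
#include <limits.h>

/*
 * Hypothetical stand-in for the kernel's struct zone, reduced to the two
 * fields the reverted scale_for_compaction() helper looks at.
 */
struct zone_sketch {
	int compact_order_failed;         /* order at which compaction failed */
	unsigned int compact_defer_shift; /* deferral shift from repeated failures */
};

/* Base target kept after the revert: 2UL << order, as in the diff. */
static unsigned long base_pages_for_compaction(int order)
{
	/* Keep the shift inside the width of unsigned long. */
	assert(order >= 0 && order < (int)(sizeof(unsigned long) * CHAR_BIT) - 1);
	return 2UL << order;
}

/*
 * Target before the revert: the base value is shifted left by
 * compact_defer_shift when compact_order_failed <= order, mirroring the
 * removed helper.
 */
static unsigned long scaled_pages_for_compaction(const struct zone_sketch *zone,
						 int order)
{
	unsigned long pages = base_pages_for_compaction(order);

	if (zone->compact_order_failed <= order)
		pages <<= zone->compact_defer_shift;
	return pages;
}
```

For an order-9 request (assumed here as the THP case with 4K pages) the
base target is 1024 pages; with an assumed compact_defer_shift of 6 the
scaled target becomes 65536 pages, which gives a sense of how much more
reclaim the removed scaling could ask for.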