Re: 3.16.49 Oops, does not boot on two socket server

2017-12-12 Thread Holger Kiehl
Hello,

just want to give a follow-up: I have tested this with 3.16.51 and the
problem still exists. It seems the 3.16.x tree is no longer usable
on two-socket servers :-(

Regards,
Holger

PS: here is the panic with 3.16.51:

smpboot: Total of 24 processors activated (95963.71 BogoMIPS)
[ cut here ]
WARNING: CPU: 0 PID: 1 at kernel/sched/core.c:5811 
init_overlap_sched_group+0x114/0x120()
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.16.51-1.el6.x86_64 #1
Hardware name: HP ProLiant DL380p Gen8, BIOS P70 08/02/2014
  880fe96c7da8 815432dc 
 16b3 880fe96c7de8 8104cc72 880fff803c00
 880fe8d05650 881fe96ba3a8 880fe96af540 
Call Trace:
 [] dump_stack+0x4e/0x6a
 [] warn_slowpath_common+0x82/0xb0
 [] warn_slowpath_null+0x15/0x20
 [] init_overlap_sched_group+0x114/0x120
 [] build_overlap_sched_groups+0x134/0x1e0
 [] build_sched_domains+0x159/0x330
 [] sched_init_smp+0x65/0xf8
 [] kernel_init_freeable+0xb2/0x12d
 [] ? rest_init+0x80/0x80
 [] kernel_init+0x9/0xf0
 [] ret_from_fork+0x58/0x90
 [] ? rest_init+0x80/0x80
---[ end trace 207206398bdf8ddb ]---
BUG: unable to handle kernel paging request at 01024a7f
IP: [] init_overlap_sched_group+0xae/0x120
PGD 0
Oops:  [#1] SMP
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Tainted: GW 3.16.51-1.el6.x86_64 #1
Hardware name: HP ProLiant DL380p Gen8, BIOS P70 08/02/2014
task: 880fe96d ti: 880fe96c4000 task.ti: 880fe96c4000
RIP: 0010:[]  [] 
init_overlap_sched_group+0xae/0x120
RSP: :880fe96c7e08  EFLAGS: 00010246
RAX: 0100 RBX: 880fe8d05650 RCX: 0020
RDX: 00014a80 RSI: 0020 RDI: 0020
RBP: 880fe96c7e28 R08: 880fe96af558 R09: 
R10: 0002 R11: 0001 R12: 881fe96ba3a8
R13: 880fe96af540 R14:  R15: 881fe96ba3a8
FS:  () GS:880fffc0() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 01024a7f CR3: 01714000 CR4: 000407f0
Stack:
    880fe8d05650
 880fe96c7ea8 81079b04 0011 880fe96af540
   cd68 
Call Trace:
 [] build_overlap_sched_groups+0x134/0x1e0
 [] build_sched_domains+0x159/0x330
 [] sched_init_smp+0x65/0xf8
 [] kernel_init_freeable+0xb2/0x12d
 [] ? rest_init+0x80/0x80
 [] kernel_init+0x9/0xf0
 [] ret_from_fork+0x58/0x90
 [] ? rest_init+0x80/0x80
Code: 60 83 00 85 c0 74 70 49 8d 75 18 48 c7 c2 38 f9 8a 81 bf ff ff ff ff e8 
31 f9 1f 00 49 8b 54 24 10 48 98 48 8b 04 c5 a0 fc 78 81 <48> 8b 14 10 b8 01 00 
00 00 49 89 55 10 f0 0f c1 02 85 c0 75 0f
RIP  [] init_overlap_sched_group+0xae/0x120
 RSP 
CR2: 01024a7f
---[ end trace 207206398bdf8ddc ]---
Kernel panic - not syncing: Attempted to kill init! exitcode=0x0009


On Wed, 18 Oct 2017, Holger Kiehl wrote:

> Hello,
> 
> just tried to boot 3.16.49 on a 2 socket server and it fails with the
> following error:
> 
>smpboot: Total of 24 processors activated (95818.36 BogoMIPS)
>[ cut here ]
>WARNING: CPU: 0 PID: 1 at kernel/sched/core.c:5811 
> init_overlap_sched_group+0x114/0x120()
>Modules linked in:
>CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.16.49-1.el6.x86_64 #1
>Hardware name: HP ProLiant DL380p Gen8, BIOS P70 08/02/2014
>  880bfd6d3da8 81542f1c 
> 16b3 880bfd6d3de8 8104cd72 880c0f803c00
> 880bfcc69650 8817fd695ca8 880bfd6e2300 
>Call Trace:
> [] dump_stack+0x4e/0x6a
> [] warn_slowpath_common+0x82/0xb0
> [] warn_slowpath_null+0x15/0x20
> [] init_overlap_sched_group+0x114/0x120
> [] build_overlap_sched_groups+0x134/0x1e0
> [] build_sched_domains+0x159/0x330
> [] sched_init_smp+0x65/0xf8
> [] kernel_init_freeable+0xb2/0x12d
> [] ? rest_init+0x80/0x80
> [] kernel_init+0x9/0xf0
> [] ret_from_fork+0x58/0x90
> [] ? rest_init+0x80/0x80
>---[ end trace a491a27c866dd06e ]---
>BUG: unable to handle kernel paging request at 010247bf
>IP: [] init_overlap_sched_group+0xae/0x120
>PGD 0
>Oops:  [#1] SMP
>Modules linked in:
>CPU: 0 PID: 1 Comm: swapper/0 Tainted: GW 3.16.49-1.el6.x86_64 
> #1
>Hardware name: HP ProLiant DL380p Gen8, BIOS P70 08/02/2014
>task: 8817fd6a8000 ti: 880bfd6d task.ti: 880bfd6d
>RIP: 0010:[]  [] 
> init_overlap_sched_group+0xae/0x120
>RSP: :880bfd6d3e08  EFLAGS: 00010246
>RAX: 0100 RBX: 880bfcc69650 RCX: 

3.16.49 Oops, does not boot on two socket server

2017-10-18 Thread Holger Kiehl
Hello,

just tried to boot 3.16.49 on a 2 socket server and it fails with the
following error:

   smpboot: Total of 24 processors activated (95818.36 BogoMIPS)
   [ cut here ]
   WARNING: CPU: 0 PID: 1 at kernel/sched/core.c:5811 
init_overlap_sched_group+0x114/0x120()
   Modules linked in:
   CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.16.49-1.el6.x86_64 #1
   Hardware name: HP ProLiant DL380p Gen8, BIOS P70 08/02/2014
 880bfd6d3da8 81542f1c 
16b3 880bfd6d3de8 8104cd72 880c0f803c00
880bfcc69650 8817fd695ca8 880bfd6e2300 
   Call Trace:
[] dump_stack+0x4e/0x6a
[] warn_slowpath_common+0x82/0xb0
[] warn_slowpath_null+0x15/0x20
[] init_overlap_sched_group+0x114/0x120
[] build_overlap_sched_groups+0x134/0x1e0
[] build_sched_domains+0x159/0x330
[] sched_init_smp+0x65/0xf8
[] kernel_init_freeable+0xb2/0x12d
[] ? rest_init+0x80/0x80
[] kernel_init+0x9/0xf0
[] ret_from_fork+0x58/0x90
[] ? rest_init+0x80/0x80
   ---[ end trace a491a27c866dd06e ]---
   BUG: unable to handle kernel paging request at 010247bf
   IP: [] init_overlap_sched_group+0xae/0x120
   PGD 0
   Oops:  [#1] SMP
   Modules linked in:
   CPU: 0 PID: 1 Comm: swapper/0 Tainted: GW 3.16.49-1.el6.x86_64 #1
   Hardware name: HP ProLiant DL380p Gen8, BIOS P70 08/02/2014
   task: 8817fd6a8000 ti: 880bfd6d task.ti: 880bfd6d
   RIP: 0010:[]  [] 
init_overlap_sched_group+0xae/0x120
   RSP: :880bfd6d3e08  EFLAGS: 00010246
   RAX: 0100 RBX: 880bfcc69650 RCX: 0020
   RDX: 000147c0 RSI: 0020 RDI: 0020
   RBP: 880bfd6d3e28 R08: 880bfd6e2318 R09: 
   R10: 0002 R11: 0001 R12: 8817fd695ca8
   R13: 880bfd6e2300 R14:  R15: 8817fd695ca8
   FS:  () GS:880c0fc0() knlGS:
   CS:  0010 DS:  ES:  CR0: 80050033
   CR2: 010247bf CR3: 001714000 CR4: 000407f0
   Stack:
   880bfcc69650
880bfd6d3ea8 81079974 0011 880bfd6e2300
  cac8 
   Call Trace:
[] build_overlap_sched_groups+0x134/0x1e0
[] build_sched_domains+0x159/0x330
[] sched_init_smp+0x65/0xf8
[] kernel_init_freeable+0xb2/0x12d
[] ? rest_init+0x80/0x80
[] kernel_init+0x9/0xf0
[] ret_from_fork+0x58/0x90
[] ? rest_init+0x80/0x80
   Code: 61 83 00 85 c0 74 70 49 8d 75 18 48 c7 c2 38 f9 8a 81 bf ff ff ff ff 
e8 51 fa 1f 00 49 8b 54 24 10 48 98 48 8b 04 c5 a0 fc 78 81 <48> 8b 14 10 b8 01 
00 00 00 49 89 55 10 f0 0f c1 02 85 c0 75 0f
   RIP  [] init_overlap_sched_group+0xae/0x120
RSP 
   CR2: 010247bf
   ---[ end trace a491a27c866dd06f ]---
   Kernel panic - not syncing: Attempted to kill init! exitcode=0x0009

   Rebooting in 5 seconds..

This happened on three different systems. On a similar system with just
one CPU in a socket it boots fine. The last kernel of this series I tried
was 3.16.48, and that worked fine.

Any idea what is wrong? In case it is useful I have attached my kernel
config.

Regards,
Holger

#
# Automatically generated file; DO NOT EDIT.
# Linux/x86 3.16.49 Kernel Configuration
#
CONFIG_64BIT=y
CONFIG_X86_64=y
CONFIG_X86=y
CONFIG_INSTRUCTION_DECODER=y
CONFIG_OUTPUT_FORMAT="elf64-x86-64"
CONFIG_ARCH_DEFCONFIG="arch/x86/configs/x86_64_defconfig"
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_HAVE_LATENCYTOP_SUPPORT=y
CONFIG_MMU=y
CONFIG_NEED_DMA_MAP_STATE=y
CONFIG_NEED_SG_DMA_LENGTH=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_BUG_RELATIVE_POINTERS=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_ARCH_HAS_CPU_RELAX=y
CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y
CONFIG_HAVE_SETUP_PER_CPU_AREA=y
CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK=y
CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK=y
CONFIG_ARCH_HIBERNATION_POSSIBLE=y
CONFIG_ARCH_SUSPEND_POSSIBLE=y
CONFIG_ARCH_WANT_HUGE_PMD_SHARE=y
CONFIG_ARCH_WANT_GENERAL_HUGETLB=y
CONFIG_ZONE_DMA32=y
CONFIG_AUDIT_ARCH=y
CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y
CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
CONFIG_HAVE_INTEL_TXT=y
CONFIG_X86_64_SMP=y
CONFIG_X86_HT=y
CONFIG_ARCH_HWEIGHT_CFLAGS="-fcall-saved-rdi -fcall-saved-rsi -fcall-saved-rdx -fcall-saved-rcx -fcall-saved-r8 -fcall-saved-r9 -fcall-saved-r10 -fcall-saved-r11"
CONFIG_ARCH_SUPPORTS_UPROBES=y
CONFIG_FIX_EARLYCON_MEM=y
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"
CONFIG_IRQ_WORK=y
CONFIG_BUILDTIME_EXTABLE_SORT=y

#
# General setup
#
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_CROSS_COMPILE=""
# CONFIG_COMPILE_TEST is not set
CONFIG_LOCALVERSION=""
# CONFIG_LOCALVERSION_AUTO is not set

Re: [PATCH] MD: make bio mergeable

2016-04-29 Thread Holger Kiehl
On Thu, 28 Apr 2016, Shaohua Li wrote:

> On Thu, Apr 28, 2016 at 08:00:22PM +0000, Holger Kiehl wrote:
> > Hello,
> > 
> > On Mon, 25 Apr 2016, Shaohua Li wrote:
> > 
> > > blk_queue_split marks bio unmergeable, which makes sense for normal bio.
> > > But if dispatching the bio to underlayer disk, the blk_queue_split
> > > checks are invalid, hence it's possible the bio becomes mergeable.
> > > 
> > > In the reported bug, this bug causes trim against raid0 performance slash
> > > https://bugzilla.kernel.org/show_bug.cgi?id=117051
> > > 
> > This patch makes a huge difference. On a system with two Samsung 850 Pro
> > in a MD Raid0 setup the time for fstrim went down from ~30min to 18sec!
> > 
> > However, on another system with two Intel P3700 1.6TB NVMe PCIe SSD's
> > also setup as one big MD Raid0, the patch does not make any difference
> > at all. fstrim takes more then 4 hours!
> 
> Does the raid0 cross two partitions or two SSD?
> 
Two SSDs. On the system where it works, the RAID0 over the two Samsung
850 Pro SATA SSDs was set up via partitions.

> can you post blktrace data in the bugzilloa, I'll track the bug there.
> 
I ran blktrace on the two MD RAID0 member devices /dev/nvme[01]n1 for 2
minutes and attached the traces to bug 117051 as a tar.bz2 file:

   https://bugzilla.kernel.org/show_bug.cgi?id=117051

Please just ask if I have forgotten anything. And many thanks for looking
at this and all the good work!

Regards,
Holger


Re: [PATCH] MD: make bio mergeable

2016-04-28 Thread Holger Kiehl
Hello,

On Mon, 25 Apr 2016, Shaohua Li wrote:

> blk_queue_split marks bio unmergeable, which makes sense for normal bio.
> But if dispatching the bio to underlayer disk, the blk_queue_split
> checks are invalid, hence it's possible the bio becomes mergeable.
> 
> In the reported bug, this bug causes trim against raid0 performance slash
> https://bugzilla.kernel.org/show_bug.cgi?id=117051
> 
This patch makes a huge difference. On a system with two Samsung 850 Pros
in an MD RAID0 setup, the time for fstrim went down from ~30 min to 18 sec!

However, on another system with two Intel P3700 1.6TB NVMe PCIe SSDs,
also set up as one big MD RAID0, the patch does not make any difference
at all. fstrim takes more than 4 hours!

Any idea what could be wrong?

Regards,
Holger


> Reported-by: Park Ju Hyung 
> Fixes: 6ac45aeb6bca(block: avoid to merge splitted bio)
> Cc: sta...@vger.kernel.org (v4.3+)
> Cc: Ming Lei 
> Cc: Jens Axboe 
> Cc: Neil Brown 
> Signed-off-by: Shaohua Li 
> ---
>  drivers/md/md.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 194580f..14d3b37 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -284,6 +284,8 @@ static blk_qc_t md_make_request(struct request_queue *q, struct bio *bio)
>       * go away inside make_request
>       */
>      sectors = bio_sectors(bio);
> +    /* bio could be mergeable after passing to underlayer */
> +    bio->bi_rw &= ~REQ_NOMERGE;
>      mddev->pers->make_request(mddev, bio);
> 
>      cpu = part_stat_lock();
> -- 
> 2.8.0.rc2
> 


Re: Filesystem corruption MD (imsm) Raid0 via 2 SSD's + discard

2015-05-22 Thread Holger Kiehl



On Thu, 21 May 2015, NeilBrown wrote:


On Thu, 21 May 2015 06:44:27 + (UTC) Holger Kiehl 
wrote:


On Thu, 21 May 2015, NeilBrown wrote:


On Thu, 21 May 2015 01:32:13 +0500 Roman Mamedov  wrote:


On Wed, 20 May 2015 20:12:31 + (UTC)
Holger Kiehl  wrote:


The kernel I was running when I discovered the
problem was 4.0.2 from kernel.org. However, after reinstalling from DVD
I updated to Fedora's lattest kernel, which was 3.19.? (I do not remember
the last numbers). So that kernel seems also effected, but I assume it
contains many 'fixes' from 4.0.x. As filesystem I use ext4, distribution
is Fedora 21 and hardware is: Xeon E3-1275, 16GB ECC Ram.

My system seems to be now running stable for some days with kernel.org
kernel 4.0.3 and with discard DISABLED. But I am still unsure what could
be the real cause.


It is a bug in the 4.0.2 kernel, fixed in 4.0.3.

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=785672
https://bbs.archlinux.org/viewtopic.php?id=197400
https://kernel.googlesource.com/pub/scm/linux/kernel/git/stable/linux-stable/+/d2dc317d564a46dfc683978a2e5a4f91434e9711




I suspect that is a different bug.
I think this one is
 https://bugzilla.kernel.org/show_bug.cgi?id=98501


Should there not be a big fat warning going around telling users to disable
discard on Raid 0 until this is fixed? This breaks the filesystem completely
and I believe there is absolutly no way one can get back the data.


Probably.  Would you like to do that?



Is this fixed in 4.0.4? And which kernels are effected? There could be many
people running systems that have not noticed this and don't know in what
dangerous situation they are when they delete data.


The patch was only added to my tree today.  I will send to Linus tomorrow so
it should appear in the next -rc.
Any -stable kernel released since mid-April probably has the bug.  It was
caused by
commit 47d68979cc968535cb87f3e5f2e6a3533ea48fbd

Once the fix gets into Linus' tree, it should get into subsequent -stable 
releases.

The fix is here:

http://git.neil.brown.name/?p=md.git;a=commitdiff;h=a81157768a00e8cf8a7b43b5ea5cac931262374f

commit id should remain unchanged.


I would like to confirm that with this patch and discard enabled, I no longer
see any corruption.

Many thanks for the quick fix!

Regards,
Holger


WARNING: Software Raid 0 on SSD's and discard corrupts data

2015-05-21 Thread Holger Kiehl

Hello,

all users running a software RAID0 on SSDs with discard should disable
discard if they use any recent kernel released since mid-April 2015. The bug
was introduced by commit 47d68979cc968535cb87f3e5f2e6a3533ea48fbd and
the fix is not yet in Linus' tree. The fix can be found here:

   
http://git.neil.brown.name/?p=md.git;a=commitdiff;h=a81157768a00e8cf8a7b43b5ea5cac931262374f

Users should immediately remove the discard option from any mounted
software RAID0 filesystems. Any deletion or modification of files can
lead to random destruction on the filesystem. Use the remount option
of the mount command to remove the discard option; do not do it by
editing /etc/fstab if your root filesystem is on a software RAID0.
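
As an illustration (not from the original mail): on a running system the
discard option can be dropped without unmounting, either with
"mount -o remount,nodiscard /" from a shell or programmatically via mount(2).
A minimal sketch in C, assuming an ext4 filesystem mounted at /:

   /* Hypothetical example: the programmatic equivalent of
    * "mount -o remount,nodiscard /".  MS_REMOUNT changes the options of an
    * existing mount; the data string is handed to the filesystem, here
    * asking ext4 to stop issuing TRIM.  A real tool would also re-apply
    * the current mount flags (ro, noatime, ...) instead of passing only
    * MS_REMOUNT, and of course needs root privileges. */
   #include <stdio.h>
   #include <sys/mount.h>

   int main(void)
   {
       if (mount("none", "/", NULL, MS_REMOUNT, "nodiscard") != 0) {
           perror("remount /");
           return 1;
       }
       puts("discard disabled on /");
       return 0;
   }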

Regards,
Holger


Re: Filesystem corruption MD (imsm) Raid0 via 2 SSD's + discard

2015-05-21 Thread Holger Kiehl

On Thu, 21 May 2015, NeilBrown wrote:


On Thu, 21 May 2015 01:32:13 +0500 Roman Mamedov  wrote:


On Wed, 20 May 2015 20:12:31 + (UTC)
Holger Kiehl  wrote:


The kernel I was running when I discovered the
problem was 4.0.2 from kernel.org. However, after reinstalling from DVD
I updated to Fedora's lattest kernel, which was 3.19.? (I do not remember
the last numbers). So that kernel seems also effected, but I assume it
contains many 'fixes' from 4.0.x. As filesystem I use ext4, distribution
is Fedora 21 and hardware is: Xeon E3-1275, 16GB ECC Ram.

My system seems to be now running stable for some days with kernel.org
kernel 4.0.3 and with discard DISABLED. But I am still unsure what could
be the real cause.


It is a bug in the 4.0.2 kernel, fixed in 4.0.3.

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=785672
https://bbs.archlinux.org/viewtopic.php?id=197400
https://kernel.googlesource.com/pub/scm/linux/kernel/git/stable/linux-stable/+/d2dc317d564a46dfc683978a2e5a4f91434e9711




I suspect that is a different bug.
I think this one is
 https://bugzilla.kernel.org/show_bug.cgi?id=98501


Should there not be a big fat warning going around telling users to disable
discard on RAID0 until this is fixed? This breaks the filesystem completely,
and I believe there is absolutely no way one can get the data back.

Is this fixed in 4.0.4? And which kernels are affected? There could be many
people running systems who have not noticed this and don't know what a
dangerous situation they are in when they delete data.

Regards,
Holger


Filesystem corruption MD (imsm) Raid0 via 2 SSD's + discard

2015-05-20 Thread Holger Kiehl

Hello,

I had a terrible weekend recovering my home system. Whenever files
were deleted, some data got corrupted. At first I did not notice it,
but when I rebooted the system would not come up again; systemd crashed
with SIGSEGV and that was it. Booting from a USB stick I saw that
some glibc lib had a different size from the one in the original RPM. So
all I did was reinstall that lib from the USB stick, and everything was fine
after rebooting from the RAID0. But I then wanted to make sure that
no other files were corrupted, so I checked and found more. So again I
reinstalled those RPMs and rebooted. To my big surprise the system was
again broken and failed to boot. I again tried to recover my system
from the USB stick, but this time did not manage to recover the system. So
I decided to reinstall the system completely from DVD. Everything looked good
until the moment I activated the discard option in /etc/fstab.
After doing some more work (adding and removing things) I rebooted and
again the system failed to boot. Booting from the USB stick I saw that
the /etc/fstab was filled entirely with NULLs. This gave me the clue that
there must be some problem with discard (trim). My system uses
a software RAID0, IMSM (Intel 'fake' RAID), on two Samsung SSD 840 Pros.

A Windows system on the same disks (that is why I am using IMSM RAID)
was not affected by this problem. I have checked the RAM with memtest86
and everything is ok. The kernel I was running when I discovered the
problem was 4.0.2 from kernel.org. However, after reinstalling from DVD
I updated to Fedora's latest kernel, which was 3.19.? (I do not remember
the last numbers). So that kernel also seems affected, but I assume it
contains many 'fixes' from 4.0.x. As filesystem I use ext4, the distribution
is Fedora 21, and the hardware is: Xeon E3-1275, 16GB ECC RAM.

My system now seems to have been running stable for some days with kernel.org
kernel 4.0.3 and with discard DISABLED. But I am still unsure what
the real cause could be.

Regards,
Holger


qlcnic very high TX values, as of 3.13.x

2014-07-11 Thread Holger Kiehl

Hello,

upgrading from 3.10.x to the next stable series, 3.14.x, I noticed that
ifconfig reports very high TX values. Taking the qlcnic source from
3.15.5 and compiling it under 3.14.12, the problem remains. Going
backwards, always just copying the qlcnic source from the older kernels
into the 3.14.12 tree, I noticed that the 3.12.x kernel was the last
version that does not generate those high TX values. So the problem
started with the qlcnic driver in 3.13.x. However, comparing 3.13.x
and 3.14.x, the numbers climb much more quickly in 3.14.x. In 3.14.x
I get TX values in terabytes very quickly after boot. I once even got
petabyte values!

Hardware is the following:

HP ProLiant DL380 G7
2 x Intel Xeon X5690 (24 cores with hyperthreading)
106 GByte Ram
1 x NC523SFP 10Gb 2-port Server Adapter Board Chip rev 0x54 (qlcnic)
1 x Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (ixgbe)

The qlcnic and ixgbe cards are bonded together in fault-tolerance
(active-backup) mode. And even when I switch to the Intel card after I
get crazy TX values on the qlcnic card, the TX values on the qlcnic card
still go up at a very quick rate. This only stops when I reset the card
(reload the module). Also, there is no difference whether I compile the driver
in or use it as a module. There are no strange messages in
/var/log/messages or dmesg. Here is the output with the 3.13.x driver in
3.14.12 when the system boots:

   [   18.229195] QLogic 1/10 GbE Converged/Intelligent Ethernet Driver v5.3.52
   [   18.229415] qlcnic :1a:00.0: 2048KB memory map
   [   18.854134] qlcnic :1a:00.0: Default minidump capture mask 0x1f
   [   19.602491] qlcnic :1a:00.0: FW dump enabled
   [   19.631257] qlcnic :1a:00.0: Supports FW dump capability
   [   19.667072] qlcnic :1a:00.0: Driver v5.3.52, firmware v4.14.26
   [   19.704279] qlcnic :1a:00.0: Set 4 Tx rings
   [   19.733001] qlcnic :1a:00.0: Set 4 SDS rings
   [   19.898808] qlcnic: 2c:27:d7:50:04:48: NC523SFP 10Gb 2-port Server 
Adapter Board Chip rev 0x54
   [   19.949325] qlcnic :1a:00.0: irq 129 for MSI/MSI-X
   [   19.949329] qlcnic :1a:00.0: irq 130 for MSI/MSI-X
   [   19.949333] qlcnic :1a:00.0: irq 131 for MSI/MSI-X
   [   19.949336] qlcnic :1a:00.0: irq 132 for MSI/MSI-X
   [   19.949340] qlcnic :1a:00.0: irq 133 for MSI/MSI-X
   [   19.949343] qlcnic :1a:00.0: irq 134 for MSI/MSI-X
   [   19.949347] qlcnic :1a:00.0: irq 135 for MSI/MSI-X
   [   19.949350] qlcnic :1a:00.0: irq 136 for MSI/MSI-X
   [   19.949369] qlcnic :1a:00.0: using msi-x interrupts
   [   19.982782] qlcnic :1a:00.0: Set 4 Tx queues
   [   20.055099] qlcnic :1a:00.0: eth2: XGbE port initialized
   [   20.090408] qlcnic :1a:00.1: 2048KB memory map
   [   20.179836] qlcnic :1a:00.1: Default minidump capture mask 0x1f
   [   20.217848] qlcnic :1a:00.1: FW dump enabled
   [   20.246979] qlcnic :1a:00.1: Supports FW dump capability
   [   20.282318] qlcnic :1a:00.1: Driver v5.3.52, firmware v4.14.26
   [   20.320238] qlcnic :1a:00.1: Set 4 Tx rings
   [   20.350038] qlcnic :1a:00.1: Set 4 SDS rings
   [   20.429714] qlcnic :1a:00.1: irq 137 for MSI/MSI-X
   [   20.429718] qlcnic :1a:00.1: irq 138 for MSI/MSI-X
   [   20.429722] qlcnic :1a:00.1: irq 139 for MSI/MSI-X
   [   20.429726] qlcnic :1a:00.1: irq 140 for MSI/MSI-X
   [   20.429729] qlcnic :1a:00.1: irq 141 for MSI/MSI-X
   [   20.429732] qlcnic :1a:00.1: irq 142 for MSI/MSI-X
   [   20.429736] qlcnic :1a:00.1: irq 143 for MSI/MSI-X
   [   20.429739] qlcnic :1a:00.1: irq 144 for MSI/MSI-X
   [   20.429757] qlcnic :1a:00.1: using msi-x interrupts
   [   20.458895] qlcnic :1a:00.1: Set 4 Tx queues
   [   20.486907] qlcnic :1a:00.1: eth3: XGbE port initialized

My kernel config can be downloaded here:

   ftp://ftp.dwd.de/pub/afd/test/.config

Please, just ask if I need to provide more details and please CC me,
since I am not on the list.

Thanks,
Holger


Kernel panic with 3.10.33 and possible hpwdt watchdog

2014-03-18 Thread Holger Kiehl

Hello,

I use a plain kernel.org kernel 3.10.33, and when I do an HP iLO (proprietary
embedded server management technology) reset of my ProLiant 380p server,
the system hangs. Unfortunately I cannot do a serial trace, so I copied
by hand everything I could read from the console:

   EOI  NMI  [812898c1] ? vga_set_palette+0xd1/0x130
   [8155e4b0] ? panic+0x18c/0x1c7
   [8155e418] ? panic+0xf4/0x1c7
   [a002c885] ? hpwdt_pretimeout+0xc5/0xd0 [hpwdt]
   [81006389] ? nmi_handle+0x59/0x80
   [8100650f] ? default_do_nmi+0x12f/0x2a0
   [81006708] ? do_nmi+0x88/0xd0
   [81561ff7] ? end_repeat_nmi+0x1e/0x2e
   [81298e16] ? intel_idle+0xb6/0x120
   [81298e16] ? intel_idle+0xb6/0x120
   [81298e16] ? intel_idle+0xb6/0x120
   EOE  [8146213d] ? cpuidle_enter_state+0x3d/0xd0
   [814624fa] ? cpuidle_idle_call+0xba/0x140
   [81085a8d] ? __tick_nohz_idle_enter+0x8d/0x120
   [8100b669] ? arch_cpu_idle+0x9/0x30
   [8107c3e2] ? cpu_idle_loop+0x92/0x160
   [8107c51b] ? cpu_startup_entry+0x6b/0x70
   [817bafe3] ? start_kernel+0x3e2/0x3ed
   [817baa33] ? repair_env_string+0x5e/0x5e
   [817ba6bf] ? x86_64_start_kernel+0x12a/0x130
   ---[ end trace 2a7f5aee76758ec0 ]---
   dmar: DRHD: handling fault status reg 2
   dmar: DMAR:[DMA Read] Request device [01:00.2] fault addr e9000
   DMAR:[fault reason 06] PTE Read access is not set

If I remove the hpwdt driver and then reset the HP iLO system, the
system also hangs, but continuously writes the following to the console
at an interval of approx. 2 seconds:

   NMI: IOCK error (debug interrupt?) for reason 61 on CPU 0.
   NMI: IOCK error (debug interrupt?) for reason 61 on CPU 0.
   NMI: IOCK error (debug interrupt?) for reason 61 on CPU 0.
   NMI: IOCK error (debug interrupt?) for reason 61 on CPU 0.
   NMI: IOCK error (debug interrupt?) for reason 71 on CPU 0.
   NMI: IOCK error (debug interrupt?) for reason 71 on CPU 0.
   NMI: IOCK error (debug interrupt?) for reason 61 on CPU 0.
   NMI: IOCK error (debug interrupt?) for reason 61 on CPU 0.
   NMI: IOCK error (debug interrupt?) for reason 71 on CPU 0.
   NMI: IOCK error (debug interrupt?) for reason 61 on CPU 0.
   NMI: IOCK error (debug interrupt?) for reason 61 on CPU 0.
   NMI: IOCK error (debug interrupt?) for reason 71 on CPU 0.
   NMI: IOCK error (debug interrupt?) for reason 71 on CPU 0.
   NMI: IOCK error (debug interrupt?) for reason 71 on CPU 0.
   NMI: IOCK error (debug interrupt?) for reason 61 on CPU 0.
   NMI: IOCK error (debug interrupt?) for reason 71 on CPU 0.
   NMI: IOCK error (debug interrupt?) for reason 61 on CPU 0.
   NMI: IOCK error (debug interrupt?) for reason 61 on CPU 0.

Also, setting nmi_watchdog=0 does not change anything.

This does not happen when I take the default kernel of the
distribution (Scientific Linux 6.5), 2.6.32-431.5.1.el6.x86_64.

The bad thing is that when the hpwdt driver is loaded, the watchdog does
not reset the system, i.e. it hangs forever. And I cannot use the Intel TCO
Watchdog Timer Driver since it is disabled in the BIOS.

Please, can someone give me a hint where the error could be and what I
can do so I can continue to use the kernel.org kernel.

Many thanks in advance,
Holger

PS: Please CC me since I am not subscribed



Re: Need help in bug in isolate_migratepages_range

2014-02-03 Thread Holger Kiehl

On Mon, 3 Feb 2014, David Rientjes wrote:


On Mon, 3 Feb 2014, Vlastimil Babka wrote:


It seems to come from balloon_page_movable() and its test
page_count(page) == 1.



Hmm, I think it might be because compound_head() == NULL here.  Holger,
this looks like a race condition when allocating a compound page, did you
only see it once or is it actually reproducible?


No, this only happened once. It is not reproducible; the system was running
for four days without problems. And before this kernel, five years without
any problems.

Thanks,
Holger


Re: Need help in bug in isolate_migratepages_range

2014-02-03 Thread Holger Kiehl

On Mon, 3 Feb 2014, Michal Hocko wrote:


On Mon 03-02-14 14:29:22, Holger Kiehl wrote:

I have attached it. Please, tell me if you do not get the attachment.


I hoped it would help me to get a closer compiled code to yours but I am
probably using too different gcc.


I have an old gcc, it is 4.4.1-2.


Anyway I've tried to check whether I can hook on something and it seems
that this is a race with thp merge/split or something like that.

[...]

  Jan 31 13:07:43 asterix kernel: BUG: unable to handle kernel NULL pointer 
dereference at 001c
  Jan 31 13:07:43 asterix kernel: IP: [] 
isolate_migratepages_range+0x32d/0x653
  Jan 31 13:07:43 asterix kernel: PGD 7d3074067 PUD 7d3073067 PMD 0
  Jan 31 13:07:43 asterix kernel: Oops:  [#1] SMP
  Jan 31 13:07:43 asterix kernel: Modules linked in: drbd lru_cache coretemp 
ipmi_devintf bonding nf_conntrack_ftp binfmt_misc usbhid i2c_i801 sg ehci_pci 
i2c_core ehci_hcd uhci_hcd i5000_edac i5k_amb ipmi_si ipmi_msghandler usbcore 
usb_common [last unloaded: microcode]
  Jan 31 13:07:43 asterix kernel: CPU: 5 PID: 14164 Comm: java Not tainted 
3.12.9 #1
  Jan 31 13:07:43 asterix kernel: Hardware name: FUJITSU SIEMENS PRIMERGY RX300 
S4 /D2519, BIOS 4.06  Rev. 1.04.2519 07/30/2008
  Jan 31 13:07:43 asterix kernel: task: 8807d30b08c0 ti: 8807d30b2000 
task.ti: 8807d30b2000
  Jan 31 13:07:43 asterix kernel: RIP: 0010:[]  
[] isolate_migratepages_range+0x32d/0x653
  Jan 31 13:07:43 asterix kernel: RSP: :8807d30b3928  EFLAGS: 00010286
  Jan 31 13:07:43 asterix kernel: RAX:  RBX: 0020ec09 
RCX: 0002
  Jan 31 13:07:43 asterix kernel: RDX: 2c008000 RSI: 0004 
RDI: 006c
  Jan 31 13:07:43 asterix kernel: RBP: 8807d30b39f8 R08: 88083fbde390 
R09: 0001
  Jan 31 13:07:43 asterix kernel: R10:  R11: ea000733a000 
R12: 8807d30b3a58
  Jan 31 13:07:43 asterix kernel: R13: ea000733a1f8 R14:  
R15: 88083ffe1d80
  Jan 31 13:07:43 asterix kernel: FS:  7f9d9e72f910() 
GS:88083fd4() knlGS:
  Jan 31 13:07:43 asterix kernel: CS:  0010 DS:  ES:  CR0: 
8005003b
  Jan 31 13:07:43 asterix kernel: CR2: 001c CR3: 0007d307 
CR4: 000407e0
  Jan 31 13:07:43 asterix kernel: Stack:
  Jan 31 13:07:43 asterix kernel: 0009 88083ffe16c0 
ea2e6af0 8807d30b3998
  Jan 31 13:07:43 asterix kernel: 8807d30b2010 00ff8807d30b08c0 
8807d30b08c0 0020f000
  Jan 31 13:07:43 asterix kernel:  083b 
000a 8807d30b3a68
  Jan 31 13:07:43 asterix kernel: Call Trace:
  Jan 31 13:07:43 asterix kernel: [] ? 
lru_add_drain_cpu+0x25/0x97
  Jan 31 13:07:43 asterix kernel: [] compact_zone+0x2b5/0x319
  Jan 31 13:07:43 asterix kernel: [] ? put_super+0x20/0x2c
  Jan 31 13:07:43 asterix kernel: [] 
compact_zone_order+0xad/0xc4
  Jan 31 13:07:43 asterix kernel: [] 
try_to_compact_pages+0x91/0xe8
  Jan 31 13:07:43 asterix kernel: [] ? 
page_alloc_cpu_notify+0x3e/0x3e
  Jan 31 13:07:43 asterix kernel: [] 
__alloc_pages_direct_compact+0xae/0x195
  Jan 31 13:07:43 asterix kernel: [] 
__alloc_pages_nodemask+0x772/0x7b5
  Jan 31 13:07:43 asterix kernel: [] 
alloc_pages_vma+0xd6/0x101
  Jan 31 13:07:43 asterix kernel: [] 
do_huge_pmd_anonymous_page+0x199/0x2ee
  Jan 31 13:07:43 asterix kernel: [] 
handle_mm_fault+0x1b7/0xceb
  Jan 31 13:07:43 asterix kernel: [] ? 
__dequeue_entity+0x2e/0x33
  Jan 31 13:07:43 asterix kernel: [] 
__do_page_fault+0x3bd/0x3e4
  Jan 31 13:07:43 asterix kernel: [] ? 
mprotect_fixup+0x1c9/0x1fb
  Jan 31 13:07:43 asterix kernel: [] ? vm_mmap_pgoff+0x6d/0x8f
  Jan 31 13:07:43 asterix kernel: [] ? SyS_futex+0x103/0x13d
  Jan 31 13:07:43 asterix kernel: [] do_page_fault+0x9/0xb
  Jan 31 13:07:43 asterix kernel: [] page_fault+0x22/0x30
  Jan 31 13:07:43 asterix kernel: Code: 00 41 f7 45 00 ff ff ff 01 0f 85 43 02 00 00 
41 8b 45 18 85 c0 0f 89 37 02 00 00 49 8b 55 00 4c 89 e8 66 85 d2 79 04 49 8b 45 30 
<8b> 40 1c 83 f8 01 0f 85 1b 02 00 00 49 8b 55 08 30 c0 48 85 d2
  Jan 31 13:07:43 asterix kernel: RIP  [] 
isolate_migratepages_range+0x32d/0x653
  Jan 31 13:07:43 asterix kernel: RSP 
  Jan 31 13:07:43 asterix kernel: CR2: 001c
  Jan 31 13:07:43 asterix kernel: ---[ end trace fba75c5b0b9175ea ]---


This seems to match:
  17027:   49 8b 17                mov    (%r15),%rdx        # page->flags
  1702a:   4c 89 f8                mov    %r15,%rax
  1702d:   80 e6 80                and    $0x80,%dh          # PageTail test
  17030:   74 04                   je     17036
  17032:   49 8b 47 30             mov    0x30(%r15),%rax    # page = page->first_page
  17036:   8b 40 1c                mov    0x1c(%rax),%eax    <<< page->_count
  17039:   ff c8                   dec    %eax

Which seems to be inlined comp
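(The faulting load at offset 0x1c is consistent with an inlined page_count()
after following page->first_page; roughly, reconstructed from memory of the
3.12-era helpers, so treat this as a sketch rather than the exact source:)

    static inline struct page *compound_head(struct page *page)
    {
            if (unlikely(PageTail(page)))      /* the "and $0x80,%dh" flags test */
                    return page->first_page;   /* the "mov 0x30(%r15),%rax" load */
            return page;
    }

    static inline int page_count(struct page *page)
    {
            /* ->_count sits at offset 0x1c; a bogus first_page taken from a page
             * caught mid compound-page setup faults exactly like the oops above */
            return atomic_read(&compound_head(page)->_count);
    }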


Need help in bug in isolate_migratepages_range

2014-01-31 Thread Holger Kiehl

Hello,

today one of our systems got a kernel bug message. It kept on running,
but more and more processes began to get stuck in D state (e.g. a simple w
command would never return) and I eventually had to reboot. Here is the
full message:

   Jan 31 13:07:43 asterix kernel: BUG: unable to handle kernel NULL pointer 
dereference at 001c
   Jan 31 13:07:43 asterix kernel: IP: [] 
isolate_migratepages_range+0x32d/0x653
   Jan 31 13:07:43 asterix kernel: PGD 7d3074067 PUD 7d3073067 PMD 0
   Jan 31 13:07:43 asterix kernel: Oops:  [#1] SMP
   Jan 31 13:07:43 asterix kernel: Modules linked in: drbd lru_cache coretemp 
ipmi_devintf bonding nf_conntrack_ftp binfmt_misc usbhid i2c_i801 sg ehci_pci 
i2c_core ehci_hcd uhci_hcd i5000_edac i5k_amb ipmi_si ipmi_msghandler usbcore 
usb_common [last unloaded: microcode]
   Jan 31 13:07:43 asterix kernel: CPU: 5 PID: 14164 Comm: java Not tainted 
3.12.9 #1
   Jan 31 13:07:43 asterix kernel: Hardware name: FUJITSU SIEMENS PRIMERGY 
RX300 S4 /D2519, BIOS 4.06  Rev. 1.04.2519 07/30/2008
   Jan 31 13:07:43 asterix kernel: task: 8807d30b08c0 ti: 8807d30b2000 
task.ti: 8807d30b2000
   Jan 31 13:07:43 asterix kernel: RIP: 0010:[]  
[] isolate_migratepages_range+0x32d/0x653
   Jan 31 13:07:43 asterix kernel: RSP: :8807d30b3928  EFLAGS: 00010286
   Jan 31 13:07:43 asterix kernel: RAX:  RBX: 0020ec09 
RCX: 0002
   Jan 31 13:07:43 asterix kernel: RDX: 2c008000 RSI: 0004 
RDI: 006c
   Jan 31 13:07:43 asterix kernel: RBP: 8807d30b39f8 R08: 88083fbde390 
R09: 0001
   Jan 31 13:07:43 asterix kernel: R10:  R11: ea000733a000 
R12: 8807d30b3a58
   Jan 31 13:07:43 asterix kernel: R13: ea000733a1f8 R14:  
R15: 88083ffe1d80
   Jan 31 13:07:43 asterix kernel: FS:  7f9d9e72f910() 
GS:88083fd4() knlGS:
   Jan 31 13:07:43 asterix kernel: CS:  0010 DS:  ES:  CR0: 
8005003b
   Jan 31 13:07:43 asterix kernel: CR2: 001c CR3: 0007d307 
CR4: 000407e0
   Jan 31 13:07:43 asterix kernel: Stack:
   Jan 31 13:07:43 asterix kernel: 0009 88083ffe16c0 
ea2e6af0 8807d30b3998
   Jan 31 13:07:43 asterix kernel: 8807d30b2010 00ff8807d30b08c0 
8807d30b08c0 0020f000
   Jan 31 13:07:43 asterix kernel:  083b 
000a 8807d30b3a68
   Jan 31 13:07:43 asterix kernel: Call Trace:
   Jan 31 13:07:43 asterix kernel: [] ? 
lru_add_drain_cpu+0x25/0x97
   Jan 31 13:07:43 asterix kernel: [] compact_zone+0x2b5/0x319
   Jan 31 13:07:43 asterix kernel: [] ? put_super+0x20/0x2c
   Jan 31 13:07:43 asterix kernel: [] 
compact_zone_order+0xad/0xc4
   Jan 31 13:07:43 asterix kernel: [] 
try_to_compact_pages+0x91/0xe8
   Jan 31 13:07:43 asterix kernel: [] ? 
page_alloc_cpu_notify+0x3e/0x3e
   Jan 31 13:07:43 asterix kernel: [] 
__alloc_pages_direct_compact+0xae/0x195
   Jan 31 13:07:43 asterix kernel: [] 
__alloc_pages_nodemask+0x772/0x7b5
   Jan 31 13:07:43 asterix kernel: [] 
alloc_pages_vma+0xd6/0x101
   Jan 31 13:07:43 asterix kernel: [] 
do_huge_pmd_anonymous_page+0x199/0x2ee
   Jan 31 13:07:43 asterix kernel: [] 
handle_mm_fault+0x1b7/0xceb
   Jan 31 13:07:43 asterix kernel: [] ? 
__dequeue_entity+0x2e/0x33
   Jan 31 13:07:43 asterix kernel: [] 
__do_page_fault+0x3bd/0x3e4
   Jan 31 13:07:43 asterix kernel: [] ? 
mprotect_fixup+0x1c9/0x1fb
   Jan 31 13:07:43 asterix kernel: [] ? 
vm_mmap_pgoff+0x6d/0x8f
   Jan 31 13:07:43 asterix kernel: [] ? SyS_futex+0x103/0x13d
   Jan 31 13:07:43 asterix kernel: [] do_page_fault+0x9/0xb
   Jan 31 13:07:43 asterix kernel: [] page_fault+0x22/0x30
   Jan 31 13:07:43 asterix kernel: Code: 00 41 f7 45 00 ff ff ff 01 0f 85 43 02 00 00 
41 8b 45 18 85 c0 0f 89 37 02 00 00 49 8b 55 00 4c 89 e8 66 85 d2 79 04 49 8b 45 30 
<8b> 40 1c 83 f8 01 0f 85 1b 02 00 00 49 8b 55 08 30 c0 48 85 d2
   Jan 31 13:07:43 asterix kernel: RIP  [] 
isolate_migratepages_range+0x32d/0x653
   Jan 31 13:07:43 asterix kernel: RSP 
   Jan 31 13:07:43 asterix kernel: CR2: 001c
   Jan 31 13:07:43 asterix kernel: ---[ end trace fba75c5b0b9175ea ]---

The kernel is a plain kernel.org kernel 3.12.9 and the system uses drbd to
replicate data to another host. Any idea what the cause of this bug is? Could
it be hardware? The system has been running for five years now without any
problems.

Please CC me since I am not on the list.

Many thanks in advance.

Regards,
Holger


RE: Problems with ixgbe driver

2013-06-17 Thread Holger Kiehl

Hello,

first, thank you for the quick help!

On Fri, 14 Jun 2013, Tantilov, Emil S wrote:


-Original Message-
From: netdev-ow...@vger.kernel.org [mailto:netdev-ow...@vger.kernel.org] On
Behalf Of Holger Kiehl
Sent: Friday, June 14, 2013 4:50 AM
To: e1000-de...@lists.sf.net
Cc: linux-kernel; net...@vger.kernel.org
Subject: Problems with ixgbe driver

Hello,

I have a dual port 10Gb Intel network card on a 2 socket (Xeon X5690) system
with a total of 12 cores. Hyperthreading is enabled, so there are 24 logical
CPUs. The problem I have is that when other systems send large amounts of
data, the network with the Intel ixgbe driver gets very slow. Ping times go
up from 0.2ms to approx. 60ms. Some FTP connections stall for more than 2
minutes. What is strange is that heartbeat is configured on the system
with a serial connection to another node, and the kernel always reports


If the network slows down so much there should be some indication in dmesg. 
Like Tx hangs perhaps.
Can you provide the output of dmesg and ethtool -S from the offending interface 
after the issue occurs?


No, there is absolutely no indication in dmesg or /var/log/messages. But here
is the ethtool output when the ping times go up:

   root@helena:~# ethtool -S eth6
   NIC statistics:
rx_packets: 4410779
tx_packets: 8902514
rx_bytes: 2014041824
tx_bytes: 13199913202
rx_errors: 0
tx_errors: 0
rx_dropped: 0
tx_dropped: 0
multicast: 4245
collisions: 0
rx_over_errors: 0
rx_crc_errors: 0
rx_frame_errors: 0
rx_fifo_errors: 0
rx_missed_errors: 28143
tx_aborted_errors: 0
tx_carrier_errors: 0
tx_fifo_errors: 0
tx_heartbeat_errors: 0
rx_pkts_nic: 2401276937
tx_pkts_nic: 3868619482
rx_bytes_nic: 868282794731
tx_bytes_nic: 5743382228649
lsc_int: 4
tx_busy: 0
non_eop_descs: 743957
broadcast: 1745556
rx_no_buffer_count: 0
tx_timeout_count: 0
tx_restart_queue: 425
rx_long_length_errors: 0
rx_short_length_errors: 0
tx_flow_control_xon: 171
rx_flow_control_xon: 0
tx_flow_control_xoff: 277
rx_flow_control_xoff: 0
rx_csum_offload_errors: 0
alloc_rx_page_failed: 0
alloc_rx_buff_failed: 0
lro_aggregated: 0
lro_flushed: 0
rx_no_dma_resources: 0
hw_rsc_aggregated: 1153374
hw_rsc_flushed: 129169
fdir_match: 2424508153
fdir_miss: 1706029
fdir_overflow: 33
os2bmc_rx_by_bmc: 0
os2bmc_tx_by_bmc: 0
os2bmc_tx_by_host: 0
os2bmc_rx_by_host: 0
tx_queue_0_packets: 470182
tx_queue_0_bytes: 690123121
tx_queue_1_packets: 797784
tx_queue_1_bytes: 1203968369
tx_queue_2_packets: 648692
tx_queue_2_bytes: 950171718
tx_queue_3_packets: 647434
tx_queue_3_bytes: 948647518
tx_queue_4_packets: 263216
tx_queue_4_bytes: 394806409
tx_queue_5_packets: 426786
tx_queue_5_bytes: 629387628
tx_queue_6_packets: 253708
tx_queue_6_bytes: 371774276
tx_queue_7_packets: 544634
tx_queue_7_bytes: 812223169
tx_queue_8_packets: 279056
tx_queue_8_bytes: 407792510
tx_queue_9_packets: 735792
tx_queue_9_bytes: 1092693961
tx_queue_10_packets: 393576
tx_queue_10_bytes: 583283986
tx_queue_11_packets: 712565
tx_queue_11_bytes: 1037740789
tx_queue_12_packets: 264445
tx_queue_12_bytes: 386010613
tx_queue_13_packets: 246828
tx_queue_13_bytes: 370387352
tx_queue_14_packets: 191789
tx_queue_14_bytes: 281160607
tx_queue_15_packets: 384581
tx_queue_15_bytes: 579890782
tx_queue_16_packets: 175119
tx_queue_16_bytes: 261312970
tx_queue_17_packets: 151219
tx_queue_17_bytes: 220259675
tx_queue_18_packets: 467746
tx_queue_18_bytes: 707472612
tx_queue_19_packets: 30642
tx_queue_19_bytes: 44896997
tx_queue_20_packets: 157957
tx_queue_20_bytes: 238772784
tx_queue_21_packets: 287819
tx_queue_21_bytes: 434965075
tx_queue_22_packets: 269298
tx_queue_22_bytes: 407637986
tx_queue_23_packets: 102344
tx_queue_23_bytes: 145542751
rx_queue_0_packets: 219438
rx_queue_0_bytes: 273936020
rx_queue_1_packets: 398269
rx_queue_1_bytes: 52080243
rx_queue_2_packets: 285870
rx_queue_2_bytes: 102299543
rx_queue_3_packets: 347238
rx_queue_3_bytes: 145830086
rx_queue_4_packets: 118448
rx_queue_4_bytes: 17515218
rx_queue_5_packets: 228029
rx_queue_5_bytes: 114142681
rx_queue_6_packets: 94285
rx_queue_6_bytes: 107618165
rx_queue_7_packets: 289615
rx_queue_7_bytes: 168428647

Problems with ixgbe driver

2013-06-14 Thread Holger Kiehl

Hello,

I have a dual port 10Gb Intel network card on a 2 socket (Xeon X5690) system
with a total of 12 cores. Hyperthreading is enabled, so there are 24 logical
CPUs. The problem I have is that when other systems send large amounts of
data, the network with the Intel ixgbe driver gets very slow. Ping times go
up from 0.2ms to approx. 60ms. Some FTP connections stall for more than 2
minutes. What is strange is that heartbeat is configured on the system
with a serial connection to another node, and the kernel always reports

ttyS0: 4 input overrun(s)

when a lot of data is sent and the ping times go up.

On the network there are three VLANs configured. The network is bonded
(active-backup) together with another HP NC523SFP 10Gb 2-port Server
Adapter. When I switch the network over to this card, the problem goes away.
The ttyS0 input overruns also disappear. Note that both network cards
are connected to the same switch.

The system runs Scientific Linux 6.4 with a kernel.org kernel. I noticed
this behavior with kernels 3.9.5 and 3.9.6-rc1. I did not notice it before
because traffic always went over the HP NC523SFP qlcnic card.

In search of a solution to the problem I found a newer ixgbe driver,
3.15.1 (3.9.6-rc1 has 3.11.33-k), and tried that. But it has the same
problem. However, when I load the module as follows:

modprobe ixgbe RSS=8,8

the problem goes away. The kernel.org ixgbe driver does not offer this
option. Why? It seems that both drivers have problems on systems with
24 CPUs. But I cannot believe that I am the only one who has noticed this,
since ixgbe is widely used.

It would really be nice if one could set the RSS=8,8 option for the
kernel.org ixgbe driver too. Or could someone tell me where I can force the
driver's Receive Side Scaling to 8, even if it means editing the source code?
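(A possible workaround to try, assuming the in-tree driver in this kernel
already supports changing channel counts via ethtool -- not verified here:)

    ethtool -l eth6              # show current and maximum queue/channel counts
    ethtool -L eth6 combined 8   # limit the driver to 8 queue pairs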

Below I have added some additional information. Please CC me since I
am not subscribed to any of these lists. And please do not hesitate
to ask if more information is needed.

Many thanks in advance.

Regards,
Holger


Loading ixgbe module 3.15.1 without any options:

   2013-06-14T10:01:15.001506+00:00 helena kernel: [74474.075411] Intel(R) 10 
Gigabit PCI Express Network Driver - version 3.15.1
   2013-06-14T10:01:15.033866+00:00 helena kernel: [74474.116422] Copyright (c) 
1999-2013 Intel Corporation.
   2013-06-14T10:01:15.204956+00:00 helena kernel: [74474.319440] ixgbe 
:10:00.0: (PCI Express:5.0GT/s:Width x4) 90:e2:ba:2b:40:80
   2013-06-14T10:01:15.317447+00:00 helena kernel: [74474.362568] ixgbe 
:10:00.0 eth6: MAC: 2, PHY: 15, SFP+: 5, PBA No: E68785-006
   2013-06-14T10:01:15.317465+00:00 helena kernel: [74474.394068] bonding: 
bond0: Adding slave eth6.
   2013-06-14T10:01:15.317468+00:00 helena kernel: [74474.431805] ixgbe 
:10:00.0 eth6: Enabled Features: RxQ: 24 TxQ: 24 FdirHash RSC
   2013-06-14T10:01:15.519117+00:00 helena kernel: [74474.599206] 8021q: adding 
VLAN 0 to HW filter on device eth6
   2013-06-14T10:01:15.592853+00:00 helena kernel: [74474.633370] bonding: 
bond0: enslaving eth6 as a backup interface with a down link.
   2013-06-14T10:01:15.592864+00:00 helena kernel: [74474.666823] ixgbe 
:10:00.0 eth6: detected SFP+: 5
   2013-06-14T10:01:15.634509+00:00 helena kernel: [74474.707900] ixgbe 
:10:00.0 eth6: Intel(R) 10 Gigabit Network Connection
   2013-06-14T10:01:15.888030+00:00 helena kernel: [74474.917771] ixgbe 
:10:00.1: (PCI Express:5.0GT/s:Width x4) 90:e2:ba:2b:40:81
   2013-06-14T10:01:15.888032+00:00 helena kernel: [74474.918516] ixgbe 
:10:00.0 eth6: NIC Link is Up 10 Gbps, Flow Control: RX/TX
   2013-06-14T10:01:15.981283+00:00 helena kernel: [74475.001538] ixgbe 
:10:00.1 eth7: MAC: 2, PHY: 15, SFP+: 6, PBA No: E68785-006
   2013-06-14T10:01:15.981293+00:00 helena kernel: [74475.006351] bonding: 
bond0: link status definitely up for interface eth6, 1 Mbps full duplex.
   2013-06-14T10:01:16.025063+00:00 helena kernel: [74475.094633] ixgbe 
:10:00.1 eth7: Enabled Features: RxQ: 24 TxQ: 24 FdirHash RSC
   2013-06-14T10:01:16.067357+00:00 helena kernel: [74475.138402] ixgbe 
:10:00.1 eth7: Intel(R) 10 Gigabit Network Connection


Loading ixgbe module 3.15.1 with RSS=8,8:

   2013-06-14T10:04:24.790464+00:00 helena kernel: [74663.558702] Intel(R) 10 
Gigabit PCI Express Network Driver - version 3.15.1
   2013-06-14T10:04:24.790484+00:00 helena kernel: [74663.601435] Copyright (c) 
1999-2013 Intel Corporation.
   2013-06-14T10:04:24.853174+00:00 helena kernel: [74663.630652] ixgbe: 
Receive-Side Scaling (RSS) set to 8
   2013-06-14T10:04:25.043310+00:00 helena kernel: [74663.813984] ixgbe 
:10:00.0: (PCI Express:5.0GT/s:Width x4) 90:e2:ba:2b:40:80
   2013-06-14T10:04:25.113547+00:00 helena kernel: [74663.853937] ixgbe 
:10:00.0 eth6: MAC: 2, PHY: 15, SFP+: 5, PBA No: E68785-006
   2013-06-14T10:04:25.113561+00:00 helena kernel: [74663.882910] bonding: 
bond0: Adding slave eth6.
   2013-06-14T10:04:25.159260+00:00 helena kernel: [74663.924060] ixgbe 

Re: Enabling hardlink restrictions to the Linux VFS in 3.6 by default

2012-10-26 Thread Holger Kiehl

Hello Kees,

first, many thanks for trying to help!

On Thu, 25 Oct 2012, Kees Cook wrote:


Hi Holger,

On Thu, Oct 25, 2012 at 12:13:40PM +, Holger Kiehl wrote:

as of linux 3.6 hardlink restrictions to the Linux VFS have been enabled
by default. This breaks the application AFD [1] of which I am the author.


Sorry this created a problem for you!


Internally it uses hardlinks to distribute files. The reason for hardlinks
is that AFD can distribute one file to many destinations, and for each
distributing process it creates a directory with hardlinks to the original
file. That way AFD itself never needs to copy the content of a file. Another
nice feature of hardlinks is that AFD needs no logic to track where the
original file was: each distributing process can delete its hardlink, and
the last one deletes the real file. This way AFD could distribute files at
rates of more than 2 files per second (in benchmarks). This has worked from
the first Linux kernel up to 3.5.7, and with Solaris, HP-UX, AIX, FTX and
IRIX. As of 3.6 this does not work for files on which AFD does not have
write permission. It was always sufficient to just have read permission on
a file it wants to distribute.


Just to clarify, not even read access was needed for hardlinks:

$ whoami
kees
$ ls -l /etc/shadow
-r--r----- 1 root shadow 3112 Oct 22 17:02 /etc/shadow
$ ln /etc/shadow /tmp/ohai
$ ls -l /tmp/ohai
-r--r----- 2 root shadow 3112 Oct 22 17:02 ohai


Correct, but when AFD wants to distribute the file via, for example, FTP,
it must have read access to the file, because it needs to read the file
when it sends it over a socket.


You mention "the last one would delete the real file". That would have
required AFD to have write permission to the directory where the original
file existed? Maybe there is something in your architecture that could
take advantage of that? Directory group-write set-gid? I haven't taken
a look at AFD's code.


Right, it must have write permission on the directory that is monitored
by AFD. When it detects a file, it moves (rename()) it to an internal
directory where AFD works, so this step still works. But from there it
creates hardlinks for each distributing job, and this no longer works if
AFD does not have write access to the file itself. So even if set-gid
is set, it would still not work if the file does not grant write
permission to the group.
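(A minimal illustration of the failure mode with fs.protected_hardlinks=1,
using made-up paths -- the file is readable but owned by someone else and
not writable by the AFD user; the exact error text varies with the
coreutils version:)

    $ ls -l incoming/data.file
    -r--r--r-- 1 someuser somegroup 12345 Oct 25 10:00 incoming/data.file
    $ ln incoming/data.file jobdir/data.file
    ln: failed to create hard link 'jobdir/data.file' => 'incoming/data.file': Operation not permitted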


The fix for the "at" daemon [2] mentioned in the commitdiff [3] cannot
be used for AFD since it is not run with root privileges. Is there any
other way I can "fix" my application? I currently can see no other way
than doing it via: echo 0 > /proc/sys/fs/protected_hardlinks


You said you have read access to these files, so perhaps you can make
a copy when you have read but not write, and then all the subsequent
duplication would be able to hardlink?


This is exactly what AFD tries to avoid. AFD is used on systems where it
distributes terabytes of data daily, and if it had to copy each file
first, imagine the strain that would impose on those servers.


If you wanted to turn off the sysctl, you could have AFD ship
files in /etc/sysctl.d/ (or your distro equivalent) to turn it off.
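(For completeness, such a drop-in would be just one line -- the file name
here is only an example:)

    # /etc/sysctl.d/99-afd.conf
    fs.protected_hardlinks = 0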


Yes, that could be done. However, as the maintainer of one software
package I do not want to disable or enable anything in the kernel by
default. I do not think the system administrators would like this.


I'm sure there are plenty of options available.


Sorry, I cannot see them. But please if you or others have more ideas
I am certainly open to change AFD if it can be done efficiently.


Why is such a fundamental change to the Linux kernel activated by default?


Based on about two years of testing in Ubuntu, the number of problems
was vanishingly small, so the security benefit is seen to outweigh
the downside.


Ubuntu is known to be very user friendly and is mostly used on
laptops/PCs; it is not as common in server environments as Red Hat,
SLES, etc. So I question the statement "vanishingly small" when you
enable it in those environments by default.

And I think there is a real benefit in being able to create hardlinks to a
file that one does not own, which I think was not considered by those who
now disable this feature by default.


Would it not be better if it were the other way around, that the system
administrator or distributions enable this?


Virtually all distributions would have turned this on by default,
so it seemed better to many people to just make it the default in the
kernel. Only unusual corner-cases would need it disabled.


So you too would say that not all distributions would enable it by default.
Would it then not be better for them to try this first and see whether the
number of problems is really "vanishingly small"? And then, if all
distributions enable this by default, it can be made the kernel default as
well. Has it not always worked that way?

Again many thanks for trying to help!

Regards,
Holger

Enabling hardlink restrictions to the Linux VFS in 3.6 by default

2012-10-25 Thread Holger Kiehl

Hello,

as of Linux 3.6, hardlink restrictions in the Linux VFS are enabled
by default. This breaks the application AFD [1], of which I am the author.
Internally it uses hardlinks to distribute files. The reason for hardlinks
is that AFD can distribute one file to many destinations, and for each
distributing process it creates a directory with hardlinks to the original
file. That way AFD itself never needs to copy the content of a file. Another
nice feature of hardlinks is that AFD needs no logic to track where the
original file was: each distributing process can delete its hardlink, and
the last one deletes the real file. This way AFD could distribute files at
rates of more than 2 files per second (in benchmarks). This has worked from
the first Linux kernel up to 3.5.7, and with Solaris, HP-UX, AIX, FTX and
IRIX. As of 3.6 this does not work for files on which AFD does not have
write permission. It was always sufficient to just have read permission on
a file it wants to distribute.
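(To make the scheme concrete, a minimal sketch of the hardlink fan-out
described above -- directory and file names are made up:)

    # the file has already been rename()d into AFD's internal directory
    ln internal/data.file job_ftp/data.file
    ln internal/data.file job_sftp/data.file
    ln internal/data.file job_smtp/data.file
    rm internal/data.file    # data stays on disk until the last link is gone
    # each distributing process unlinks its own link when done; the last
    # unlink frees the file, with no bookkeeping about where the original was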

The fix for the "at" daemon [2] mentioned in the commitdiff [3] cannot
be used for AFD since it is not run with root privileges. Is there any
other way I can "fix" my application? I currently can see no other way
than doing it via: echo 0 > /proc/sys/fs/protected_hardlinks

Why is such a fundamental change to the Linux kernel activated by default?
Would it not be better if it were the other way around, that the system
administrator or distributions enable this?

Regards,
Holger

PS: Please CC me as I am not on the list.


[1] http://www.dwd.de/AFD
[2] 
http://anonscm.debian.org/gitweb/?p=collab-maint/at.git;a=commitdiff;h=f4114656c3a6c6f6070e315ffdf940a49eda3279
[3] 
https://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=800179c9b8a1e796e441674776d11cd4c05d61d7


What happened to TRIM support for raid linear/0/1/10?

2012-08-08 Thread Holger Kiehl

Hello,

I have been using the patches posted by Shaohua Li on 16th March 2012:

   http://lkml.indiana.edu/hypermail/linux/kernel/1203.2/00048.html

for several months on a very busy file server (serving 9 million files
and 5.3 TiB daily) without any problems.

Is there any chance that these patches will go into the official kernel?
Or what is the reason that these patches are not applied?

I have attached the patch set as one big patch for 3.5. Please do not
use it since I am not sure whether it is correct. Shaohua, could you please
take a look at whether it is correct and maybe post a new one?

Personally, I think that TRIM support in MD would be a very good thing.
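(For anyone testing such patches, a quick way to check whether discard support
is actually advertised and honoured through the md device -- device and mount
point names are examples, and the tools must be recent enough to know about
discards:)

    cat /sys/block/md3/queue/discard_max_bytes   # non-zero: the queue accepts discards
    lsblk --discard /dev/md3                     # DISC-GRAN / DISC-MAX columns
    fstrim -v /data                              # issue discards through a mounted filesystem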

Regards,
Holger

diff -u --recursive --new-file linux-3.5.orig/drivers/md/linear.c linux-3.5/drivers/md/linear.c
--- linux-3.5.orig/drivers/md/linear.c  2012-07-21 20:58:29.0 +
+++ linux-3.5/drivers/md/linear.c   2012-07-27 06:53:39.507121434 +
@@ -138,6 +138,7 @@
struct linear_conf *conf;
struct md_rdev *rdev;
int i, cnt;
+   bool discard_supported = false;
 
conf = kzalloc (sizeof (*conf) + raid_disks*sizeof(struct dev_info),
GFP_KERNEL);
@@ -171,6 +172,8 @@
conf->array_sectors += rdev->sectors;
cnt++;
 
+   if (blk_queue_discard(bdev_get_queue(rdev->bdev)))
+   discard_supported = true;
}
if (cnt != raid_disks) {
printk(KERN_ERR "md/linear:%s: not enough drives present. 
Aborting!\n",
@@ -178,6 +181,11 @@
goto out;
}
 
+   if (!discard_supported)
+   queue_flag_clear_unlocked(QUEUE_FLAG_DISCARD, mddev->queue);
+   else
+   queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, mddev->queue);
+
/*
 * Here we calculate the device offsets.
 */
@@ -326,6 +334,14 @@
bio->bi_sector = bio->bi_sector - start_sector
+ tmp_dev->rdev->data_offset;
rcu_read_unlock();
+
+   if (unlikely((bio->bi_rw & REQ_DISCARD) &&
+   !blk_queue_discard(bdev_get_queue(bio->bi_bdev)))) {
+   /* Just ignore it */
+   bio_endio(bio, 0);
+   return;
+   }
+
generic_make_request(bio);
 }
 
diff -u --recursive --new-file linux-3.5.orig/drivers/md/raid0.c 
linux-3.5/drivers/md/raid0.c
--- linux-3.5.orig/drivers/md/raid0.c   2012-07-21 20:58:29.0 +
+++ linux-3.5/drivers/md/raid0.c2012-07-27 06:53:39.507121434 +
@@ -88,6 +88,7 @@
char b[BDEVNAME_SIZE];
char b2[BDEVNAME_SIZE];
struct r0conf *conf = kzalloc(sizeof(*conf), GFP_KERNEL);
+   bool discard_supported = false;
 
if (!conf)
return -ENOMEM;
@@ -195,6 +196,9 @@
if (!smallest || (rdev1->sectors < smallest->sectors))
smallest = rdev1;
cnt++;
+
+   if (blk_queue_discard(bdev_get_queue(rdev1->bdev)))
+   discard_supported = true;
}
if (cnt != mddev->raid_disks) {
printk(KERN_ERR "md/raid0:%s: too few disks (%d of %d) - "
@@ -272,6 +276,11 @@
blk_queue_io_opt(mddev->queue,
 (mddev->chunk_sectors << 9) * mddev->raid_disks);
 
+   if (!discard_supported)
+   queue_flag_clear_unlocked(QUEUE_FLAG_DISCARD, mddev->queue);
+   else
+   queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, mddev->queue);
+
pr_debug("md/raid0:%s: done.\n", mdname(mddev));
*private_conf = conf;
 
@@ -422,6 +431,7 @@
if (md_check_no_bitmap(mddev))
return -EINVAL;
blk_queue_max_hw_sectors(mddev->queue, mddev->chunk_sectors);
+   blk_queue_max_discard_sectors(mddev->queue, mddev->chunk_sectors);
 
/* if private is not null, we are here after takeover */
if (mddev->private == NULL) {
@@ -509,7 +519,7 @@
sector_t sector = bio->bi_sector;
struct bio_pair *bp;
/* Sanity check -- queue functions should prevent this 
happening */
-   if (bio->bi_vcnt != 1 ||
+   if ((bio->bi_vcnt != 1 && bio->bi_vcnt != 0) ||
bio->bi_idx != 0)
goto bad_map;
/* This is a one page bio that upper layers
@@ -535,6 +545,13 @@
bio->bi_sector = sector_offset + zone->dev_start +
tmp_dev->data_offset;
 
+   if (unlikely((bio->bi_rw & REQ_DISCARD) &&
+   !blk_queue_discard(bdev_get_queue(bio->bi_bdev)))) {
+   /* Just ignore it */
+   bio_endio(bio, 0);
+   return;
+   }
+
generic_make_request(bio);
return;
 
diff -u --recursive --new-file linux-3.5.orig/drivers/md/raid10.c 
linux-3.5/drivers/md/raid10.c
--- linux-3.5.orig/drivers/md/raid10.c  2012-07-21 20:58:29.0 +
+++ linux-3.5/drivers/md/raid10.c   2012-07-27 

What happened to TRIM support for raid linear/0/1/10?

2012-08-08 Thread Holger Kiehl

Hello,

I have been using the patches posted by Shaohua Li on 16th March 2012:

   http://lkml.indiana.edu/hypermail/linux/kernel/1203.2/00048.html

for several month on a very busy file server (serving 9 million files
with 5.3 TiB daily) without any problems.

Is there any chance that these patches will go into the official kernel?
Or what is the reason that these patches are no applied?

I have attached the patch set in one big patch for 3.5. Please do not
use it since I am not sure if it is correct. Shaohua could you please
take a look if it is correct and maybe post a new one?

Personally, I would think that TRIM support MD would be a very good thing.

Regards,
Holgerdiff -u --recursive --new-file linux-3.5.orig/drivers/md/linear.c 
linux-3.5/drivers/md/linear.c
--- linux-3.5.orig/drivers/md/linear.c  2012-07-21 20:58:29.0 +
+++ linux-3.5/drivers/md/linear.c   2012-07-27 06:53:39.507121434 +
@@ -138,6 +138,7 @@
struct linear_conf *conf;
struct md_rdev *rdev;
int i, cnt;
+   bool discard_supported = false;
 
conf = kzalloc (sizeof (*conf) + raid_disks*sizeof(struct dev_info),
GFP_KERNEL);
@@ -171,6 +172,8 @@
conf-array_sectors += rdev-sectors;
cnt++;
 
+   if (blk_queue_discard(bdev_get_queue(rdev-bdev)))
+   discard_supported = true;
}
if (cnt != raid_disks) {
printk(KERN_ERR md/linear:%s: not enough drives present. 
Aborting!\n,
@@ -178,6 +181,11 @@
goto out;
}
 
+   if (!discard_supported)
+   queue_flag_clear_unlocked(QUEUE_FLAG_DISCARD, mddev-queue);
+   else
+   queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, mddev-queue);
+
/*
 * Here we calculate the device offsets.
 */
@@ -326,6 +334,14 @@
bio-bi_sector = bio-bi_sector - start_sector
+ tmp_dev-rdev-data_offset;
rcu_read_unlock();
+
+   if (unlikely((bio-bi_rw  REQ_DISCARD) 
+   !blk_queue_discard(bdev_get_queue(bio-bi_bdev {
+   /* Just ignore it */
+   bio_endio(bio, 0);
+   return;
+   }
+
generic_make_request(bio);
 }
 
diff -u --recursive --new-file linux-3.5.orig/drivers/md/raid0.c 
linux-3.5/drivers/md/raid0.c
--- linux-3.5.orig/drivers/md/raid0.c   2012-07-21 20:58:29.0 +
+++ linux-3.5/drivers/md/raid0.c2012-07-27 06:53:39.507121434 +
@@ -88,6 +88,7 @@
char b[BDEVNAME_SIZE];
char b2[BDEVNAME_SIZE];
struct r0conf *conf = kzalloc(sizeof(*conf), GFP_KERNEL);
+   bool discard_supported = false;
 
if (!conf)
return -ENOMEM;
@@ -195,6 +196,9 @@
if (!smallest || (rdev1->sectors < smallest->sectors))
smallest = rdev1;
cnt++;
+
+   if (blk_queue_discard(bdev_get_queue(rdev1->bdev)))
+   discard_supported = true;
}
if (cnt != mddev->raid_disks) {
printk(KERN_ERR "md/raid0:%s: too few disks (%d of %d) - 
@@ -272,6 +276,11 @@
blk_queue_io_opt(mddev->queue,
 (mddev->chunk_sectors << 9) * mddev->raid_disks);
 
+   if (!discard_supported)
+   queue_flag_clear_unlocked(QUEUE_FLAG_DISCARD, mddev->queue);
+   else
+   queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, mddev->queue);
+
pr_debug("md/raid0:%s: done.\n", mdname(mddev));
*private_conf = conf;
 
@@ -422,6 +431,7 @@
if (md_check_no_bitmap(mddev))
return -EINVAL;
blk_queue_max_hw_sectors(mddev->queue, mddev->chunk_sectors);
+   blk_queue_max_discard_sectors(mddev->queue, mddev->chunk_sectors);
 
/* if private is not null, we are here after takeover */
if (mddev->private == NULL) {
@@ -509,7 +519,7 @@
sector_t sector = bio->bi_sector;
struct bio_pair *bp;
/* Sanity check -- queue functions should prevent this 
happening */
-   if (bio->bi_vcnt != 1 ||
+   if ((bio->bi_vcnt != 1 && bio->bi_vcnt != 0) ||
bio->bi_idx != 0)
goto bad_map;
/* This is a one page bio that upper layers
@@ -535,6 +545,13 @@
bio->bi_sector = sector_offset + zone->dev_start +
tmp_dev->data_offset;
 
+   if (unlikely((bio->bi_rw & REQ_DISCARD) &&
+   !blk_queue_discard(bdev_get_queue(bio->bi_bdev)))) {
+   /* Just ignore it */
+   bio_endio(bio, 0);
+   return;
+   }
+
generic_make_request(bio);
return;
 
diff -u --recursive --new-file linux-3.5.orig/drivers/md/raid10.c 
linux-3.5/drivers/md/raid10.c
--- linux-3.5.orig/drivers/md/raid10.c  2012-07-21 20:58:29.0 +
+++ linux-3.5/drivers/md/raid10.c   2012-07-27 06:53:39.507121435 +
@@ -887,7 +887,12 @@
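
With a patch like the above applied, whether the md array ends up advertising
discard support can be checked from user space through the queue's
discard_max_bytes sysfs attribute, which the raid0 hunk feeds via
blk_queue_max_discard_sectors(). A minimal sketch (not from the original mail
or the patch set), assuming the kernel device name, e.g. "md3", is passed as
the argument:

    /* discard_check.c - illustrative only, not part of the patch set.
     * Report whether /sys/block/<dev>/queue/discard_max_bytes is non-zero,
     * i.e. whether the queue will accept discard requests at all. */
    #include <stdio.h>

    int main(int argc, char **argv)
    {
            unsigned long long max_bytes = 0;
            char path[256];
            FILE *f;

            if (argc != 2) {
                    fprintf(stderr, "usage: %s <device name, e.g. md3>\n", argv[0]);
                    return 1;
            }
            snprintf(path, sizeof(path),
                     "/sys/block/%s/queue/discard_max_bytes", argv[1]);
            f = fopen(path, "r");
            if (!f) {
                    perror(path);
                    return 1;
            }
            if (fscanf(f, "%llu", &max_bytes) != 1) {
                    fprintf(stderr, "could not parse %s\n", path);
                    fclose(f);
                    return 1;
            }
            fclose(f);
            printf("%s: discard %s (discard_max_bytes=%llu)\n", argv[1],
                   max_bytes ? "supported" : "not supported", max_bytes);
            return 0;
    }

Running it against the array and against one of the member disks shows whether
the flag propagated as the patch intends.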
  

Re: Where is the performance bottleneck?

2005-09-01 Thread Holger Kiehl

On Wed, 31 Aug 2005, Holger Kiehl wrote:


On Thu, 1 Sep 2005, Nick Piggin wrote:


Holger Kiehl wrote:


meminfo.dump:

   MemTotal:  8124172 kB
   MemFree: 23564 kB
   Buffers:   7825944 kB
   Cached:  19216 kB
   SwapCached:  0 kB
   Active:  25708 kB
   Inactive:  7835548 kB
   HighTotal:   0 kB
   HighFree:0 kB
   LowTotal:  8124172 kB
   LowFree: 23564 kB
   SwapTotal:15631160 kB
   SwapFree: 15631160 kB
   Dirty: 3145604 kB


Hmm OK, dirty memory is pinned pretty much exactly on dirty_ratio
so maybe I've just led you on a goose chase.

You could
   echo 5 > /proc/sys/vm/dirty_background_ratio
   echo 10 > /proc/sys/vm/dirty_ratio

To further reduce dirty memory in the system, however this is
a long shot, so please continue your interaction with the
other people in the thread first.
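
(A quick check of that observation, not in the original message: with the
default dirty_ratio of 40 and MemTotal of 8124172 kB, the dirty limit comes
out at roughly 0.40 * 8124172 kB ~= 3249669 kB, and the quoted Dirty value of
3145604 kB sits just below that ceiling.)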


Yes, this does make a difference, here the results of running

 dd if=/dev/full of=/dev/sd?1 bs=4M count=4883

on 8 disks at the same time:

 34.273340
 33.938829
 33.598469
 32.970575
 32.841351
 32.723988
 31.559880
 29.778112

That's 32.710568 MB/s on average per disk with your change; without
it, it was 24.958557 MB/s on average per disk.

I will do more tests tomorrow.


Just rechecked those numbers. Did a fresh boot and ran the test several
times. With the defaults (dirty_background_ratio=10, dirty_ratio=40) I get
for the dd write tests an average of 24.559491 MB/s (8 disks in parallel)
per disk. With the suggested values (dirty_background_ratio=5, dirty_ratio=10)
I get 32.390659 MB/s per disk.

I then did a SW raid0 over all disks with the following command:

  mdadm -C /dev/md3 -l0 -n8 /dev/sd[cdefghij]1

  (dirty_background_ratio=10, dirty_ratio=40) 223.955995 MB/s
  (dirty_background_ratio=5, dirty_ratio=10)  234.318936 MB/s

So the difference is not so big anymore.

Something else I noticed while doing the dd over 8 disks is the following
(top just before they are finished):

top - 08:39:11 up  2:03,  2 users,  load average: 23.01, 21.48, 15.64
Tasks: 102 total,   2 running, 100 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.0% us, 17.7% sy,  0.0% ni,  0.0% id, 78.9% wa,  0.2% hi,  3.1% si
Mem:   8124184k total,  8093068k used,31116k free,  7831348k buffers
Swap: 15631160k total,13352k used, 15617808k free, 5524k cached

  PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEMTIME+  COMMAND
 3423 root  18   0 55204  460  392 R 12.0  0.0   1:15.55 dd
 3421 root  18   0 55204  464  392 D 11.3  0.0   1:17.36 dd
 3418 root  18   0 55204  464  392 D 10.3  0.0   1:10.92 dd
 3416 root  18   0 55200  464  392 D 10.0  0.0   1:09.20 dd
 3420 root  18   0 55204  464  392 D 10.0  0.0   1:10.49 dd
 3422 root  18   0 55200  460  392 D  9.3  0.0   1:13.58 dd
 3417 root  18   0 55204  460  392 D  7.6  0.0   1:13.11 dd
  158 root  15   0 000 D  1.3  0.0   1:12.61 kswapd3
  159 root  15   0 000 D  1.3  0.0   1:08.75 kswapd2
  160 root  15   0 000 D  1.0  0.0   1:07.11 kswapd1
 3419 root  18   0 51096  552  476 D  1.0  0.0   1:17.15 dd
  161 root  15   0 000 D  0.7  0.0   0:54.46 kswapd0
1 root  16   0  4876  372  332 S  0.0  0.0   0:01.15 init
2 root  RT   0 000 S  0.0  0.0   0:00.00 migration/0
3 root  34  19 000 S  0.0  0.0   0:00.00 ksoftirqd/0
4 root  RT   0 000 S  0.0  0.0   0:00.00 migration/1
5 root  34  19 000 S  0.0  0.0   0:00.00 ksoftirqd/1
6 root  RT   0 000 S  0.0  0.0   0:00.00 migration/2
7 root  34  19 000 S  0.0  0.0   0:00.00 ksoftirqd/2
8 root  RT   0 000 S  0.0  0.0   0:00.00 migration/3
9 root  34  19 000 S  0.0  0.0   0:00.00 ksoftirqd/3

A load average of 23 for 8 dd's seems a bit high. Also, why is kswapd working
so hard? Is that correct?

Please just tell me if there is anything else I can test or dumps that
could be useful.

Thanks,
Holger

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Where is the performance bottleneck?

2005-08-31 Thread Holger Kiehl

On Thu, 1 Sep 2005, Nick Piggin wrote:


Holger Kiehl wrote:


meminfo.dump:

   MemTotal:  8124172 kB
   MemFree: 23564 kB
   Buffers:   7825944 kB
   Cached:  19216 kB
   SwapCached:  0 kB
   Active:  25708 kB
   Inactive:  7835548 kB
   HighTotal:   0 kB
   HighFree:0 kB
   LowTotal:  8124172 kB
   LowFree: 23564 kB
   SwapTotal:15631160 kB
   SwapFree: 15631160 kB
   Dirty: 3145604 kB


Hmm OK, dirty memory is pinned pretty much exactly on dirty_ratio
so maybe I've just led you on a goose chase.

You could
   echo 5 > /proc/sys/vm/dirty_background_ratio
   echo 10 > /proc/sys/vm/dirty_ratio

To further reduce dirty memory in the system, however this is
a long shot, so please continue your interaction with the
other people in the thread first.


Yes, this does make a difference, here the results of running

  dd if=/dev/full of=/dev/sd?1 bs=4M count=4883

on 8 disks at the same time:

  34.273340
  33.938829
  33.598469
  32.970575
  32.841351
  32.723988
  31.559880
  29.778112

That's 32.710568 MB/s on average per disk with your change; without
it, it was 24.958557 MB/s on average per disk.

I will do more tests tomorrow.

Thanks,
Holger

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Where is the performance bottleneck?

2005-08-31 Thread Holger Kiehl

On Wed, 31 Aug 2005, Dr. David Alan Gilbert wrote:


* Holger Kiehl ([EMAIL PROTECTED]) wrote:

On Wed, 31 Aug 2005, Jens Axboe wrote:

Full vmstat session can be found under:


Have you got iostat?  iostat -x 10 might be interesting to see
for a period while it is going.


The following is the result from all 8 disks at the same time with the command
dd if=/dev/sd?1 of=/dev/null bs=256k count=78125

There is, however, one difference: here I had set
/sys/block/sd?/queue/nr_requests to 4096.

avg-cpu:  %user   %nice%sys %iowait   %idle
   0.100.00   21.85   58.55   19.50

Device:rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/srkB/swkB/s avgrq-sz 
avgqu-sz   await  svctm  %util
sda  0.00   0.00  0.00  0.300.002.40 0.00 1.20 8.00 
0.001.00   1.00   0.03
sdb  0.70   0.00  0.10  0.306.402.40 3.20 1.2022.00 
0.004.25   4.25   0.17
sdc8276.90   0.00 267.10  0.00 68352.000.00 34176.00 0.00   
255.90 1.957.29   3.74 100.02
sdd9098.50   0.00 293.50  0.00 75136.000.00 37568.00 0.00   
256.00 1.936.59   3.41 100.03
sde10428.40   0.00 336.40  0.00 86118.400.00 43059.20 0.00   
256.00 1.925.71   2.97 100.02
sdf11314.90   0.00 365.10  0.00 93440.000.00 46720.00 0.00   
255.93 1.925.26   2.74  99.98
sdg7973.20   0.00 257.20  0.00 65843.200.00 32921.60 0.00   
256.00 1.947.53   3.89 100.01
sdh9436.30   0.00 304.70  0.00 77928.000.00 38964.00 0.00   
255.75 1.936.35   3.28 100.01
sdi10604.80   0.00 342.40  0.00 87577.600.00 43788.80 0.00   
255.78 1.925.62   2.92 100.02
sdj10914.30   0.00 352.20  0.00 90132.800.00 45066.40 0.00   
255.91 1.915.43   2.84 100.00
md0  0.00   0.00  0.00  0.100.000.80 0.00 0.40 8.00 
0.000.00   0.00   0.00
md2  0.00   0.00  0.80  0.006.400.00 3.20 0.00 8.00 
0.000.00   0.00   0.00
md1  0.00   0.00  0.00  0.000.000.00 0.00 0.00 0.00 
0.000.00   0.00   0.00

avg-cpu:  %user   %nice%sys %iowait   %idle
   0.070.00   24.49   66.818.62

Device:rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/srkB/swkB/s avgrq-sz 
avgqu-sz   await  svctm  %util
sda  0.00   0.40  0.00  1.000.00   11.20 0.00 5.6011.20 
0.001.30   0.50   0.05
sdb  0.00   0.40  0.00  1.000.00   11.20 0.00 5.6011.20 
0.001.50   0.70   0.07
sdc8161.90   0.00 263.70  0.00 67404.800.00 33702.40 0.00   
255.61 1.957.38   3.79 100.02
sdd9157.30   0.00 295.50  0.00 75622.400.00 37811.20 0.00   
255.91 1.936.53   3.38 100.00
sde10505.60   0.00 339.20  0.00 86758.400.00 43379.20 0.00   
255.77 1.935.68   2.95  99.99
sdf11212.50   0.00 361.90  0.00 92595.200.00 46297.60 0.00   
255.86 1.915.28   2.76 100.00
sdg7988.40   0.00 258.00  0.00 65971.200.00 32985.60 0.00   
255.70 1.937.49   3.88  99.98
sdh9436.20   0.00 304.40  0.00 77924.800.00 38962.40 0.00   
255.99 1.926.32   3.28  99.99
sdi10406.10   0.00 336.30  0.00 85939.200.00 42969.60 0.00   
255.54 1.925.70   2.97 100.00
sdj11027.00   0.00 356.00  0.00 91064.000.00 45532.00 0.00   
255.80 1.925.40   2.81  99.96
md0  0.00   0.00  0.00  1.000.008.00 0.00 4.00 8.00 
0.000.00   0.00   0.00
md2  0.00   0.00  0.00  0.000.000.00 0.00 0.00 0.00 
0.000.00   0.00   0.00
md1  0.00   0.00  0.00  0.000.000.00 0.00 0.00 0.00 
0.000.00   0.00   0.00

avg-cpu:  %user   %nice%sys %iowait   %idle
   0.080.00   22.23   60.44   17.25

Device:rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/srkB/swkB/s avgrq-sz 
avgqu-sz   await  svctm  %util
sda  0.00   0.00  0.00  0.300.002.40 0.00 1.20 8.00 
0.001.00   1.00   0.03
sdb  0.00   0.00  0.00  0.300.002.40 0.00 1.20 8.00 
0.000.67   0.67   0.02
sdc8204.50   0.00 264.76  0.00 67754.150.00 33877.08 0.00   
255.90 1.957.38   3.78 100.12
sdd9166.47   0.00 295.90  0.00 75698.100.00 37849.05 0.00   
255.83 1.946.55   3.38 100.12
sde10534.93   0.00 339.94  0.00 86999.000.00 43499.50 0.00   
255.92 1.935.67   2.95 100.12
sdf11282.68   0.00 364.16  0.00 93174.770.00 46587.39 0.00   
255.86 1.925.28   2.75 100.10
sdg8114.61   0.00 261.76  0.00 67011.010.00 33505.51 0.00   
256.00 1.957.44   3.82 100.11
sdh9380.68   0.00 302.60  0.00 77466.270.00 38733.13 0.00   
256.00 1.936.38

Re: Where is the performance bottleneck?

2005-08-31 Thread Holger Kiehl

On Wed, 31 Aug 2005, Jens Axboe wrote:


On Wed, Aug 31 2005, Holger Kiehl wrote:

# ./oread /dev/sdX

and it will read 128k chunks direct from that device. Run on the same
drives as above, reply with the vmstat info again.


Using kernel 2.6.12.5 again, here the results:


[snip]

Ok, reads as expected, like the buffered io but using less system time.
And you are still 1/3 off the target data rate, hmmm...

With the reads, how does the aggregate bandwidth look when you add
'clients'? Same as with writes, gradually decreasing per-device
throughput?


I performed the following tests with this command:

   dd if=/dev/sd?1 of=/dev/null bs=256k count=78125

Single disk tests:

   /dev/sdc1 74.954715 MB/s
   /dev/sdg1 74.973417 MB/s

Following disks in parallel:

   2 disks on same channel
   /dev/sdc1 75.034191 MB/s
   /dev/sdd1 74.984643 MB/s

   3 disks on same channel
   /dev/sdc1 75.027850 MB/s
   /dev/sdd1 74.976583 MB/s
   /dev/sde1 75.278276 MB/s

   4 disks on same channel
   /dev/sdc1 58.343166 MB/s
   /dev/sdd1 62.993059 MB/s
   /dev/sde1 66.940569 MB/s
   /dev/sdd1 70.986072 MB/s

   2 disks on different channels
   /dev/sdc1 74.954715 MB/s
   /dev/sdg1 74.973417 MB/s

   4 disks on different channels
   /dev/sdc1 74.959030 MB/s
   /dev/sdd1 74.877703 MB/s
   /dev/sdg1 75.009697 MB/s
   /dev/sdh1 75.028138 MB/s

   6 disks on different channels
   /dev/sdc1 49.640743 MB/s
   /dev/sdd1 55.935419 MB/s
   /dev/sde1 58.795241 MB/s
   /dev/sdg1 50.280864 MB/s
   /dev/sdh1 54.210705 MB/s
   /dev/sdi1 59.413176 MB/s

So this looks different from writing; only as of four disks does the
performance begin to drop.

I just noticed, did you want me to do these tests with the oread program?

Thanks,
Holger

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Where is the performance bottleneck?

2005-08-31 Thread Holger Kiehl

On Wed, 31 Aug 2005, Jens Axboe wrote:


On Wed, Aug 31 2005, Holger Kiehl wrote:

On Wed, 31 Aug 2005, Jens Axboe wrote:


Nothing sticks out here either. There's plenty of idle time. It smells
like a driver issue. Can you try the same dd test, but read from the
drives instead? Use a bigger blocksize here, 128 or 256k.


I used the following command reading from all 8 disks in parallel:

   dd if=/dev/sd?1 of=/dev/null bs=256k count=78125

Here vmstat output (I just cut something out in the middle):

procs ---memory-- ---swap-- -io --system-- cpu
 r  b   swpd   free   buff  cache   si   sobibo   incs us sy id wa
 3  7   4348  42640 7799984   961200 322816 0 3532  4987  0 22  0 78
 1  7   4348  42136 7800624   958400 322176 0 3526  4987  0 23  4 74
 0  8   4348  39912 7802648   966800 322176 0 3525  4955  0 22 12 66
 1  7   4348  38912 7803700   963600 322432 0 3526  5078  0 23  7 70


Ok, so that's somewhat better than the writes but still off from what
the individual drives can do in total.


You might want to try the same with direct io, just to eliminate the
costly user copy. I don't expect it to make much of a difference though,
feels like the problem is elsewhere (driver, most likely).


Sorry, I don't know how to do this. Do you mean using a C program
that sets some flag to do direct io, or how can I do that?


I've attached a little sample for you, just run ala

# ./oread /dev/sdX

and it will read 128k chunks direct from that device. Run on the same
drives as above, reply with the vmstat info again.
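
The oread attachment itself is not reproduced in this archive; a minimal
sketch of such an O_DIRECT reader (illustrative only, not Jens' original
program) could look like:

    /* oread.c - illustrative only; not the original attachment.
     * Read a block device sequentially in 128k chunks using O_DIRECT,
     * so the page cache is bypassed. Build: gcc -O2 -o oread oread.c */
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    #define CHUNK (128 * 1024)

    int main(int argc, char **argv)
    {
            void *buf;
            ssize_t ret;
            int fd;

            if (argc != 2) {
                    fprintf(stderr, "usage: %s <device>\n", argv[0]);
                    return 1;
            }
            fd = open(argv[1], O_RDONLY | O_DIRECT);
            if (fd < 0) {
                    perror("open");
                    return 1;
            }
            /* O_DIRECT needs an aligned buffer; 4096 covers common sector sizes */
            if (posix_memalign(&buf, 4096, CHUNK)) {
                    perror("posix_memalign");
                    return 1;
            }
            /* read until end of device or error */
            while ((ret = read(fd, buf, CHUNK)) > 0)
                    ;
            if (ret < 0)
                    perror("read");
            free(buf);
            close(fd);
            return ret < 0;
    }

Compiled with gcc and pointed at e.g. /dev/sdc1, it reads the device
sequentially while bypassing the page cache, which is what eliminates the
costly user copy mentioned above.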


Using kernel 2.6.12.5 again, here the results:

procs ---memory-- ---swap-- -io --system-- cpu
 r  b   swpd   free   buff  cache   si   sobibo   incs us sy id wa
 0  0  0 8009648   4764  4059200 0 0 101132  0  0 100  0
 0  0  0 8009648   4764  4059200 0 0 101134  0  0 100  0
 0  0  0 8009648   4764  4059200 0 0 100861  0  0 100  0
 0  0  0 8009648   4764  4059200 0 0 100626  0  0 100  0
 0  8  0 8006372   4764  4059200 120192 0 1944  1929  0  1 89 10
 2  8  0 8006372   4764  4059200 319488 0 3502  4999  0  2 75 24
 0  8  0 8006372   4764  4059200 319488 0 3506  4995  0  2 75 24
 0  8  0 8006372   4764  4059200 319744 0 3504  4999  0  1 75 24
 0  8  0 8006372   4764  4059200 319488 0 3507  5009  0  2 75 23
 0  8  0 8006372   4764  4059200 319616 0 3506  5011  0  2 75 24
 0  8  0 8005124   4800  4110000 319976 0 3536  4995  0  2 73 25
 0  8  0 8005124   4800  4110000 323584 0 3534  5000  0  2 75 23
 0  8  0 8005124   4800  4110000 323968 0 3540  5035  0  1 75 24
 0  8  0 8005124   4800  4110000 319232 0 3506  4811  0  1 75 24
 0  8  0 8005504   4800  4110000 317952 0 3498  4747  0  1 75 24
 0  8  0 8005504   4800  4110000 318720 0 3495  4672  0  2 75 23
 1  8  0 8005504   4800  4110000 318720 0 3509  4707  0  1 75 24
 0  8  0 8005504   4800  4110000 318720 0 3499  4667  0  2 75 23
 0  8  0 8005504   4808  4109200 31884840 3509  4674  0  1 75 24
 0  8  0 8005380   4808  4109200 318848 0 3497  4693  0  2 72 26
 0  8  0 8005380   4808  4109200 318592 0 3500  4646  0  2 75 23
 0  8  0 8005380   4808  4109200 318592 0 3495  4828  0  2 61 37
 0  8  0 8005380   4808  4109200 318848 0 3499  4827  0  1 62 37
 1  8  0 8005380   4808  4109200 318464 0 3495  4642  0  2 75 23
 0  8  0 8005380   4816  4108400 31884832 3511  4672  0  1 75 24
 0  8  0 8005380   4816  4108400 320640 0 3512  4877  0  2 75 23
 0  8  0 8005380   4816  4108400 322944 0 3533  5047  0  2 75 24
 0  8  0 8005380   4816  4108400 322816 0 3531  5053  0  1 75 24
 0  8  0 8005380   4816  4108400 322944 0 3531  5048  0  2 75 23
 0  8  0 8005380   4816  4108400 322944 0 3529  5043  0  1 75 24
 0  0  0 8008360   4816  4108400 266880 0 3112  4224  0  2 78 20
 0  0  0 8008360   4816  4108400 0 0 101228  0  0 100  0

Holger

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Where is the performance bottleneck?

2005-08-31 Thread Holger Kiehl

On Wed, 31 Aug 2005, Nick Piggin wrote:


Holger Kiehl wrote:


3236497 total  1.4547
2507913 default_idle 52248.1875
158752 shrink_zone   43.3275
121584 copy_user_generic_c  3199.5789
 34271 __wake_up_bit713.9792
 31131 __make_request23.1629
 22096 scsi_request_fn   18.4133
 21915 rotate_reclaimable_page   80.5699

  ^

I don't think this function should be here. This indicates that
lots of writeout is happening due to pages falling off the end
of the LRU.

There was a bug recently causing memory estimates to be wrong
on Opterons that could cause this I think.

Can you send in 2 dumps of /proc/vmstat taken 10 seconds apart
while you're writing at full speed (with 2.6.13 or the latest
-git tree).


I took 2.6.13, there were no git snapshots at www.kernel.org when
I looked. With 2.6.13 I must load the Fusion MPT driver as a module.
Compiled in, it does not detect the drive correctly; as a module
there is no problem.

Here is what I did:

   #!/bin/bash

   time dd if=/dev/full of=/dev/sdc1 bs=4M count=4883 &
   time dd if=/dev/full of=/dev/sdd1 bs=4M count=4883 &
   time dd if=/dev/full of=/dev/sde1 bs=4M count=4883 &
   time dd if=/dev/full of=/dev/sdf1 bs=4M count=4883 &
   time dd if=/dev/full of=/dev/sdg1 bs=4M count=4883 &
   time dd if=/dev/full of=/dev/sdh1 bs=4M count=4883 &
   time dd if=/dev/full of=/dev/sdi1 bs=4M count=4883 &
   time dd if=/dev/full of=/dev/sdj1 bs=4M count=4883 &

   sleep 20

   cat /proc/vmstat > /root/vmstat-1.dump

   sleep 10

   cat /proc/vmstat > /root/vmstat-2.dump
   cat /proc/zoneinfo > /root/zoneinfo.dump
   cat /proc/meminfo > /root/meminfo.dump

   exit 0

vmstat-1.dump:

   nr_dirty 787282
   nr_writeback 44317
   nr_unstable 0
   nr_page_table_pages 633
   nr_mapped 6373
   nr_slab 53030
   pgpgin 263362
   pgpgout 5260352
   pswpin 0
   pswpout 0
   pgalloc_high 0
   pgalloc_normal 2448628
   pgalloc_dma 1041
   pgfree 2457343
   pgactivate 5775
   pgdeactivate 2113
   pgfault 465679
   pgmajfault 321
   pgrefill_high 0
   pgrefill_normal 5940
   pgrefill_dma 33
   pgsteal_high 0
   pgsteal_normal 148759
   pgsteal_dma 0
   pgscan_kswapd_high 0
   pgscan_kswapd_normal 153813
   pgscan_kswapd_dma 1089
   pgscan_direct_high 0
   pgscan_direct_normal 0
   pgscan_direct_dma 0
   pginodesteal 0
   slabs_scanned 0
   kswapd_steal 148759
   kswapd_inodesteal 0
   pageoutrun 5304
   allocstall 0
   pgrotated 0
   nr_bounce 0

vmstat-2.dump:

   nr_dirty 786397
   nr_writeback 44233
   nr_unstable 0
   nr_page_table_pages 640
   nr_mapped 6406
   nr_slab 53027
   pgpgin 263382
   pgpgout 7835732
   pswpin 0
   pswpout 0
   pgalloc_high 0
   pgalloc_normal 3091687
   pgalloc_dma 2420
   pgfree 3101327
   pgactivate 5817
   pgdeactivate 2918
   pgfault 466269
   pgmajfault 322
   pgrefill_high 0
   pgrefill_normal 28265
   pgrefill_dma 150
   pgsteal_high 0
   pgsteal_normal 789909
   pgsteal_dma 1388
   pgscan_kswapd_high 0
   pgscan_kswapd_normal 904101
   pgscan_kswapd_dma 4950
   pgscan_direct_high 0
   pgscan_direct_normal 0
   pgscan_direct_dma 0
   pginodesteal 0
   slabs_scanned 1152
   kswapd_steal 791297
   kswapd_inodesteal 0
   pageoutrun 28299
   allocstall 0
   pgrotated 562
   nr_bounce 0

zoneinfo.dump:

   Node 3, zone   Normal
 pages free 899
   min  726
   low  907
   high 1089
   active   3996
   inactive 490989
   scanned  0 (a: 16 i: 0)
   spanned  524287
   present  524287
   protection: (0, 0, 0)
 pagesets
   cpu: 0 pcp: 0
 count: 2
 low:   62
 high:  186
 batch: 31
   cpu: 0 pcp: 1
 count: 0
 low:   0
 high:  62
 batch: 31
   numa_hit:   10186
   numa_miss:  3313
   numa_foreign:   0
   interleave_hit: 10136
   local_node: 0
   other_node: 13499
   cpu: 1 pcp: 0
 count: 13
 low:   62
 high:  186
 batch: 31
   cpu: 1 pcp: 1
 count: 0
 low:   0
 high:  62
 batch: 31
   numa_hit:   6559
   numa_miss:  1668
   numa_foreign:   0
   interleave_hit: 6559
   local_node: 0
   other_node: 8227
   cpu: 2 pcp: 0
 count: 84
 low:   62
 high:  186
 batch: 31
   cpu: 2 pcp: 1
 count: 0
 low:   0
 high:  62
 b

Re: Where is the performance bottleneck?

2005-08-31 Thread Holger Kiehl

On Wed, 31 Aug 2005, Jens Axboe wrote:


Nothing sticks out here either. There's plenty of idle time. It smells
like a driver issue. Can you try the same dd test, but read from the
drives instead? Use a bigger blocksize here, 128 or 256k.


I used the following command reading from all 8 disks in parallel:

   dd if=/dev/sd?1 of=/dev/null bs=256k count=78125

Here vmstat output (I just cut something out in the middle):

procs ---memory-- ---swap-- -io --system-- cpu
 r  b   swpd   free   buff  cache   si   sobibo   incs us sy id wa
 3  7   4348  42640 7799984   961200 322816 0 3532  4987  0 22  0 78
 1  7   4348  42136 7800624   958400 322176 0 3526  4987  0 23  4 74
 0  8   4348  39912 7802648   966800 322176 0 3525  4955  0 22 12 66
 1  7   4348  38912 7803700   963600 322432 0 3526  5078  0 23  7 70
 2  6   4348  37552 7805120   964400 322432 0 3527  4908  0 23 12 64
 0  8   4348  41152 7801552   960800 322176 0 3524  5018  0 24  6 70
 1  7   4348  41644 7801044   957200 322560 0 3530  5175  0 23  0 76
 1  7   4348  37184 7805396   964000 322176 0 3525  4914  0 24 18 59
 3  7   4348  41704 7800376   983200 32217620 3531  5080  0 23  4 73
 1  7   4348  40652 7801700   973200 323072 0 3533  5115  0 24 13 64
 1  7   4348  40284 7802224   961600 322560 0 3527  4967  0 23  1 76
 0  8   4348  40156 7802356   968800 322560 0 3528  5080  0 23  2 75
 6  8   4348  41896 7799984   981600 322176 0 3530  4945  0 24 20 57
 0  8   4348  39540 7803124   960000 322560 0 3529  4811  0 24 21 55
 1  7   4348  41520 7801084   960000 322560 0 3532  4843  0 23 22 55
 0  8   4348  40408 7802116   958800 322560 0 3527  5010  0 23  4 72
 0  8   4348  38172 7804300   958000 322176 0 3526  4992  0 24  7 69
 4  7   4348  42264 7799784   981200 322688 0 3529  5003  0 24  8 68
 1  7   4348  39908 7802520   966000 322700 0 3529  4963  0 24 14 62
 0  8   4348  37428 7805076   962000 322420 0 3528  4967  0 23 15 62
 0  8   4348  37056 7805348   968800 322048 0 3525  4982  0 24 26 50
 1  7   4348  37804 7804456   969600 322560 0 3528  5072  0 24 16 60
 0  8   4348  38416 7804084   966000 323200 0 3533  5081  0 24 23 53
 0  8   4348  40160 7802300   967600 32320028 3543  5095  0 24 17 59
 1  7   4348  37928 7804612   960800 323072 0 3532  5175  0 24  7 68
 2  6   4348  38680 7803724   961200 322944 0 3531  4906  0 25 24 51
 1  7   4348  40408 7802192   964800 322048 0 3524  4947  0 24 19 57

Full vmstat session can be found under:

  ftp://ftp.dwd.de/pub/afd/linux_kernel_debug/vmstat-256k-read

And here the profile data:

2106577 total  0.9469
1638177 default_idle 34128.6875
179615 copy_user_generic_c  4726.7105
 27670 end_buffer_async_read108.0859
 26055 shrink_zone7.
 23199 __make_request17.2612
 17221 kmem_cache_free  153.7589
 11796 drop_buffers  52.6607
 11016 add_to_page_cache 52.9615
  9470 __wake_up_bit197.2917
  8760 buffered_rmqueue  12.4432
  8646 find_get_page 90.0625
  8319 __do_page_cache_readahead 11.0625
  7976 kmem_cache_alloc 124.6250
  7463 scsi_request_fn6.2192
  7208 try_to_free_buffers   40.9545
  6716 create_empty_buffers  41.9750
  6432 __end_that_request_first  11.8235
  6044 test_clear_page_dirty 25.1833
  5643 scsi_dispatch_cmd  9.7969
  5588 free_hot_cold_page19.4028
  5479 submit_bh 18.0230
  3903 __alloc_pages  3.2965
  3671 file_read_actor9.9755
  3425 thread_return 14.2708
   generic_make_request   5.6301
  3294 bio_alloc_bioset   7.6250
  2868 bio_put   44.8125
  2851 mpt_interrupt  2.8284
  2697 mempool_alloc  8.8717
  2642 block_read_full_page   3.9315
  2512 do_generic_mapping_read2.1216
  2394 set_page_refs149.6250
  2235 alloc_page_buffers 9.9777
  1992 __pagevec_lru_add  8.3000
  1859 __memset   9.6823
  1791 page_waitqueue 

Re: Where is the performance bottleneck?

2005-08-31 Thread Holger Kiehl

On Wed, 31 Aug 2005, Vojtech Pavlik wrote:


On Tue, Aug 30, 2005 at 08:06:21PM +, Holger Kiehl wrote:

How does one determine the PCI-X bus speed?


Usually only the card (in your case the Symbios SCSI controller) can
tell. If it does, it'll be most likely in 'dmesg'.


There is nothing in dmesg:

   Fusion MPT base driver 3.01.20
   Copyright (c) 1999-2004 LSI Logic Corporation
   ACPI: PCI Interrupt :02:04.0[A] -> GSI 24 (level, low) -> IRQ 217
   mptbase: Initiating ioc0 bringup
   ioc0: 53C1030: Capabilities={Initiator,Target}
   ACPI: PCI Interrupt :02:04.1[B] -> GSI 25 (level, low) -> IRQ 225
   mptbase: Initiating ioc1 bringup
   ioc1: 53C1030: Capabilities={Initiator,Target}
   Fusion MPT SCSI Host driver 3.01.20


To find where the bottleneck is, I'd suggest trying without the
filesystem at all, and just filling a large part of the block device
using the 'dd' command.

Also, trying without the RAID, and just running 4 (and 8) concurrent
dd's to the separate drives could show whether it's the RAID that's
slowing things down.


Ok, I did run the following dd command in different combinations:

   dd if=/dev/zero of=/dev/sd?1 bs=4k count=500


I think a bs of 4k is way too small and will cause huge CPU overhead.
Can you try with something like 4M? Also, you can use /dev/full to avoid
the pre-zeroing.


Ok, I now use the following command:

  dd if=/dev/full of=/dev/sd?1 bs=4M count=4883

Here the results for all 8 disks in parallel:

  /dev/sdc1 24.957257 MB/s
  /dev/sdd1 25.290177 MB/s
  /dev/sde1 25.046711 MB/s
  /dev/sdf1 26.369777 MB/s
  /dev/sdg1 24.080695 MB/s
  /dev/sdh1 25.008803 MB/s
  /dev/sdi1 24.202202 MB/s
  /dev/sdj1 24.712840 MB/s

A little bit faster but not much.

Holger

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Where is the performance bottleneck?

2005-08-31 Thread Holger Kiehl

On Wed, 31 Aug 2005, Jens Axboe wrote:


On Wed, Aug 31 2005, Vojtech Pavlik wrote:

On Tue, Aug 30, 2005 at 08:06:21PM +, Holger Kiehl wrote:

How does one determine the PCI-X bus speed?


Usually only the card (in your case the Symbios SCSI controller) can
tell. If it does, it'll be most likely in 'dmesg'.


There is nothing in dmesg:

   Fusion MPT base driver 3.01.20
   Copyright (c) 1999-2004 LSI Logic Corporation
   ACPI: PCI Interrupt :02:04.0[A] -> GSI 24 (level, low) -> IRQ 217
   mptbase: Initiating ioc0 bringup
   ioc0: 53C1030: Capabilities={Initiator,Target}
   ACPI: PCI Interrupt :02:04.1[B] -> GSI 25 (level, low) -> IRQ 225
   mptbase: Initiating ioc1 bringup
   ioc1: 53C1030: Capabilities={Initiator,Target}
   Fusion MPT SCSI Host driver 3.01.20


To find where the bottleneck is, I'd suggest trying without the
filesystem at all, and just filling a large part of the block device
using the 'dd' command.

Also, trying without the RAID, and just running 4 (and 8) concurrent
dd's to the separate drives could show whether it's the RAID that's
slowing things down.


Ok, I did run the following dd command in different combinations:

   dd if=/dev/zero of=/dev/sd?1 bs=4k count=500


I think a bs of 4k is way too small and will cause huge CPU overhead.
Can you try with something like 4M? Also, you can use /dev/full to avoid
the pre-zeroing.


That was my initial thought as well, but since he's writing the io side
should look correct. I doubt 8 dd's writing 4k chunks will gobble that
much CPU as to make this much difference.

Holger, we need vmstat 1 info while the dd's are running. A simple
profile would be nice as well, boot with profile=2 and do a readprofile
-r; run tests; readprofile > foo and send the first 50 lines of foo to
this list.


Here vmstat for 8 dd's still with 4k blocksize:

procs ---memory-- ---swap-- -io --system-- cpu
 r  b   swpd   free   buff  cache   si   sobibo   incs us sy id wa
 9  2   5244  38272 7738248  1040000 3 11444  39024  0  5 75 20
 5 10   5244  30824 7747680   868400 0 265672 2582  1917  1 95  0  4
 2 12   5244  30948 7747248   870800 0 222620 2858   292  0 33  0 67
 4 10   5244  31072 7747516   864400 0 236400 3132   326  0 43  0 57
 2 12   5244  31320 7747792   851200 0 250204 3225   285  0 37  0 63
 1 13   5244  30948 7747412   85520024 227600 3261   312  0 41  0 59
 2 12   5244  32684 7746124   861600 0 235392 3219   274  0 32  0 68
 1 13   5244  30948 7747940   856800 0 228020 3394   296  0 37  0 63
 0 14   5244  31196 7747680   862400 0 232932 3389   300  0 32  0 68
 3 12   5244  31072 7747904   853600 0 233096 3545   312  0 33  0 67
 1 13   5244  31072 7747852   852000 0 226992 3381   290  0 31  0 69
 1 13   5244  31196 7747704   839600 0 230112 3372   265  0 28  0 72
 0 14   5244  31072 7747928   851200 0 240652 3491   295  0 33  0 67
 3 13   5244  31072 7748104   860800 0 222944 3433   269  0 27  0 73
 1 13   5244  31072 7748000   850800 0 207944 3470   294  0 28  0 72
 0 14   5244  31072 7747980   852800 0 234608 3496   272  0 31  0 69
 2 12   5244  31196 7748148   849600 0 228760 3480   280  0 28  0 72
 0 14   5244  30948 7748568   862000 0 214372 3551   302  0 29  0 71
 1 13   5244  31072 7748392   852400 0 226732 3494   284  0 29  0 71
 0 14   5244  31072 7748004   864000 0 229628 3604   273  0 26  0 74
 1 13   5244  30948 7748392   866000 0 212868 3563   266  0 28  0 72
 1 13   5244  30948 7748600   852000 0 228244 3568   294  0 30  0 70
 1 13   5244  31196 7748228   841600 0 221692 3543   258  0 27  0 73
 1 13   5244  31072 7748192   852000 0 241040 3983   330  0 25  0 74
 1 13   5244  31196 7748288   856000 0 217108 3676   276  0 28  0 72
 .
 .
 .
   This goes on up to the end.
 .
 .
 .
 0  3   5244 825096 6949252   859600 0 241244 2683   223  0  7 71 22
 0  2   5244 825108 6949252   859600 0 229764 2683   214  0  7 73 20
 0  3   5244 826348 6949252   859600 0 116840 2046   450  0  4 71 26
 0  3   5244 826976 6949252   859600 0 141992 188797  0  4 73 23
 0  3   5244 827100 6949252   859600 0 137716 187193  0  4 70 26
 0  3   5244 827100 6949252   859600 0 137032 189496  0  4 75 21
 0  3   5244 827224 6949252   859600 0 131332 1860   288  0  4 73 23
 0  1   5244 1943732 5833756   862000 0 72404 1560   481  0 24 61 16
 0  2   5244 1943732 5833756   862000 0 71680 145060  0

Re: Where is the performance bottleneck?

2005-08-31 Thread Holger Kiehl

On Wed, 31 Aug 2005, Jens Axboe wrote:


On Wed, Aug 31 2005, Vojtech Pavlik wrote:

On Tue, Aug 30, 2005 at 08:06:21PM +, Holger Kiehl wrote:

How does one determine the PCI-X bus speed?


Usually only the card (in your case the Symbios SCSI controller) can
tell. If it does, it'll be most likely in 'dmesg'.


There is nothing in dmesg:

   Fusion MPT base driver 3.01.20
   Copyright (c) 1999-2004 LSI Logic Corporation
   ACPI: PCI Interrupt :02:04.0[A] - GSI 24 (level, low) - IRQ 217
   mptbase: Initiating ioc0 bringup
   ioc0: 53C1030: Capabilities={Initiator,Target}
   ACPI: PCI Interrupt :02:04.1[B] - GSI 25 (level, low) - IRQ 225
   mptbase: Initiating ioc1 bringup
   ioc1: 53C1030: Capabilities={Initiator,Target}
   Fusion MPT SCSI Host driver 3.01.20


To find where the bottleneck is, I'd suggest trying without the
filesystem at all, and just filling a large part of the block device
using the 'dd' command.

Also, trying without the RAID, and just running 4 (and 8) concurrent
dd's to the separate drives could show whether it's the RAID that's
slowing things down.


Ok, I did run the following dd command in different combinations:

   dd if=/dev/zero of=/dev/sd?1 bs=4k count=500


I think a bs of 4k is way too small and will cause huge CPU overhead.
Can you try with something like 4M? Also, you can use /dev/full to avoid
the pre-zeroing.


That was my initial thought as well, but since he's writing the io side
should look correct. I doubt 8 dd's writing 4k chunks will gobble that
much CPU as to make this much difference.

Holger, we need vmstat 1 info while the dd's are running. A simple
profile would be nice as well, boot with profile=2 and do a readprofile
-r; run tests; readprofile  foo and send the first 50 lines of foo to
this list.


Here vmstat for 8 dd's still with 4k blocksize:

procs ---memory-- ---swap-- -io --system-- cpu
 r  b   swpd   free   buff  cache   si   sobibo   incs us sy id wa
 9  2   5244  38272 7738248  1040000 3 11444  39024  0  5 75 20
 5 10   5244  30824 7747680   868400 0 265672 2582  1917  1 95  0  4
 2 12   5244  30948 7747248   870800 0 222620 2858   292  0 33  0 67
 4 10   5244  31072 7747516   864400 0 236400 3132   326  0 43  0 57
 2 12   5244  31320 7747792   851200 0 250204 3225   285  0 37  0 63
 1 13   5244  30948 7747412   85520024 227600 3261   312  0 41  0 59
 2 12   5244  32684 7746124   861600 0 235392 3219   274  0 32  0 68
 1 13   5244  30948 7747940   856800 0 228020 3394   296  0 37  0 63
 0 14   5244  31196 7747680   862400 0 232932 3389   300  0 32  0 68
 3 12   5244  31072 7747904   853600 0 233096 3545   312  0 33  0 67
 1 13   5244  31072 7747852   852000 0 226992 3381   290  0 31  0 69
 1 13   5244  31196 7747704   839600 0 230112 3372   265  0 28  0 72
 0 14   5244  31072 7747928   851200 0 240652 3491   295  0 33  0 67
 3 13   5244  31072 7748104   860800 0 222944 3433   269  0 27  0 73
 1 13   5244  31072 7748000   850800 0 207944 3470   294  0 28  0 72
 0 14   5244  31072 7747980   852800 0 234608 3496   272  0 31  0 69
 2 12   5244  31196 7748148   849600 0 228760 3480   280  0 28  0 72
 0 14   5244  30948 7748568   862000 0 214372 3551   302  0 29  0 71
 1 13   5244  31072 7748392   852400 0 226732 3494   284  0 29  0 71
 0 14   5244  31072 7748004   864000 0 229628 3604   273  0 26  0 74
 1 13   5244  30948 7748392   866000 0 212868 3563   266  0 28  0 72
 1 13   5244  30948 7748600   852000 0 228244 3568   294  0 30  0 70
 1 13   5244  31196 7748228   841600 0 221692 3543   258  0 27  0 73
 1 13   5244  31072 7748192   852000 0 241040 3983   330  0 25  0 74
 1 13   5244  31196 7748288   856000 0 217108 3676   276  0 28  0 72
 .
 .
 .
   This goses on up to the end.
 .
 .
 .
 0  3   5244 825096 6949252   859600 0 241244 2683   223  0  7 71 22
 0  2   5244 825108 6949252   859600 0 229764 2683   214  0  7 73 20
 0  3   5244 826348 6949252   859600 0 116840 2046   450  0  4 71 26
 0  3   5244 826976 6949252   859600 0 141992 188797  0  4 73 23
 0  3   5244 827100 6949252   859600 0 137716 187193  0  4 70 26
 0  3   5244 827100 6949252   859600 0 137032 189496  0  4 75 21
 0  3   5244 827224 6949252   859600 0 131332 1860   288  0  4 73 23
 0  1   5244 1943732 5833756   862000 0 72404 1560   481  0 24 61 16
 0  2   5244 1943732 5833756   862000 0 71680 145060  0  2 61 38
 0  2   5244

Re: Where is the performance bottleneck?

2005-08-31 Thread Holger Kiehl

On Wed, 31 Aug 2005, Vojtech Pavlik wrote:


On Tue, Aug 30, 2005 at 08:06:21PM +, Holger Kiehl wrote:

How does one determine the PCI-X bus speed?


Usually only the card (in your case the Symbios SCSI controller) can
tell. If it does, it'll be most likely in 'dmesg'.


There is nothing in dmesg:

   Fusion MPT base driver 3.01.20
   Copyright (c) 1999-2004 LSI Logic Corporation
   ACPI: PCI Interrupt :02:04.0[A] - GSI 24 (level, low) - IRQ 217
   mptbase: Initiating ioc0 bringup
   ioc0: 53C1030: Capabilities={Initiator,Target}
   ACPI: PCI Interrupt :02:04.1[B] - GSI 25 (level, low) - IRQ 225
   mptbase: Initiating ioc1 bringup
   ioc1: 53C1030: Capabilities={Initiator,Target}
   Fusion MPT SCSI Host driver 3.01.20


To find where the bottleneck is, I'd suggest trying without the
filesystem at all, and just filling a large part of the block device
using the 'dd' command.

Also, trying without the RAID, and just running 4 (and 8) concurrent
dd's to the separate drives could show whether it's the RAID that's
slowing things down.


Ok, I did run the following dd command in different combinations:

   dd if=/dev/zero of=/dev/sd?1 bs=4k count=500


I think a bs of 4k is way too small and will cause huge CPU overhead.
Can you try with something like 4M? Also, you can use /dev/full to avoid
the pre-zeroing.


Ok, I now use the following command:

  dd if=/dev/full of=/dev/sd?1 bs=4M count=4883

Here the results for all 8 disks in parallel:

  /dev/sdc1 24.957257 MB/s
  /dev/sdd1 25.290177 MB/s
  /dev/sde1 25.046711 MB/s
  /dev/sdf1 26.369777 MB/s
  /dev/sdg1 24.080695 MB/s
  /dev/sdh1 25.008803 MB/s
  /dev/sdi1 24.202202 MB/s
  /dev/sdj1 24.712840 MB/s

A little bit faster but not much.

Holger

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Where is the performance bottleneck?

2005-08-31 Thread Holger Kiehl

On Wed, 31 Aug 2005, Jens Axboe wrote:


Nothing sticks out here either. There's plenty of idle time. It smells
like a driver issue. Can you try the same dd test, but read from the
drives instead? Use a bigger blocksize here, 128 or 256k.


I used the following command reading from all 8 disks in parallel:

   dd if=/dev/sd?1 of=/dev/null bs=256k count=78125

Here vmstat output (I just cut something out in the middle):

procs ---memory-- ---swap-- -io --system-- cpu^M
 r  b   swpd   free   buff  cache   si   sobibo   incs us sy id wa^M
 3  7   4348  42640 7799984   961200 322816 0 3532  4987  0 22  0 78
 1  7   4348  42136 7800624   958400 322176 0 3526  4987  0 23  4 74
 0  8   4348  39912 7802648   966800 322176 0 3525  4955  0 22 12 66
 1  7   4348  38912 7803700   963600 322432 0 3526  5078  0 23  7 70
 2  6   4348  37552 7805120   964400 322432 0 3527  4908  0 23 12 64
 0  8   4348  41152 7801552   960800 322176 0 3524  5018  0 24  6 70
 1  7   4348  41644 7801044   957200 322560 0 3530  5175  0 23  0 76
 1  7   4348  37184 7805396   964000 322176 0 3525  4914  0 24 18 59
 3  7   4348  41704 7800376   983200 32217620 3531  5080  0 23  4 73
 1  7   4348  40652 7801700   973200 323072 0 3533  5115  0 24 13 64
 1  7   4348  40284 7802224   961600 322560 0 3527  4967  0 23  1 76
 0  8   4348  40156 7802356   968800 322560 0 3528  5080  0 23  2 75
 6  8   4348  41896 7799984   981600 322176 0 3530  4945  0 24 20 57
 0  8   4348  39540 7803124   960000 322560 0 3529  4811  0 24 21 55
 1  7   4348  41520 7801084   960000 322560 0 3532  4843  0 23 22 55
 0  8   4348  40408 7802116   958800 322560 0 3527  5010  0 23  4 72
 0  8   4348  38172 7804300   958000 322176 0 3526  4992  0 24  7 69
 4  7   4348  42264 7799784   981200 322688 0 3529  5003  0 24  8 68
 1  7   4348  39908 7802520   966000 322700 0 3529  4963  0 24 14 62
 0  8   4348  37428 7805076   962000 322420 0 3528  4967  0 23 15 62
 0  8   4348  37056 7805348   968800 322048 0 3525  4982  0 24 26 50
 1  7   4348  37804 7804456   969600 322560 0 3528  5072  0 24 16 60
 0  8   4348  38416 7804084   966000 323200 0 3533  5081  0 24 23 53
 0  8   4348  40160 7802300   967600 32320028 3543  5095  0 24 17 59
 1  7   4348  37928 7804612   960800 323072 0 3532  5175  0 24  7 68
 2  6   4348  38680 7803724   961200 322944 0 3531  4906  0 25 24 51
 1  7   4348  40408 7802192   964800 322048 0 3524  4947  0 24 19 57

Full vmstat session can be found under:

  ftp://ftp.dwd.de/pub/afd/linux_kernel_debug/vmstat-256k-read

And here the profile data:

2106577 total  0.9469
1638177 default_idle 34128.6875
179615 copy_user_generic_c  4726.7105
 27670 end_buffer_async_read108.0859
 26055 shrink_zone7.
 23199 __make_request17.2612
 17221 kmem_cache_free  153.7589
 11796 drop_buffers  52.6607
 11016 add_to_page_cache 52.9615
  9470 __wake_up_bit197.2917
  8760 buffered_rmqueue  12.4432
  8646 find_get_page 90.0625
  8319 __do_page_cache_readahead 11.0625
  7976 kmem_cache_alloc 124.6250
  7463 scsi_request_fn6.2192
  7208 try_to_free_buffers   40.9545
  6716 create_empty_buffers  41.9750
  6432 __end_that_request_first  11.8235
  6044 test_clear_page_dirty 25.1833
  5643 scsi_dispatch_cmd  9.7969
  5588 free_hot_cold_page19.4028
  5479 submit_bh 18.0230
  3903 __alloc_pages  3.2965
  3671 file_read_actor9.9755
  3425 thread_return 14.2708
   generic_make_request   5.6301
  3294 bio_alloc_bioset   7.6250
  2868 bio_put   44.8125
  2851 mpt_interrupt  2.8284
  2697 mempool_alloc  8.8717
  2642 block_read_full_page   3.9315
  2512 do_generic_mapping_read2.1216
  2394 set_page_refs149.6250
  2235 alloc_page_buffers 9.9777
  1992 __pagevec_lru_add  8.3000
  1859 __memset   9.6823
  1791 page_waitqueue 

Re: Where is the performance bottleneck?

2005-08-31 Thread Holger Kiehl

On Wed, 31 Aug 2005, Nick Piggin wrote:


Holger Kiehl wrote:


3236497 total  1.4547
2507913 default_idle 52248.1875
158752 shrink_zone   43.3275
121584 copy_user_generic_c  3199.5789
 34271 __wake_up_bit713.9792
 31131 __make_request23.1629
 22096 scsi_request_fn   18.4133
 21915 rotate_reclaimable_page   80.5699

  ^

I don't think this function should be here. This indicates that
lots of writeout is happening due to pages falling off the end
of the LRU.

There was a bug recently causing memory estimates to be wrong
on Opterons that could cause this I think.

Can you send in 2 dumps of /proc/vmstat taken 10 seconds apart
while you're writing at full speed (with 2.6.13 or the latest
-git tree).


I took 2.6.13, there where no git snapshots at www.kernel.org when
I looked. With 2.6.13 I must load the Fusion MPT driver as module.
Compiling it in it does not detect the drive correctly, as module
there is no problem.

Here is what I did:

   #!/bin/bash

   time dd if=/dev/full of=/dev/sdc1 bs=4M count=4883 
   time dd if=/dev/full of=/dev/sdd1 bs=4M count=4883 
   time dd if=/dev/full of=/dev/sde1 bs=4M count=4883 
   time dd if=/dev/full of=/dev/sdf1 bs=4M count=4883 
   time dd if=/dev/full of=/dev/sdg1 bs=4M count=4883 
   time dd if=/dev/full of=/dev/sdh1 bs=4M count=4883 
   time dd if=/dev/full of=/dev/sdi1 bs=4M count=4883 
   time dd if=/dev/full of=/dev/sdj1 bs=4M count=4883 

   sleep 20

   cat /proc/vmstat  /root/vmstat-1.dump

   sleep 10

   cat /proc/vmstat  /root/vmstat-2.dump
   cat /proc/zoneinfo  /root/zoneinfo.dump
   cat /proc/meminfo  /root/meminfo.dump

   exit 0

vmstat-1.dump:

   nr_dirty 787282
   nr_writeback 44317
   nr_unstable 0
   nr_page_table_pages 633
   nr_mapped 6373
   nr_slab 53030
   pgpgin 263362
   pgpgout 5260352
   pswpin 0
   pswpout 0
   pgalloc_high 0
   pgalloc_normal 2448628
   pgalloc_dma 1041
   pgfree 2457343
   pgactivate 5775
   pgdeactivate 2113
   pgfault 465679
   pgmajfault 321
   pgrefill_high 0
   pgrefill_normal 5940
   pgrefill_dma 33
   pgsteal_high 0
   pgsteal_normal 148759
   pgsteal_dma 0
   pgscan_kswapd_high 0
   pgscan_kswapd_normal 153813
   pgscan_kswapd_dma 1089
   pgscan_direct_high 0
   pgscan_direct_normal 0
   pgscan_direct_dma 0
   pginodesteal 0
   slabs_scanned 0
   kswapd_steal 148759
   kswapd_inodesteal 0
   pageoutrun 5304
   allocstall 0
   pgrotated 0
   nr_bounce 0

vmstat-2.dump:

   nr_dirty 786397
   nr_writeback 44233
   nr_unstable 0
   nr_page_table_pages 640
   nr_mapped 6406
   nr_slab 53027
   pgpgin 263382
   pgpgout 7835732
   pswpin 0
   pswpout 0
   pgalloc_high 0
   pgalloc_normal 3091687
   pgalloc_dma 2420
   pgfree 3101327
   pgactivate 5817
   pgdeactivate 2918
   pgfault 466269
   pgmajfault 322
   pgrefill_high 0
   pgrefill_normal 28265
   pgrefill_dma 150
   pgsteal_high 0
   pgsteal_normal 789909
   pgsteal_dma 1388
   pgscan_kswapd_high 0
   pgscan_kswapd_normal 904101
   pgscan_kswapd_dma 4950
   pgscan_direct_high 0
   pgscan_direct_normal 0
   pgscan_direct_dma 0
   pginodesteal 0
   slabs_scanned 1152
   kswapd_steal 791297
   kswapd_inodesteal 0
   pageoutrun 28299
   allocstall 0
   pgrotated 562
   nr_bounce 0

zoneinfo.dump:

   Node 3, zone   Normal
 pages free 899
   min  726
   low  907
   high 1089
   active   3996
   inactive 490989
   scanned  0 (a: 16 i: 0)
   spanned  524287
   present  524287
   protection: (0, 0, 0)
 pagesets
   cpu: 0 pcp: 0
 count: 2
 low:   62
 high:  186
 batch: 31
   cpu: 0 pcp: 1
 count: 0
 low:   0
 high:  62
 batch: 31
   numa_hit:   10186
   numa_miss:  3313
   numa_foreign:   0
   interleave_hit: 10136
   local_node: 0
   other_node: 13499
   cpu: 1 pcp: 0
 count: 13
 low:   62
 high:  186
 batch: 31
   cpu: 1 pcp: 1
 count: 0
 low:   0
 high:  62
 batch: 31
   numa_hit:   6559
   numa_miss:  1668
   numa_foreign:   0
   interleave_hit: 6559
   local_node: 0
   other_node: 8227
   cpu: 2 pcp: 0
 count: 84
 low:   62
 high:  186
 batch: 31
   cpu: 2 pcp: 1
 count: 0
 low:   0
 high:  62
 batch: 31
   numa_hit:   5579
   numa_miss

Re: Where is the performance bottleneck?

2005-08-31 Thread Holger Kiehl

On Wed, 31 Aug 2005, Jens Axboe wrote:


On Wed, Aug 31 2005, Holger Kiehl wrote:

On Wed, 31 Aug 2005, Jens Axboe wrote:


Nothing sticks out here either. There's plenty of idle time. It smells
like a driver issue. Can you try the same dd test, but read from the
drives instead? Use a bigger blocksize here, 128 or 256k.


I used the following command reading from all 8 disks in parallel:

   dd if=/dev/sd?1 of=/dev/null bs=256k count=78125

Here vmstat output (I just cut something out in the middle):

procs ---memory-- ---swap-- -io --system--
cpu^M
 r  b   swpd   free   buff  cache   si   sobibo   incs us sy id
 wa^M
 3  7   4348  42640 7799984   961200 322816 0 3532  4987  0 22
 0 78
 1  7   4348  42136 7800624   958400 322176 0 3526  4987  0 23
 4 74
 0  8   4348  39912 7802648   966800 322176 0 3525  4955  0 22
 12 66
 1  7   4348  38912 7803700   963600 322432 0 3526  5078  0 23


Ok, so that's somewhat better than the writes but still off from what
the individual drives can do in total.


You might want to try the same with direct io, just to eliminate the
costly user copy. I don't expect it to make much of a difference though,
feels like the problem is elsewhere (driver, most likely).


Sorry, I don't know how to do this. Do you mean using a C program
that sets some flag to do direct io, or how can I do that?


I've attached a little sample for you, just run ala

# ./oread /dev/sdX

and it will read 128k chunks direct from that device. Run on the same
drives as above, reply with the vmstat info again.


Using kernel 2.6.12.5 again, here the results:

procs ---memory-- ---swap-- -io --system-- cpu
 r  b   swpd   free   buff  cache   si   sobibo   incs us sy id wa
 0  0  0 8009648   4764  4059200 0 0 101132  0  0 100  0
 0  0  0 8009648   4764  4059200 0 0 101134  0  0 100  0
 0  0  0 8009648   4764  4059200 0 0 100861  0  0 100  0
 0  0  0 8009648   4764  4059200 0 0 100626  0  0 100  0
 0  8  0 8006372   4764  4059200 120192 0 1944  1929  0  1 89 10
 2  8  0 8006372   4764  4059200 319488 0 3502  4999  0  2 75 24
 0  8  0 8006372   4764  4059200 319488 0 3506  4995  0  2 75 24
 0  8  0 8006372   4764  4059200 319744 0 3504  4999  0  1 75 24
 0  8  0 8006372   4764  4059200 319488 0 3507  5009  0  2 75 23
 0  8  0 8006372   4764  4059200 319616 0 3506  5011  0  2 75 24
 0  8  0 8005124   4800  4110000 319976 0 3536  4995  0  2 73 25
 0  8  0 8005124   4800  4110000 323584 0 3534  5000  0  2 75 23
 0  8  0 8005124   4800  4110000 323968 0 3540  5035  0  1 75 24
 0  8  0 8005124   4800  4110000 319232 0 3506  4811  0  1 75 24
 0  8  0 8005504   4800  4110000 317952 0 3498  4747  0  1 75 24
 0  8  0 8005504   4800  4110000 318720 0 3495  4672  0  2 75 23
 1  8  0 8005504   4800  4110000 318720 0 3509  4707  0  1 75 24
 0  8  0 8005504   4800  4110000 318720 0 3499  4667  0  2 75 23
 0  8  0 8005504   4808  4109200 31884840 3509  4674  0  1 75 24
 0  8  0 8005380   4808  4109200 318848 0 3497  4693  0  2 72 26
 0  8  0 8005380   4808  4109200 318592 0 3500  4646  0  2 75 23
 0  8  0 8005380   4808  4109200 318592 0 3495  4828  0  2 61 37
 0  8  0 8005380   4808  4109200 318848 0 3499  4827  0  1 62 37
 1  8  0 8005380   4808  4109200 318464 0 3495  4642  0  2 75 23
 0  8  0 8005380   4816  4108400 31884832 3511  4672  0  1 75 24
 0  8  0 8005380   4816  4108400 320640 0 3512  4877  0  2 75 23
 0  8  0 8005380   4816  4108400 322944 0 3533  5047  0  2 75 24
 0  8  0 8005380   4816  4108400 322816 0 3531  5053  0  1 75 24
 0  8  0 8005380   4816  4108400 322944 0 3531  5048  0  2 75 23
 0  8  0 8005380   4816  4108400 322944 0 3529  5043  0  1 75 24
 0  0  0 8008360   4816  4108400 266880 0 3112  4224  0  2 78 20
 0  0  0 8008360   4816  4108400 0 0 101228  0  0 100  0

Holger

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Where is the performance bottleneck?

2005-08-31 Thread Holger Kiehl

On Wed, 31 Aug 2005, Jens Axboe wrote:


On Wed, Aug 31 2005, Holger Kiehl wrote:

# ./oread /dev/sdX

and it will read 128k chunks direct from that device. Run on the same
drives as above, reply with the vmstat info again.


Using kernel 2.6.12.5 again, here the results:


[snip]

Ok, reads as expected, like the buffered io but using less system time.
And you are still 1/3 off the target data rate, hmmm...

With the reads, how does the aggregate bandwidth look when you add
'clients'? Same as with writes, gradually decreasing per-device
throughput?


I performed the following tests with this command:

   dd if=/dev/sd?1 of=/dev/null bs=256k count=78125

Single disk tests:

   /dev/sdc1 74.954715 MB/s
   /dev/sdg1 74.973417 MB/s

Following disks in parallel:

   2 disks on same channel
   /dev/sdc1 75.034191 MB/s
   /dev/sdd1 74.984643 MB/s

   3 disks on same channel
   /dev/sdc1 75.027850 MB/s
   /dev/sdd1 74.976583 MB/s
   /dev/sde1 75.278276 MB/s

   4 disks on same channel
   /dev/sdc1 58.343166 MB/s
   /dev/sdd1 62.993059 MB/s
   /dev/sde1 66.940569 MB/s
   /dev/sdd1 70.986072 MB/s

   2 disks on different channels
   /dev/sdc1 74.954715 MB/s
   /dev/sdg1 74.973417 MB/s

   4 disks on different channels
   /dev/sdc1 74.959030 MB/s
   /dev/sdd1 74.877703 MB/s
   /dev/sdg1 75.009697 MB/s
   /dev/sdh1 75.028138 MB/s

   6 disks on different channels
   /dev/sdc1 49.640743 MB/s
   /dev/sdd1 55.935419 MB/s
   /dev/sde1 58.795241 MB/s
   /dev/sdg1 50.280864 MB/s
   /dev/sdh1 54.210705 MB/s
   /dev/sdi1 59.413176 MB/s

So this looks different from writting, only as of four disks does the
performance begin to drop.

I just noticed, did you want me to do these test with the oread program?

Thanks,
Holger

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Where is the performance bottleneck?

2005-08-31 Thread Holger Kiehl

On Wed, 31 Aug 2005, Dr. David Alan Gilbert wrote:


* Holger Kiehl ([EMAIL PROTECTED]) wrote:

On Wed, 31 Aug 2005, Jens Axboe wrote:

Full vmstat session can be found under:


Have you got iostat?  iostat -x 10 might be interesting to see
for a period while it is going.


The following is the result from all 8 disks at the same time with the command
dd if=/dev/sd?1 of=/dev/null bs=256k count=78125

There is however one difference, here I had set
/sys/block/sd?/queue/nr_requests to 4096.

avg-cpu:  %user   %nice%sys %iowait   %idle
   0.100.00   21.85   58.55   19.50

Device:rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/srkB/swkB/s avgrq-sz 
avgqu-sz   await  svctm  %util
sda  0.00   0.00  0.00  0.300.002.40 0.00 1.20 8.00 
0.001.00   1.00   0.03
sdb  0.70   0.00  0.10  0.306.402.40 3.20 1.2022.00 
0.004.25   4.25   0.17
sdc8276.90   0.00 267.10  0.00 68352.000.00 34176.00 0.00   
255.90 1.957.29   3.74 100.02
sdd9098.50   0.00 293.50  0.00 75136.000.00 37568.00 0.00   
256.00 1.936.59   3.41 100.03
sde10428.40   0.00 336.40  0.00 86118.400.00 43059.20 0.00   
256.00 1.925.71   2.97 100.02
sdf11314.90   0.00 365.10  0.00 93440.000.00 46720.00 0.00   
255.93 1.925.26   2.74  99.98
sdg7973.20   0.00 257.20  0.00 65843.200.00 32921.60 0.00   
256.00 1.947.53   3.89 100.01
sdh9436.30   0.00 304.70  0.00 77928.000.00 38964.00 0.00   
255.75 1.936.35   3.28 100.01
sdi10604.80   0.00 342.40  0.00 87577.600.00 43788.80 0.00   
255.78 1.925.62   2.92 100.02
sdj10914.30   0.00 352.20  0.00 90132.800.00 45066.40 0.00   
255.91 1.915.43   2.84 100.00
md0  0.00   0.00  0.00  0.100.000.80 0.00 0.40 8.00 
0.000.00   0.00   0.00
md2  0.00   0.00  0.80  0.006.400.00 3.20 0.00 8.00 
0.000.00   0.00   0.00
md1  0.00   0.00  0.00  0.000.000.00 0.00 0.00 0.00 
0.000.00   0.00   0.00

avg-cpu:  %user   %nice%sys %iowait   %idle
   0.070.00   24.49   66.818.62

Device:    rrqm/s  wrqm/s    r/s    w/s   rsec/s   wsec/s    rkB/s    wkB/s  avgrq-sz  avgqu-sz  await  svctm  %util
sda  0.00   0.40  0.00  1.000.00   11.20 0.00 5.6011.20 
0.001.30   0.50   0.05
sdb  0.00   0.40  0.00  1.000.00   11.20 0.00 5.6011.20 
0.001.50   0.70   0.07
sdc8161.90   0.00 263.70  0.00 67404.800.00 33702.40 0.00   
255.61 1.957.38   3.79 100.02
sdd9157.30   0.00 295.50  0.00 75622.400.00 37811.20 0.00   
255.91 1.936.53   3.38 100.00
sde10505.60   0.00 339.20  0.00 86758.400.00 43379.20 0.00   
255.77 1.935.68   2.95  99.99
sdf11212.50   0.00 361.90  0.00 92595.200.00 46297.60 0.00   
255.86 1.915.28   2.76 100.00
sdg7988.40   0.00 258.00  0.00 65971.200.00 32985.60 0.00   
255.70 1.937.49   3.88  99.98
sdh9436.20   0.00 304.40  0.00 77924.800.00 38962.40 0.00   
255.99 1.926.32   3.28  99.99
sdi10406.10   0.00 336.30  0.00 85939.200.00 42969.60 0.00   
255.54 1.925.70   2.97 100.00
sdj11027.00   0.00 356.00  0.00 91064.000.00 45532.00 0.00   
255.80 1.925.40   2.81  99.96
md0  0.00   0.00  0.00  1.000.008.00 0.00 4.00 8.00 
0.000.00   0.00   0.00
md2  0.00   0.00  0.00  0.000.000.00 0.00 0.00 0.00 
0.000.00   0.00   0.00
md1  0.00   0.00  0.00  0.000.000.00 0.00 0.00 0.00 
0.000.00   0.00   0.00

avg-cpu:  %user   %nice%sys %iowait   %idle
   0.080.00   22.23   60.44   17.25

Device:    rrqm/s  wrqm/s    r/s    w/s   rsec/s   wsec/s    rkB/s    wkB/s  avgrq-sz  avgqu-sz  await  svctm  %util
sda  0.00   0.00  0.00  0.300.002.40 0.00 1.20 8.00 
0.001.00   1.00   0.03
sdb  0.00   0.00  0.00  0.300.002.40 0.00 1.20 8.00 
0.000.67   0.67   0.02
sdc8204.50   0.00 264.76  0.00 67754.150.00 33877.08 0.00   
255.90 1.957.38   3.78 100.12
sdd9166.47   0.00 295.90  0.00 75698.100.00 37849.05 0.00   
255.83 1.946.55   3.38 100.12
sde10534.93   0.00 339.94  0.00 86999.000.00 43499.50 0.00   
255.92 1.935.67   2.95 100.12
sdf11282.68   0.00 364.16  0.00 93174.770.00 46587.39 0.00   
255.86 1.925.28   2.75 100.10
sdg8114.61   0.00 261.76  0.00 67011.010.00 33505.51 0.00   
256.00 1.957.44   3.82 100.11
sdh9380.68   0.00 302.60  0.00 77466.270.00 38733.13 0.00   
256.00 1.936.38

Re: Where is the performance bottleneck?

2005-08-31 Thread Holger Kiehl

On Thu, 1 Sep 2005, Nick Piggin wrote:


Holger Kiehl wrote:


meminfo.dump:

   MemTotal:  8124172 kB
   MemFree: 23564 kB
   Buffers:   7825944 kB
   Cached:  19216 kB
   SwapCached:  0 kB
   Active:  25708 kB
   Inactive:  7835548 kB
   HighTotal:   0 kB
   HighFree:0 kB
   LowTotal:  8124172 kB
   LowFree: 23564 kB
   SwapTotal:15631160 kB
   SwapFree: 15631160 kB
   Dirty: 3145604 kB


Hmm OK, dirty memory is pinned pretty much exactly on dirty_ratio
so maybe I've just led you on a goose chase.

You could
   echo 5 > /proc/sys/vm/dirty_background_ratio
   echo 10 > /proc/sys/vm/dirty_ratio

To further reduce dirty memory in the system, however this is
a long shot, so please continue your interaction with the
other people in the thread first.


Yes, this does make a difference. Here are the results of running

  dd if=/dev/full of=/dev/sd?1 bs=4M count=4883

on 8 disks at the same time:

  34.273340
  33.938829
  33.598469
  32.970575
  32.841351
  32.723988
  31.559880
  29.778112

That's 32.710568 MB/s on average per disk with your change; without it,
it was 24.958557 MB/s on average per disk.
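
To make the follow-up runs less error prone, a throwaway helper along
these lines could set the two knobs before each test (just a sketch that
writes the same /proc files as the echo commands quoted above):

/* set_dirty.c - sketch: set the two vm dirty-memory knobs before a run,
 * equivalent to the echo commands quoted above.
 */
#include <stdio.h>

static int
write_knob(const char *path, const char *val)
{
   FILE *fp = NULL;

   if (((fp = fopen(path, "w")) == NULL) || (fputs(val, fp) == EOF))
   {
      perror(path);
      if (fp != NULL)
         (void)fclose(fp);
      return(-1);
   }
   return(fclose(fp));
}

int
main(void)
{
   int ret = 0;

   ret |= write_knob("/proc/sys/vm/dirty_background_ratio", "5\n");
   ret |= write_knob("/proc/sys/vm/dirty_ratio", "10\n");
   return(ret ? 1 : 0);
}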

I will do more tests tomorrow.

Thanks,
Holger



Re: Where is the performance bottleneck?

2005-08-30 Thread Holger Kiehl

On Mon, 29 Aug 2005, Vojtech Pavlik wrote:


On Mon, Aug 29, 2005 at 06:20:56PM +, Holger Kiehl wrote:

Hello

I have a system with the following setup:

Board is Tyan S4882 with AMD 8131 Chipset
4 Opterons 848 (2.2GHz)
8 GB DDR400 Ram (2GB for each CPU)
1 onboard Symbios Logic 53c1030 dual channel U320 controller
2 SATA disks put together as a SW Raid1 for system, swap and spares
8 SCSI U320 (15000 rpm) disks where 4 disks (sdc, sdd, sde, sdf)
  are on one channel and the other four (sdg, sdh, sdi, sdj) on
  the other channel.

The U320 SCSI controller has a 64 bit PCI-X bus for itself, there is
no other device on that bus. Unfortunately I was unable to determine at
what speed it is running; here is the output from lspci -vv:



How does one determine the PCI-X bus speed?


Usually only the card (in your case the Symbios SCSI controller) can
tell. If it does, it'll be most likely in 'dmesg'.


There is nothing in dmesg:

   Fusion MPT base driver 3.01.20
   Copyright (c) 1999-2004 LSI Logic Corporation
   ACPI: PCI Interrupt :02:04.0[A] -> GSI 24 (level, low) -> IRQ 217
   mptbase: Initiating ioc0 bringup
   ioc0: 53C1030: Capabilities={Initiator,Target}
   ACPI: PCI Interrupt :02:04.1[B] -> GSI 25 (level, low) -> IRQ 225
   mptbase: Initiating ioc1 bringup
   ioc1: 53C1030: Capabilities={Initiator,Target}
   Fusion MPT SCSI Host driver 3.01.20


Anyway, I thought with this system I would get theoretically 640 MB/s using
both channels.


You can never use the full theoretical bandwidth of the channel for
data. A lot of overhead remains for other signalling. Similarly for PCI.


I tested several software raid setups to get the best possible write
speeds for this system. But testing shows that the absolute maximum I
can reach with software raid is only approx. 270 MB/s for writing,
which is very disappointing.


I'd expect somewhat better (in the 300-400 MB/s range), but this is not
too bad.

To find where the bottleneck is, I'd suggest trying without the
filesystem at all, and just filling a large part of the block device
using the 'dd' command.

Also, trying without the RAID, and just running 4 (and 8) concurrent
dd's to the separate drives could show whether it's the RAID that's
slowing things down.


Ok, I did run the following dd command in different combinations:

   dd if=/dev/zero of=/dev/sd?1 bs=4k count=500

Here are the results:

   Each disk alone
   /dev/sdc1 59.094636 MB/s
   /dev/sdd1 58.686592 MB/s
   /dev/sde1 55.282807 MB/s
   /dev/sdf1 62.271240 MB/s
   /dev/sdg1 60.872891 MB/s
   /dev/sdh1 62.252781 MB/s
   /dev/sdi1 59.145637 MB/s
   /dev/sdj1 60.921119 MB/s

   sdc + sdd in parallel (2 disks on same channel)
   /dev/sdc1 42.512287 MB/s
   /dev/sdd1 43.118483 MB/s

   sdc + sdg in parallel (2 disks on different channels)
   /dev/sdc1 42.938186 MB/s
   /dev/sdg1 43.934779 MB/s

   sdc + sdd + sde in parallel (3 disks on same channel)
   /dev/sdc1 35.043501 MB/s
   /dev/sdd1 35.686878 MB/s
   /dev/sde1 34.580457 MB/s

   Similar results for three disks (sdg + sdh + sdi) on the other channel
   /dev/sdg1 36.381137 MB/s
   /dev/sdh1 37.541758 MB/s
   /dev/sdi1 35.834920 MB/s

   sdc + sdd + sde + sdf in parallel (4 disks on same channel)
   /dev/sdc1 31.432914 MB/s
   /dev/sdd1 32.058752 MB/s
   /dev/sde1 31.393455 MB/s
   /dev/sdf1 33.208165 MB/s

   And here for the four disks on the other channel
   /dev/sdg1 31.873028 MB/s
   /dev/sdh1 33.277193 MB/s
   /dev/sdi1 31.91 MB/s
   /dev/sdj1 32.626744 MB/s

   All 8 disks in parallel
   /dev/sdc1 24.120545 MB/s
   /dev/sdd1 24.419801 MB/s
   /dev/sde1 24.296588 MB/s
   /dev/sdf1 25.609548 MB/s
   /dev/sdg1 24.572617 MB/s
   /dev/sdh1 25.552590 MB/s
   /dev/sdi1 24.575616 MB/s
   /dev/sdj1 25.124165 MB/s

So from these results, I may assume that md is not the cause of the problem.

What comes as a big surprise is that I lose 25% performance with only
two disks, each hanging on its own channel!

Is this normal? I wonder if other people have the same problem with
other controllers or the same one.

What can I do next to find out if this is a kernel, driver or hardware
problem?
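
For reference, the parallel runs above were started by hand; a small
forker along these lines (only a sketch, it simply starts one dd per
named partition with the same bs= and count= arguments as the command
quoted above and waits for all of them) would make them easier to repeat:

/* pardd.c - sketch: run one dd writer per given partition in parallel
 * (destructive, it writes to the named partitions; bs/count mirror the
 * dd command quoted above).
 */
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

int
main(int argc, char *argv[])
{
   int i;

   if (argc < 2)
   {
      (void)fprintf(stderr, "Usage: %s /dev/sdc1 /dev/sdd1 ...\n", argv[0]);
      return(1);
   }
   for (i = 1; i < argc; i++)
   {
      pid_t pid = fork();

      if (pid == 0)
      {
         char of[64];

         (void)snprintf(of, sizeof(of), "of=%s", argv[i]);
         execlp("dd", "dd", "if=/dev/zero", of, "bs=4k", "count=500",
                (char *)NULL);
         perror("execlp");
         _exit(1);
      }
      else if (pid < 0)
      {
         perror("fork");
         return(1);
      }
   }
   while (wait(NULL) > 0)   /* wait for all writers to finish */
      ;
   return(0);
}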

Thanks,
Holger



Re: Where is the performance bottleneck?

2005-08-30 Thread Holger Kiehl

On Mon, 29 Aug 2005, Al Boldi wrote:


Holger Kiehl wrote:

Why do I only get 247 MB/s for writing and 227 MB/s for reading (from the
bonnie++ results) for a Raid0 over 8 disks? I was expecting to get nearly
three times those numbers if you take the numbers from the individual
disks.

What limit am I hitting here?


You may be hitting a 2.6 kernel bug, which has something to do with
readahead, ask Jens Axboe about it! (see "[git patches] IDE update" thread)
Sadly, 2.6.13 did not fix it either.


I did read that thread, but due to my limited understanding of kernel
code I don't see the relation to my problem.

But I am willing to try any patches to solve the problem.


Did you try 2.4.31?


No. Will give this a try if the problem is not found.

Thanks,
Holger



Re: Where is the performance bottleneck?

2005-08-30 Thread Holger Kiehl

On Mon, 29 Aug 2005, Mark Hahn wrote:


The U320 SCSI controller has a 64 bit PCI-X bus for itself, there is no other
device on that bus. Unfortunatly I was unable to determine at what speed
it is running, here the output from lspci -vv:

...

 Status: Bus=2 Dev=4 Func=0 64bit+ 133MHz+ SCD- USC-, DC=simple,


the "133MHz+" is a good sign.  OTOH the latency (72) seems rather low - my
understanding is that that would noticably limit the size of burst transfers.


I have tried with 128 and 144, but the transfer rate is only a little
bit higher, barely measurable. Or what values should I try?




Version  1.03--Sequential Output-- --Sequential Input- --Random-
  -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- 
--Seeks--
Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
Raid0 (8 disk)15744M 54406  96 247419 90 100752 25 60266  98 226651 29 830.2   1
Raid0s(4 disk)15744M 54915  97 253642 89 73976  18 59445  97 198372 24 659.8   1
Raid0s(4 disk)15744M 54866  97 268361 95 72852  17 59165  97 187183 22 666.3   1


you're obviously saturating something already with 2 disks.  did you play
with "blockdev --setra" setings?


Yes, I did play a little bit with it, but this only changed read performance;
it made no measurable difference when writing.

Thanks,
Holger



Where is the performance bottleneck?

2005-08-29 Thread Holger Kiehl

Hello

I have a system with the following setup:

Board is Tyan S4882 with AMD 8131 Chipset
4 Opterons 848 (2.2GHz)
8 GB DDR400 Ram (2GB for each CPU)
1 onboard Symbios Logic 53c1030 dual channel U320 controller
2 SATA disks put together as a SW Raid1 for system, swap and spares
8 SCSI U320 (15000 rpm) disks where 4 disks (sdc, sdd, sde, sdf)
  are on one channel and the other four (sdg, sdh, sdi, sdj) on
  the other channel.

The U320 SCSI controller has a 64 bit PCI-X bus for itself, there is no other
device on that bus. Unfortunately I was unable to determine at what speed
it is running; here is the output from lspci -vv:

02:04.0 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 PCI-X Fusion-
Subsystem: LSI Logic / Symbios Logic: Unknown device 1000
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Step
Status: Cap+ 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- TAbort-

/*********************************************************************/
/*                      File Write Performance                       */
/*                      ======================                       */
/*********************************************************************/

#include <stdio.h>      /* printf()  */
#include <string.h>     /* strcmp()  */
#include <stdlib.h>     /* exit(), atoi(), calloc(), free()  */
#include <unistd.h>     /* write(), sysconf(), close(), fsync()  */
#include <sys/times.h>  /* times(), struct tms   */
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <errno.h>
#include <stdarg.h>

#define MAXLINE 4096
#define BUFSIZE 512
#define DEFAULT_FILE_SIZE   31457280
#define TEST_FILE   "test.file"
#define FILE_MODE   (S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH)


static void err_doit(int, char *, va_list),
err_quit(char *, ...),
err_sys(char *, ...);


/*### main() */
int
main(int argc, char *argv[])
{
   register int n,
                loops,
                rest;
   int fd,
   oflag,
   blocksize = BUFSIZE;
   off_t   filesize = DEFAULT_FILE_SIZE;
   clock_t start,
   end,
   syncend;
   longclktck;
   char*buf;
   struct tms  tmsdummy;

   if ((argc > 1) && (argc < 5))
   {
      filesize = (off_t)atoi(argv[1]) * 1024;
      if (argc == 3)
         blocksize = atoi(argv[2]);
      else if (argc == 4)
         err_quit("Usage: %s [filesize] [blocksize]", argv[0]);
   }
   else if (argc != 1)
      err_quit("Usage: %s [filesize] [blocksize]", argv[0]);

   if ((clktck = sysconf(_SC_CLK_TCK)) < 0)
  err_sys("sysconf error");

   /* If clktck=0 it doesn't make sense to run the test */
   if (clktck == 0)
   {
  (void)printf("0\n");
  exit(0);
   }

   if ((buf = calloc(blocksize, sizeof(char))) == NULL)
  err_sys("calloc error");

   for (n = 0; n < blocksize; n++)
  buf[n] = 'T';

   loops = filesize / blocksize;
   rest = filesize % blocksize;

   oflag = O_WRONLY | O_CREAT;

   if ((fd = open(TEST_FILE, oflag, FILE_MODE)) < 0)
  err_quit("Could not open %s", TEST_FILE);

   if ((start = times(&tmsdummy)) == -1)
  err_sys("Could not get start time");

   for (n = 0; n < loops; n++)
  if (write(fd, buf, blocksize) != blocksize)
err_sys("write error");
   if (rest > 0)
  if (write(fd, buf, rest) != rest)
err_sys("write error");

   if ((end = times(&tmsdummy)) == -1)
  err_sys("Could not get end time");

   (void)fsync(fd);

   if ((syncend = times(&tmsdummy)) == -1)
  err_sys("Could not get end time");

   (void)close(fd);
   free(buf);

   (void)printf("%f %f\n", (double)filesize / ((double)(end - start) / 
(double)clktck),
   (double)filesize / ((double)(syncend - start) / 
(double)clktck));

   exit(0);
}


static void
err_sys(char *fmt, ...)
{
   va_list  ap;

   va_start(ap, fmt);
   err_doit(1, fmt, ap);
   va_end(ap);
   exit(1);
}


static void
err_quit(char *fmt, ...)
{
   va_list  ap;

   va_start(ap, fmt);
   err_doit(0, fmt, ap);
   va_end(ap);
   exit(1);
}


static void
err_doit(int errnoflag, char *fmt, va_list ap)
{
   int   errno_save;
   char  buf[MAXLINE];

   errno_save = errno;
   (void)vsprintf(buf, fmt, ap);
   if (errnoflag)
  (void)sprintf(buf+strlen(buf), ": %s", strerror(errno_save));
   (void)strcat(buf, "\n");
   fflush(stdout);
   (void)fputs(buf, stderr);
   fflush(NULL); /* Flushes all stdio output streams */
   return;
}


Where is the performance bottleneck?

2005-08-29 Thread Holger Kiehl

Hello

I have a system with the following setup:

Board is Tyan S4882 with AMD 8131 Chipset
4 Opterons 848 (2.2GHz)
8 GB DDR400 Ram (2GB for each CPU)
1 onboard Symbios Logic 53c1030 dual channel U320 controller
2 SATA disks put together as a SW Raid1 for system, swap and spares
8 SCSI U320 (15000 rpm) disks where 4 disks (sdc, sdd, sde, sdf)
  are on one channel and the other four (sdg, sdh, sdi, sdj) on
  the other channel.

The U320 SCSI controller has a 64 bit PCI-X bus for itself, there is no other
device on that bus. Unfortunately I was unable to determine at what speed
it is running; here is the output from lspci -vv:

02:04.0 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 PCI-X Fusion-
Subsystem: LSI Logic / Symbios Logic: Unknown device 1000
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Step
Status: Cap+ 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium TAbort- TAbort
Latency: 72 (4250ns min, 4500ns max), Cache Line Size 10
Interrupt: pin A routed to IRQ 217
Region 0: I/O ports at 3000 [size=256]
Region 1: Memory at fe01 (64-bit, non-prefetchable) [size=64K]
Region 3: Memory at fe00 (64-bit, non-prefetchable) [size=64K]
Capabilities: [50] Power Management version 2
Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [58] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable
Address:   Data: 
Capabilities: [68] PCI-X non-bridge device.
Command: DPERE- ERO- RBC=2 OST=0
Status: Bus=2 Dev=4 Func=0 64bit+ 133MHz+ SCD- USC-, DC=simple,

02:04.1 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 PCI-X Fusion-
Subsystem: LSI Logic / Symbios Logic: Unknown device 1000
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Step
Status: Cap+ 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium TAbort- TAbort
Latency: 72 (4250ns min, 4500ns max), Cache Line Size 10
Interrupt: pin B routed to IRQ 225
Region 0: I/O ports at 3400 [size=256]
Region 1: Memory at fe03 (64-bit, non-prefetchable) [size=64K]
Region 3: Memory at fe02 (64-bit, non-prefetchable) [size=64K]
Capabilities: [50] Power Management version 2
Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [58] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable
Address:   Data: 
Capabilities: [68] PCI-X non-bridge device.
Command: DPERE- ERO- RBC=2 OST=0
Status: Bus=2 Dev=4 Func=1 64bit+ 133MHz+ SCD- USC-, DC=simple,

How does one determine the PCI-X bus speed?

Anyway, I thought with this system I would get theoretically 640 MB/s using
both channels. I tested several software raid setups to get the best possible
write speeds for this system. But testing shows that the absolute maximum I
can reach with software raid is only approx. 270 MB/s for writing, which is
very disappointing.

The tests were done with the 2.6.12.5 kernel from kernel.org, scheduler is the
deadline and distribution is fedora core 4 x86_64 with all updates. Chunksize
is always the default from mdadm (64k). Filesystem was always created with the
command mke2fs -j -b4096 -O dir_index /dev/mdx.

I have also tried 2.6.13-rc7, but there the speed was much lower; the
maximum was approx. 140 MB/s for writing.

Here are some tests I did and the results with bonnie++:

Version  1.03--Sequential Output-- --Sequential Input- --Random-
 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
Raid0 (8 disk)15744M 54406  96 247419 90 100752 25 60266  98 226651 29 830.2   1
Raid0s(4 disk)15744M 54915  97 253642 89 73976  18 59445  97 198372 24 659.8   1
Raid0s(4 disk)15744M 54866  97 268361 95 72852  17 59165  97 187183 22 666.3   1
Raid0p(4 disk)15744M 54017  96 149897 57 60202  15 59048  96 156887 20 381.8   1
Raid0p(4 disk)15744M 54771  98 156129 59 54130  14 58941  97 157543 20 520.3   1
Raid1+0   15744M 52496  94 202497 77 55928  14 60150  98 270509 34 930.2   1
Raid0+1   15744M 53927  95 194492 66 53430  15 49590  83 174313 30 884.7   1
Raid5 (8 disk)15744M 55881  98 153735 51 61680  24 56229  95 207348 44 741.2   1
Raid5s(4 disk)15744M 55238  98 81023  28 36859  14 56358  95 193030 38 605.7   1
Raid5s(4 disk)15744M 54920  97 83680  29 36551  14 56917  95 185345 35 599.8   1
Raid5p(4 disk)15744M 53681  95 54517  20 44932  17 54808  93 172216 33 371.1   1
Raid5p(4 disk)15744M 53856  96 55901  21 34737  13 55810  94 181825 36 607.7   1
/dev/sdc  15744M 53861  95 102270 

RE: As of 2.6.13-rc1 Fusion-MPT very slow

2005-08-09 Thread Holger Kiehl

On Mon, 8 Aug 2005, Moore, Eric Dean wrote:


On Sunday, August 07, 2005 8:30 AM, James Bottomley wrote:


On Sun, 2005-08-07 at 05:59 +, Holger Kiehl wrote:

Thanks, removing those it compiles fine. This patch also

solves my problem,

here the output of dmesg:


Well ... the transport class was supposed to help diagnose the problem
rather than fix it.

However, what it shows is that the original problem is in the fusion
internal domain validation somewhere, but that we still don't know
where...

James




I was corresponding to Mr Holger Hiehl in private email.
What I understood the problem to be was when he compiled the drivers into
the kernel, instead of as modules, we would get some drives negotiating as
asyn narrow on the 2nd channel.


It's always the first channel that has the problem. There are four disks
and the first is always negotiated as wide and has full speed. Disks 2 to
4 are always narrow and give me only 2 MB/s. On the 2nd channel everything
is always ok; there all 4 disks have full speed.


What I was trying to do was reproduce the
issue here, and I was unable to.  Has Mr Holger Hiehl tried compiling
your patch with the drivers compiled statically into the kernel, instead
of modules?


It was compiled statically into the kernel.


Anyways - My last suggestion was that he change the scsi cable, and reset
the parameters in  the bios configuration utility.  I don't believe
that fixed it.


No. I exchanged cables, still always the same results. Also on a second
system that has identical hardware, as soon as I put kernel 2.6.13-rc1 on it
I get the same problem.


Here's my next suggestion.  Recompile the driver with domain validation
debugging enabled.  Then send me the output dmesg so I can analyze it.


This brings us closer to the root of the problem, I think. With domain
validation debugging enabled, this problem is no longer reliably reproducible.
I once even saw that only the fourth disk on the first channel had the slow
performance. Booting several times gave me full speed most of the time for
all four disks on the first channel. But the results were not stable.
I then took out some unused drivers (hardware watchdog and IPMI) and the
system would always come up with all four disks at full speed. I then
removed domain validation debugging, but then the problem was there again.
So I put a msleep(2000) in ./drivers/block/elevator.c just after it prints
out what elevator it used and enabled domain validation debugging again.
Booting with this kernel I managed to capture the debugging output with
disks 2 to 4 having only 2 MB/s. So I think there is some timing problem
somewhere.

I also have the output without the msleep(), that is with all four disks
having full speed on the first channel. Please tell me if this is of
interest, then I will post it as well.

Thanks,
Holger
---



Bootdata ok (command line is ro root=/dev/md0)
Linux version 2.6.13-rc5-git3 ([EMAIL PROTECTED]) (gcc version 4.0.1 20050727 
(Red Hat 4.0.1-5)) #6 SMP Tue Aug 9 11:14:17 GMT 2005
BIOS-provided physical RAM map:
 BIOS-e820:  - 0009a000 (usable)
 BIOS-e820: 0009a000 - 000a (reserved)
 BIOS-e820: 000d2000 - 0010 (reserved)
 BIOS-e820: 0010 - f7f7 (usable)
 BIOS-e820: f7f7 - f7f76000 (ACPI data)
 BIOS-e820: f7f76000 - f7f8 (ACPI NVS)
 BIOS-e820: f7f8 - f800 (reserved)
 BIOS-e820: fec0 - fec00400 (reserved)
 BIOS-e820: fee0 - fee01000 (reserved)
 BIOS-e820: fff8 - 0001 (reserved)
 BIOS-e820: 0001 - 0002 (usable)
ACPI: RSDP (v002 PTLTD ) @ 0x000f6a70
ACPI: XSDT (v001 PTLTD   XSDT   0x0604  LTP 0x) @ 
0xf7f72e3b
ACPI: FADT (v003 AMDHAMMER   0x0604 PTEC 0x000f4240) @ 
0xf7f72f97
ACPI: SRAT (v001 AMDHAMMER   0x0604 AMD  0x0001) @ 
0xf7f75904
ACPI: SSDT (v001 PTLTD  POWERNOW 0x0604  LTP 0x0001) @ 
0xf7f75a3c
ACPI: HPET (v001 AMDHAMMER   0x0604 PTEC 0x) @ 
0xf7f75dac
ACPI: SSDT (v001 AMD-K8 AMD-ACPI 0x0604  AMD 0x0001) @ 
0xf7f75de4
ACPI: SSDT (v001 AMD-K8 AMD-ACPI 0x0604  AMD 0x0001) @ 
0xf7f75e81
ACPI: MADT (v001 PTLTD   APIC   0x0604  LTP 0x) @ 
0xf7f75f1e
ACPI: SPCR (v001 PTLTD  $UCRTBL$ 0x0604 PTL  0x0001) @ 
0xf7f75fb0
ACPI: DSDT (v001 AMD-K8  AMDACPI 0x0604 MSFT 0x010e) @ 
0x
SRAT: PXM 0 -> APIC 0 -> CPU 0 -> Node 0
SRAT: PXM 1 -> APIC 1 -> CPU 1 -> Node 1
SRAT: PXM 2 -> APIC 2 -> CPU 2 -> Node 2
SRAT: PXM 3 -> APIC 3 -> CPU 3 -> Node 3
SRAT: Node 0 PXM 0 0-9
SRAT: Node 0 PXM 0 0-7fff
SRAT: Node 1 PXM 1 8000-f7ff
SRAT: Node 2 PXM 2 1-17fff
SRAT: Node 3 PXM 3 18000-1f

RE: As of 2.6.13-rc1 Fusion-MPT very slow

2005-08-07 Thread Holger Kiehl

On Sat, 6 Aug 2005, James Bottomley wrote:


On Sat, 2005-08-06 at 21:12 +, Holger Kiehl wrote:

drivers/message/fusion/mptspi.c:505: error: unknown field 'get_hold_mcs' specified in initializer
drivers/message/fusion/mptspi.c:505: warning: excess elements in struct initializer
drivers/message/fusion/mptspi.c:505: warning: (near initialization for 'mptspi_transport_functions')
drivers/message/fusion/mptspi.c:506: error: unknown field 'set_hold_mcs' specified in initializer
drivers/message/fusion/mptspi.c:506: warning: excess elements in struct initializer
drivers/message/fusion/mptspi.c:506: warning: (near initialization for 'mptspi_transport_functions')
drivers/message/fusion/mptspi.c:507: error: unknown field 'show_hold_mcs' specified in initializer
drivers/message/fusion/mptspi.c:507: warning: excess elements in struct initializer
drivers/message/fusion/mptspi.c:507: warning: (near initialization for 'mptspi_transport_functions')


This is actually because -mm is slightly behind the scsi-misc tree.  It
looks like the hold_mcs parameters haven't propagated into the -mm tree
yet.  You should be able to correct this by cutting these three lines:

.get_hold_mcs   = mptspi_read_parameters,
.set_hold_mcs   = mptspi_write_hold_mcs,
.show_hold_mcs  = 1,

Out of the code at lines 505-507.  You'll get a warning about
mptspi_write_hold_mcs() being defined but not used which you can ignore.


Thanks, after removing those it compiles fine. This patch also solves my
problem; here is the output of dmesg:

   Fusion MPT base driver 3.03.02
   Copyright (c) 1999-2005 LSI Logic Corporation
   Fusion MPT SPI Host driver 3.03.02
   ACPI: PCI Interrupt :02:04.0[A] -> GSI 24 (level, low) -> IRQ 217
   mptbase: Initiating ioc0 bringup
   ioc0: 53C1030: Capabilities={Initiator,Target}
   scsi4 : ioc0: LSI53C1030, FwRev=01032700h, Ports=1, MaxQ=255, IRQ=217
 Vendor: FUJITSU   Model: MAS3735NP Rev: 0104
 Type:   Direct-Access  ANSI SCSI revision: 03
target4:0:0: Beginning Domain Validation
target4:0:0: Ending Domain Validation
target4:0:0: FAST-160 WIDE SCSI 320.0 MB/s DT IU (6.25 ns, offset 127)
   SCSI device sdc: 143552136 512-byte hdwr sectors (73499 MB)
   SCSI device sdc: drive cache: write back
   SCSI device sdc: 143552136 512-byte hdwr sectors (73499 MB)
   SCSI device sdc: drive cache: write back
sdc: sdc1
   Attached scsi disk sdc at scsi4, channel 0, id 0, lun 0
 Vendor: FUJITSU   Model: MAS3735NP Rev: 0104
 Type:   Direct-Access  ANSI SCSI revision: 03
target4:0:1: Beginning Domain Validation
target4:0:1: Ending Domain Validation
target4:0:1: FAST-160 WIDE SCSI 320.0 MB/s DT IU (6.25 ns, offset 127)
   SCSI device sdd: 143552136 512-byte hdwr sectors (73499 MB)
   SCSI device sdd: drive cache: write back
   SCSI device sdd: 143552136 512-byte hdwr sectors (73499 MB)
   SCSI device sdd: drive cache: write back
sdd: sdd1
   Attached scsi disk sdd at scsi4, channel 0, id 1, lun 0
 Vendor: FUJITSU   Model: MAS3735NP Rev: 0104
 Type:   Direct-Access  ANSI SCSI revision: 03
target4:0:2: Beginning Domain Validation
target4:0:2: Ending Domain Validation
target4:0:2: FAST-160 WIDE SCSI 320.0 MB/s DT IU (6.25 ns, offset 127)
   SCSI device sde: 143552136 512-byte hdwr sectors (73499 MB)
   SCSI device sde: drive cache: write back
   SCSI device sde: 143552136 512-byte hdwr sectors (73499 MB)
   SCSI device sde: drive cache: write back
sde: sde1
   Attached scsi disk sde at scsi4, channel 0, id 2, lun 0
 Vendor: FUJITSU   Model: MAS3735NP Rev: 0104
 Type:   Direct-Access  ANSI SCSI revision: 03
target4:0:3: Beginning Domain Validation
target4:0:3: Ending Domain Validation
target4:0:3: FAST-160 WIDE SCSI 320.0 MB/s DT IU (6.25 ns, offset 127)
   SCSI device sdf: 143552136 512-byte hdwr sectors (73499 MB)
   SCSI device sdf: drive cache: write back
   SCSI device sdf: 143552136 512-byte hdwr sectors (73499 MB)
   SCSI device sdf: drive cache: write back
sdf: sdf1
   Attached scsi disk sdf at scsi4, channel 0, id 3, lun 0
   ACPI: PCI Interrupt :02:04.1[B] -> GSI 25 (level, low) -> IRQ 225
   mptbase: Initiating ioc1 bringup
   ioc1: 53C1030: Capabilities={Initiator,Target}
   scsi5 : ioc1: LSI53C1030, FwRev=01032700h, Ports=1, MaxQ=255, IRQ=225
 Vendor: FUJITSU   Model: MAS3735NP Rev: 0104
 Type:   Direct-Access  ANSI SCSI revision: 03
target5:0:0: Beginning Domain Validation
target5:0:0: Ending Domain Validation
target5:0:0: FAST-160 WIDE SCSI 320.0 MB/s DT IU (6.25 ns, offset 127)
   SCSI device sdg: 143552136 512-byte hdwr sectors (73499 MB)
   SCSI device sdg: drive cache: write back
   SCSI device sdg: 143552136 512-byte hdwr sectors (73499 MB)
   SCSI device sdg:

RE: As of 2.6.13-rc1 Fusion-MPT very slow

2005-08-06 Thread Holger Kiehl

On Sat, 6 Aug 2005, James Bottomley wrote:


On Mon, 2005-08-01 at 15:40 +, Holger Kiehl wrote:

No I did not get it. Can you please send it to me or tell me where I can
download it?


OK, since this has stalled, how about trying a different approach.

If you apply the attached patch it will cause fusion to use the
transport class domain validation.  That should show us which parameters
are causing the problem and exactly what the negotiations said.  We can
also tell you how to tweak the parameters.

It should apply to any recent -mm (unless Andrew does a turn to pick up
the fusion module rework).


I tried from 2.6.13-rc2-mm2 up to 2.6.13-rc4-mm1 and always get the following
error when applying this patch:

 CC  drivers/message/fusion/mptbase.o
 CC  drivers/message/fusion/mptscsih.o
 CC  drivers/message/fusion/mptspi.o
    drivers/message/fusion/mptspi.c: In function 'mptspi_target_alloc':
    drivers/message/fusion/mptspi.c:113: error: invalid storage class for function 'mptspi_write_offset'
    drivers/message/fusion/mptspi.c:114: error: invalid storage class for function 'mptspi_write_width'
    drivers/message/fusion/mptspi.c:131: warning: implicit declaration of function 'mptspi_write_width'
    drivers/message/fusion/mptspi.c: At top level:
    drivers/message/fusion/mptspi.c:453: warning: conflicting types for 'mptspi_write_width'
    drivers/message/fusion/mptspi.c:453: error: static declaration of 'mptspi_write_width' follows non-static declaration
    drivers/message/fusion/mptspi.c:131: error: previous implicit declaration of 'mptspi_write_width' was here
    drivers/message/fusion/mptspi.c:505: error: unknown field 'get_hold_mcs' specified in initializer
    drivers/message/fusion/mptspi.c:505: warning: excess elements in struct initializer
    drivers/message/fusion/mptspi.c:505: warning: (near initialization for 'mptspi_transport_functions')
    drivers/message/fusion/mptspi.c:506: error: unknown field 'set_hold_mcs' specified in initializer
    drivers/message/fusion/mptspi.c:506: warning: excess elements in struct initializer
    drivers/message/fusion/mptspi.c:506: warning: (near initialization for 'mptspi_transport_functions')
    drivers/message/fusion/mptspi.c:507: error: unknown field 'show_hold_mcs' specified in initializer
    drivers/message/fusion/mptspi.c:507: warning: excess elements in struct initializer
    drivers/message/fusion/mptspi.c:507: warning: (near initialization for 'mptspi_transport_functions')
   make[3]: *** [drivers/message/fusion/mptspi.o] Error 1
   make[2]: *** [drivers/message/fusion] Error 2
   make[1]: *** [drivers/message] Error 2
   make: *** [drivers] Error 2

The first errors I was able to resolve by placing the function prototype
definitions (lines 113 and 114) outside the function. I am using gcc 4.0.1.
But for the errors from line 505 onwards I don't know what to do. Should I
take an earlier -mm release?

Thanks,
Holger

RE: As of 2.6.13-rc1 Fusion-MPT very slow

2005-08-01 Thread Holger Kiehl

No, I did not get it. Can you please send it to me or tell me where I can
download it?

Thanks,
Holger
--

On Mon, 1 Aug 2005, Moore, Eric Dean wrote:


I provided an application called getspeed as an attachment
in the email I sent last Friday. Did you receive that, or do
I need to resend?  If possible, can run that application
and send me the output.

Regards,
Eric Moore

On Monday, August 01, 2005 4:16 AM, Holger Kiehl wrote:


On Fri, 29 Jul 2005, Andrew Morton wrote:


"Moore, Eric Dean" <[EMAIL PROTECTED]> wrote:


 Regarding the 1st issue, can you try this patch out.  It

maybe in the

 -mm branch. Andrew cc'd on this email can confirm.



ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.13-rc3/2.6.13-rc3-mm3/broken-out/mpt-fusion-dv-fixes.patch


Yes, that's part of 2.6.13-rc3-mm3.


The patch makes no difference. Still get the following
results when fusion
is compiled in:

   sdc   74MB/s
   sdd2MB/s
   sde2MB/s
   sdf2MB/s

On second channel:

   sdg   74MB/s
   sdh   74MB/s
   sdi   74MB/s
   sdj   74MB/s

The patch was applied to linux-2.6.13-rc4-git3.

Here is part of the dmesg output:

Fusion MPT base driver 3.03.02
Copyright (c) 1999-2005 LSI Logic Corporation
Fusion MPT SPI Host driver 3.03.02
ACPI: PCI Interrupt :02:04.0[A] -> GSI 24 (level,
low) -> IRQ 217
mptbase: Initiating ioc0 bringup
ioc0: 53C1030: Capabilities={Initiator,Target}
scsi4 : ioc0: LSI53C1030, FwRev=01032700h, Ports=1,
MaxQ=255, IRQ=217
  Vendor: FUJITSU   Model: MAS3735NP Rev: 0104
  Type:   Direct-Access  ANSI SCSI
revision: 03
SCSI device sdc: 143552136 512-byte hdwr sectors (73499 MB)
SCSI device sdc: drive cache: write back
SCSI device sdc: 143552136 512-byte hdwr sectors (73499 MB)
SCSI device sdc: drive cache: write back
 sdc: sdc1
Attached scsi disk sdc at scsi4, channel 0, id 0, lun 0
  Vendor: FUJITSU   Model: MAS3735NP Rev: 0104
  Type:   Direct-Access  ANSI SCSI
revision: 03
SCSI device sdd: 143552136 512-byte hdwr sectors (73499 MB)
SCSI device sdd: drive cache: write back
SCSI device sdd: 143552136 512-byte hdwr sectors (73499 MB)
SCSI device sdd: drive cache: write back
 sdd: sdd1
Attached scsi disk sdd at scsi4, channel 0, id 1, lun 0
  Vendor: FUJITSU   Model: MAS3735NP Rev: 0104
  Type:   Direct-Access  ANSI SCSI
revision: 03
SCSI device sde: 143552136 512-byte hdwr sectors (73499 MB)
SCSI device sde: drive cache: write back
SCSI device sde: 143552136 512-byte hdwr sectors (73499 MB)
SCSI device sde: drive cache: write back
 sde: sde1
Attached scsi disk sde at scsi4, channel 0, id 2, lun 0
  Vendor: FUJITSU   Model: MAS3735NP Rev: 0104
  Type:   Direct-Access  ANSI SCSI
revision: 03
SCSI device sdf: 143552136 512-byte hdwr sectors (73499 MB)
SCSI device sdf: drive cache: write back
SCSI device sdf: 143552136 512-byte hdwr sectors (73499 MB)
SCSI device sdf: drive cache: write back
 sdf: sdf1
Attached scsi disk sdf at scsi4, channel 0, id 3, lun 0
ACPI: PCI Interrupt :02:04.1[B] -> GSI 25 (level,
low) -> IRQ 225
mptbase: Initiating ioc1 bringup
ioc1: 53C1030: Capabilities={Initiator,Target}
scsi5 : ioc1: LSI53C1030, FwRev=01032700h, Ports=1,
MaxQ=255, IRQ=225
  Vendor: FUJITSU   Model: MAS3735NP Rev: 0104
  Type:   Direct-Access  ANSI SCSI
revision: 03
SCSI device sdg: 143552136 512-byte hdwr sectors (73499 MB)
SCSI device sdg: drive cache: write back
SCSI device sdg: 143552136 512-byte hdwr sectors (73499 MB)
SCSI device sdg: drive cache: write back
 sdg: sdg1
Attached scsi disk sdg at scsi5, channel 0, id 0, lun 0
  Vendor: FUJITSU   Model: MAS3735NP Rev: 0104
  Type:   Direct-Access  ANSI SCSI
revision: 03
SCSI device sdh: 143552136 512-byte hdwr sectors (73499 MB)
SCSI device sdh: drive cache: write back
SCSI device sdh: 143552136 512-byte hdwr sectors (73499 MB)
SCSI device sdh: drive cache: write back
 sdh: sdh1
Attached scsi disk sdh at scsi5, channel 0, id 1, lun 0
  Vendor: FUJITSU   Model: MAS3735NP Rev: 0104
  Type:   Direct-Access  ANSI SCSI
revision: 03
SCSI device sdi: 143552136 512-byte hdwr sectors (73499 MB)
SCSI device sdi: drive cache: write back
SCSI device sdi: 143552136 512-byte hdwr sectors (73499 MB)
SCSI device sdi: drive cache: write back
 sdi: sdi1
Attached scsi disk sdi at scsi5, channel 0, id 2, lun 0
  Vendor: FUJITSU   Model: MAS3735NP Rev: 0104
  Type:   Direct-Access  ANSI SCSI
revision: 03
SCSI device sdj: 143552136 512-byte hdwr sectors (73499 MB)
SCSI device sdj: drive cache: wr

Re: As of 2.6.13-rc1 Fusion-MPT very slow

2005-08-01 Thread Holger Kiehl

On Fri, 29 Jul 2005, Andrew Morton wrote:


"Moore, Eric Dean" <[EMAIL PROTECTED]> wrote:


 Regarding the 1st issue, can you try this patch out.  It maybe in the
 -mm branch. Andrew cc'd on this email can confirm.

 ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.13-rc3/2.6.13-rc3-mm3/broken-out/mpt-fusion-dv-fixes.patch


Yes, that's part of 2.6.13-rc3-mm3.


The patch makes no difference. Still get the following results when fusion
is compiled in:

  sdc   74MB/s
  sdd    2MB/s
  sde    2MB/s
  sdf    2MB/s

On second channel:

  sdg   74MB/s
  sdh   74MB/s
  sdi   74MB/s
  sdj   74MB/s

The patch was applied to linux-2.6.13-rc4-git3.
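For context only (this sketch is not part of the original report): a minimal C program doing the kind of sequential-read timing that hdparm -t reports. The device path and read sizes are placeholders, and plain buffered read()s include page-cache effects, unlike hdparm's more careful setup.

/*
 * Minimal sketch of a sequential-read throughput measurement, roughly
 * what "hdparm -t" reports.  Device path and sizes are placeholders;
 * run it against an otherwise idle disk.
 */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <unistd.h>

int main(int argc, char **argv)
{
        const char *dev = (argc > 1) ? argv[1] : "/dev/sdc";
        const size_t chunk = 1024 * 1024;       /* 1 MiB per read() */
        const size_t total = 64 * chunk;        /* read 64 MiB in total */
        char *buf = malloc(chunk);
        struct timeval start, end;
        ssize_t n;
        size_t done = 0;
        int fd = open(dev, O_RDONLY);

        if (fd < 0 || buf == NULL) {
                perror("setup");
                return 1;
        }
        gettimeofday(&start, NULL);
        while (done < total && (n = read(fd, buf, chunk)) > 0)
                done += (size_t)n;
        gettimeofday(&end, NULL);

        double secs = (end.tv_sec - start.tv_sec) +
                      (end.tv_usec - start.tv_usec) / 1e6;
        printf("%s: %zu MB in %.2f s = %.2f MB/s\n",
               dev, done >> 20, secs, (double)(done >> 20) / secs);
        close(fd);
        free(buf);
        return 0;
}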

Here part of dmesg output:

   Fusion MPT base driver 3.03.02
   Copyright (c) 1999-2005 LSI Logic Corporation
   Fusion MPT SPI Host driver 3.03.02
   ACPI: PCI Interrupt :02:04.0[A] -> GSI 24 (level, low) -> IRQ 217
   mptbase: Initiating ioc0 bringup
   ioc0: 53C1030: Capabilities={Initiator,Target}
   scsi4 : ioc0: LSI53C1030, FwRev=01032700h, Ports=1, MaxQ=255, IRQ=217
 Vendor: FUJITSU   Model: MAS3735NP Rev: 0104
 Type:   Direct-Access  ANSI SCSI revision: 03
   SCSI device sdc: 143552136 512-byte hdwr sectors (73499 MB)
   SCSI device sdc: drive cache: write back
   SCSI device sdc: 143552136 512-byte hdwr sectors (73499 MB)
   SCSI device sdc: drive cache: write back
sdc: sdc1
   Attached scsi disk sdc at scsi4, channel 0, id 0, lun 0
 Vendor: FUJITSU   Model: MAS3735NP Rev: 0104
 Type:   Direct-Access  ANSI SCSI revision: 03
   SCSI device sdd: 143552136 512-byte hdwr sectors (73499 MB)
   SCSI device sdd: drive cache: write back
   SCSI device sdd: 143552136 512-byte hdwr sectors (73499 MB)
   SCSI device sdd: drive cache: write back
sdd: sdd1
   Attached scsi disk sdd at scsi4, channel 0, id 1, lun 0
 Vendor: FUJITSU   Model: MAS3735NP Rev: 0104
 Type:   Direct-Access  ANSI SCSI revision: 03
   SCSI device sde: 143552136 512-byte hdwr sectors (73499 MB)
   SCSI device sde: drive cache: write back
   SCSI device sde: 143552136 512-byte hdwr sectors (73499 MB)
   SCSI device sde: drive cache: write back
sde: sde1
   Attached scsi disk sde at scsi4, channel 0, id 2, lun 0
 Vendor: FUJITSU   Model: MAS3735NP Rev: 0104
 Type:   Direct-Access  ANSI SCSI revision: 03
   SCSI device sdf: 143552136 512-byte hdwr sectors (73499 MB)
   SCSI device sdf: drive cache: write back
   SCSI device sdf: 143552136 512-byte hdwr sectors (73499 MB)
   SCSI device sdf: drive cache: write back
sdf: sdf1
   Attached scsi disk sdf at scsi4, channel 0, id 3, lun 0
   ACPI: PCI Interrupt :02:04.1[B] -> GSI 25 (level, low) -> IRQ 225
   mptbase: Initiating ioc1 bringup
   ioc1: 53C1030: Capabilities={Initiator,Target}
   scsi5 : ioc1: LSI53C1030, FwRev=01032700h, Ports=1, MaxQ=255, IRQ=225
 Vendor: FUJITSU   Model: MAS3735NP Rev: 0104
 Type:   Direct-Access  ANSI SCSI revision: 03
   SCSI device sdg: 143552136 512-byte hdwr sectors (73499 MB)
   SCSI device sdg: drive cache: write back
   SCSI device sdg: 143552136 512-byte hdwr sectors (73499 MB)
   SCSI device sdg: drive cache: write back
sdg: sdg1
   Attached scsi disk sdg at scsi5, channel 0, id 0, lun 0
 Vendor: FUJITSU   Model: MAS3735NP Rev: 0104
 Type:   Direct-Access  ANSI SCSI revision: 03
   SCSI device sdh: 143552136 512-byte hdwr sectors (73499 MB)
   SCSI device sdh: drive cache: write back
   SCSI device sdh: 143552136 512-byte hdwr sectors (73499 MB)
   SCSI device sdh: drive cache: write back
sdh: sdh1
   Attached scsi disk sdh at scsi5, channel 0, id 1, lun 0
 Vendor: FUJITSU   Model: MAS3735NP Rev: 0104
 Type:   Direct-Access  ANSI SCSI revision: 03
   SCSI device sdi: 143552136 512-byte hdwr sectors (73499 MB)
   SCSI device sdi: drive cache: write back
   SCSI device sdi: 143552136 512-byte hdwr sectors (73499 MB)
   SCSI device sdi: drive cache: write back
sdi: sdi1
   Attached scsi disk sdi at scsi5, channel 0, id 2, lun 0
 Vendor: FUJITSU   Model: MAS3735NP Rev: 0104
 Type:   Direct-Access  ANSI SCSI revision: 03
   SCSI device sdj: 143552136 512-byte hdwr sectors (73499 MB)
   SCSI device sdj: drive cache: write back
   SCSI device sdj: 143552136 512-byte hdwr sectors (73499 MB)
   SCSI device sdj: drive cache: write back
sdj: sdj1
   Attached scsi disk sdj at scsi5, channel 0, id 3, lun 0

Anything else I can try or provide?

Holger

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: As of 2.6.13-rc1 Fusion-MPT very slow

2005-08-01 Thread Holger Kiehl

No, I did not get it. Can you please send it to me or tell me where I can
download it?

Thanks,
Holger
--

On Mon, 1 Aug 2005, Moore, Eric Dean wrote:


I provided an application called getspeed as an attachment
in the email I sent last Friday. Did you receive that, or do
I need to resend?  If possible, can you run that application
and send me the output?

Regards,
Eric Moore

On Monday, August 01, 2005 4:16 AM, Holger Kiehl wrote:


On Fri, 29 Jul 2005, Andrew Morton wrote:


Moore, Eric Dean [EMAIL PROTECTED] wrote:


 Regarding the 1st issue, can you try this patch out.  It

maybe in the

 -mm branch. Andrew cc'd on this email can confirm.



ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/
2.6.13-rc3/2.6

 .13-rc3-mm3/broken-out/mpt-fusion-dv-fixes.patch


Yes, that's part of 2.6.13-rc3-mm3.


The patch makes no difference. Still get the following
results when fusion
is compiled in:

   sdc   74MB/s
   sdd    2MB/s
   sde    2MB/s
   sdf    2MB/s

On second channel:

   sdg   74MB/s
   sdh   74MB/s
   sdi   74MB/s
   sdj   74MB/s

The patch was applied to linux-2.6.13-rc4-git3.

Here part of dmesg output:

Fusion MPT base driver 3.03.02
Copyright (c) 1999-2005 LSI Logic Corporation
Fusion MPT SPI Host driver 3.03.02
ACPI: PCI Interrupt :02:04.0[A] - GSI 24 (level,
low) - IRQ 217
mptbase: Initiating ioc0 bringup
ioc0: 53C1030: Capabilities={Initiator,Target}
scsi4 : ioc0: LSI53C1030, FwRev=01032700h, Ports=1,
MaxQ=255, IRQ=217
  Vendor: FUJITSU   Model: MAS3735NP Rev: 0104
  Type:   Direct-Access  ANSI SCSI
revision: 03
SCSI device sdc: 143552136 512-byte hdwr sectors (73499 MB)
SCSI device sdc: drive cache: write back
SCSI device sdc: 143552136 512-byte hdwr sectors (73499 MB)
SCSI device sdc: drive cache: write back
 sdc: sdc1
Attached scsi disk sdc at scsi4, channel 0, id 0, lun 0
  Vendor: FUJITSU   Model: MAS3735NP Rev: 0104
  Type:   Direct-Access  ANSI SCSI
revision: 03
SCSI device sdd: 143552136 512-byte hdwr sectors (73499 MB)
SCSI device sdd: drive cache: write back
SCSI device sdd: 143552136 512-byte hdwr sectors (73499 MB)
SCSI device sdd: drive cache: write back
 sdd: sdd1
Attached scsi disk sdd at scsi4, channel 0, id 1, lun 0
  Vendor: FUJITSU   Model: MAS3735NP Rev: 0104
  Type:   Direct-Access  ANSI SCSI
revision: 03
SCSI device sde: 143552136 512-byte hdwr sectors (73499 MB)
SCSI device sde: drive cache: write back
SCSI device sde: 143552136 512-byte hdwr sectors (73499 MB)
SCSI device sde: drive cache: write back
 sde: sde1
Attached scsi disk sde at scsi4, channel 0, id 2, lun 0
  Vendor: FUJITSU   Model: MAS3735NP Rev: 0104
  Type:   Direct-Access  ANSI SCSI
revision: 03
SCSI device sdf: 143552136 512-byte hdwr sectors (73499 MB)
SCSI device sdf: drive cache: write back
SCSI device sdf: 143552136 512-byte hdwr sectors (73499 MB)
SCSI device sdf: drive cache: write back
 sdf: sdf1
Attached scsi disk sdf at scsi4, channel 0, id 3, lun 0
ACPI: PCI Interrupt :02:04.1[B] - GSI 25 (level,
low) - IRQ 225
mptbase: Initiating ioc1 bringup
ioc1: 53C1030: Capabilities={Initiator,Target}
scsi5 : ioc1: LSI53C1030, FwRev=01032700h, Ports=1,
MaxQ=255, IRQ=225
  Vendor: FUJITSU   Model: MAS3735NP Rev: 0104
  Type:   Direct-Access  ANSI SCSI
revision: 03
SCSI device sdg: 143552136 512-byte hdwr sectors (73499 MB)
SCSI device sdg: drive cache: write back
SCSI device sdg: 143552136 512-byte hdwr sectors (73499 MB)
SCSI device sdg: drive cache: write back
 sdg: sdg1
Attached scsi disk sdg at scsi5, channel 0, id 0, lun 0
  Vendor: FUJITSU   Model: MAS3735NP Rev: 0104
  Type:   Direct-Access  ANSI SCSI
revision: 03
SCSI device sdh: 143552136 512-byte hdwr sectors (73499 MB)
SCSI device sdh: drive cache: write back
SCSI device sdh: 143552136 512-byte hdwr sectors (73499 MB)
SCSI device sdh: drive cache: write back
 sdh: sdh1
Attached scsi disk sdh at scsi5, channel 0, id 1, lun 0
  Vendor: FUJITSU   Model: MAS3735NP Rev: 0104
  Type:   Direct-Access  ANSI SCSI
revision: 03
SCSI device sdi: 143552136 512-byte hdwr sectors (73499 MB)
SCSI device sdi: drive cache: write back
SCSI device sdi: 143552136 512-byte hdwr sectors (73499 MB)
SCSI device sdi: drive cache: write back
 sdi: sdi1
Attached scsi disk sdi at scsi5, channel 0, id 2, lun 0
  Vendor: FUJITSU   Model: MAS3735NP Rev: 0104
  Type:   Direct-Access  ANSI SCSI
revision: 03
SCSI device sdj: 143552136 512-byte hdwr sectors (73499 MB)
SCSI device sdj: drive cache: write back
SCSI device sdj

As of 2.6.13-rc1 Fusion-MPT very slow

2005-07-26 Thread Holger Kiehl

Hello

On a four CPU Opteron with Fusion-MPT compiled in, I get the following
results (up to 2.6.13-rc3-git7) with hdparm on the first channel with
four disks:

   sdc 74 MB/s
   sdd 2 MB/s
   sde 2 MB/s
   sdf 2 MB/s

On the second channel also with the same type of disks:

   sdg 74 MB/s
   sdh 74 MB/s
   sdi 74 MB/s
   sdj 74 MB/s

All disks are of the same type. Compiling Fusion-MPT as a module for the
same kernel, I get 74 MB/s for all eight disks. Taking kernel 2.6.12.2 and
compiling it in, all eight disks give the expected performance of 74 MB/s.
When I exchange the two cables, putting the first cable on the second channel and
the second cable on the first channel, it is always sdd, sde and sdf that only get
approx. 2 MB/s with any 2.6.13-* kernels.

Another problem observed with 2.6.13-rc3-git7 and Fusion-MPT compiled in:
when making an ext3 filesystem over those eight disks (software Raid10),
mke2fs hangs for a very long time in D-state and /var/log/messages
fills with a lot of these messages:

   mptscsih: ioc0: >> Attempting task abort! (sc=81014ead3ac0)
   mptscsih: ioc0: >> Attempting task abort! (sc=81014ead38c0)
   mptscsih: ioc0: >> Attempting task abort! (sc=81014ead36c0)
   mptscsih: ioc0: >> Attempting task abort! (sc=81014ead34c0)
  .
  .
  .

And finally, when I do a halt or powerdown just after all filesystems
are unmounted, the fusion driver tells me that it puts the two controllers
into power save mode. Then the kernel wants to flush the SCSI disks but
hangs forever. This does not happen when doing a reboot.

Holger
--

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: Fusion-MPT much faster as module

2005-03-22 Thread Holger Kiehl
On Tue, 22 Mar 2005, Chen, Kenneth W wrote:
On Mon, 21 Mar 2005, Andrew Morton wrote:
Holger, this problem remains unresolved, does it not?  Have you done any
more experimentation?
I must say that something funny seems to be happening here.  I have two
MPT-based Dell machines, neither of which is using a modular driver:
akpm:/usr/src/25> 0 hdparm -t /dev/sda
/dev/sda:
Timing buffered disk reads:  64 MB in  5.00 seconds = 12.80 MB/sec

Holger Kiehl wrote on Tuesday, March 22, 2005 12:31 AM
Got the same result when compiled in, always between 12 and 13 MB/s. As
module it is approx. 75 MB/s.

Half guess, half with data to prove: it must be the variable driver_setup
initialization.  If compiled as built-in, driver_setup is initialized to
zero for all of its member variables, which isn't the fastest setting. If
compiled as a module, it gets first-class treatment with shiny performance
settings.  Goofing around, this patch appears to give higher throughput.
Yes, that fixes it.
Many thanks!
Holger
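As an aside (added for illustration, not from the original thread): a toy C program showing the kind of built-in vs. module initialisation difference described in the quoted explanation above. The structure and field names here are invented for the sketch and are not the real mptbase driver_setup.

/*
 * Toy illustration of the built-in vs. module initialisation difference
 * discussed above.  The struct and field names are invented for this
 * example; they are NOT the actual mptbase driver_setup structure.
 */
#include <stdio.h>

struct toy_setup {
        int max_queue_depth;    /* 0 would mean "slow default" in this toy */
        int wide_negotiation;   /* 0 disables wide transfers in this toy */
};

/* Built-in case: a file-scope object with no initialiser is all zeroes. */
static struct toy_setup builtin_setup;

/* Module-style case: the init path fills in sensible values explicitly. */
static struct toy_setup module_setup = {
        .max_queue_depth = 64,
        .wide_negotiation = 1,
};

static void describe(const char *label, const struct toy_setup *s)
{
        printf("%-20s queue depth %d, wide negotiation %s\n",
               label, s->max_queue_depth,
               s->wide_negotiation ? "on" : "off");
}

int main(void)
{
        describe("built-in (zeroed):", &builtin_setup);
        describe("module (explicit):", &module_setup);
        return 0;
}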
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Fusion-MPT much faster as module

2005-03-22 Thread Holger Kiehl
On Mon, 21 Mar 2005, Andrew Morton wrote:
Holger Kiehl <[EMAIL PROTECTED]> wrote:
Hello
On a four CPU Opteron, compiling the Fusion-MPT as a module gives much better
performance than compiling it in; here are some bonnie++ results:
Version  1.03   --Sequential Output-- --Sequential Input- --Random-
 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
MachineSize K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
compiled in  15872M 38366 71  65602  22 18348   4 53276 84  57947   7 905.4   2
module   15872M 51246 96 204914  70 57236  14 59779 96 264171  33 923.0   2
This happens with 2.6.10, 2.6.11 and 2.6.11-bk2. Controller is a
Symbios Logic 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI.
Why is there such a large difference?
Holger, this problem remains unresolved, does it not?  Have you done any
more experimentation?
No. For now I just leave it as module.
I must say that something funny seems to be happening here.  I have two
MPT-based Dell machines, neither of which is using a modular driver:
akpm:/usr/src/25> 0 hdparm -t /dev/sda
/dev/sda:
Timing buffered disk reads:  64 MB in  5.00 seconds = 12.80 MB/sec
Got the same result when compiled in, always between 12 and 13 MB/s. As
module it is approx. 75 MB/s.
Hope that LSI Logic will find the problem.
Another question: is there a way to see in what SCSI mode (320, 160, etc.)
the Fusion-MPT is running? I could not find anything in /proc or dmesg. Adaptec
has the following information in dmesg (and more in /proc):
   (scsi1:A:0): 320.000MB/s transfers (160.000MHz DT|IU|QAS, 16bit)
Or has the Fusion-MPT some other tool to show this information?
Holger
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Fusion-MPT much faster as module

2005-03-08 Thread Holger Kiehl
Hello
On a four CPU Opteron, compiling the Fusion-MPT as a module gives much better
performance than compiling it in; here are some bonnie++ results:
Version  1.03   --Sequential Output-- --Sequential Input- --Random-
-Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
MachineSize K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
compiled in  15872M 38366 71  65602  22 18348   4 53276 84  57947   7 905.4   2
module   15872M 51246 96 204914  70 57236  14 59779 96 264171  33 923.0   2
This happens with 2.6.10, 2.6.11 and 2.6.11-bk2. Controller is a
Symbios Logic 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI.
Why is there such a large difference?
Holger
--
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH/RFC] IPMI watchdog more verbose

2005-02-21 Thread Holger Kiehl
Hello
This makes the IPMI watchdog more verbose during initialization. It prints the
value of timeout and whether nowayout is set. Currently there is no way
to see what these values are once initialized.
Please check whether this is the correct place to put the printk.
Holger
--- linux-2.6.10/drivers/char/ipmi/ipmi_watchdog.c.original 2005-02-21 10:02:38.289344538 +
+++ linux-2.6.10/drivers/char/ipmi/ipmi_watchdog.c  2005-02-21 10:10:38.925872976 +
@@ -944,9 +944,6 @@
 {
int rv;
 
-   printk(KERN_INFO PFX "driver version "
-  IPMI_WATCHDOG_VERSION "\n");
-
if (strcmp(action, "reset") == 0) {
action_val = WDOG_TIMEOUT_RESET;
} else if (strcmp(action, "none") == 0) {
@@ -1031,6 +1028,9 @@
register_reboot_notifier(_reboot_notifier);
notifier_chain_register(_notifier_list, _panic_notifier);
 
+   printk(KERN_INFO PFX "initialized (%s). timeout=%d sec (nowayout=%d)\n",
+  IPMI_WATCHDOG_VERSION, timeout, nowayout);
+
return 0;
 }
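As a side note (not part of the patch above): on kernels that expose the generic watchdog char-device interface, the configured timeout can also be queried from userspace with the WDIOC_GETTIMEOUT ioctl. A hedged sketch, assuming /dev/watchdog is backed by the IPMI watchdog and that briefly arming it is acceptable on the machine in question:

/*
 * Sketch: query the configured watchdog timeout from userspace via the
 * generic watchdog ioctl interface (linux/watchdog.h).  Opening the
 * device arms the watchdog, so only run this where that is safe.
 */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/watchdog.h>

int main(void)
{
        int timeout = 0;
        int fd = open("/dev/watchdog", O_RDWR);

        if (fd < 0) {
                perror("open /dev/watchdog");
                return 1;
        }
        if (ioctl(fd, WDIOC_GETTIMEOUT, &timeout) == 0)
                printf("watchdog timeout: %d seconds\n", timeout);
        else
                perror("WDIOC_GETTIMEOUT");

        /* "Magic close": try to disarm before closing; this is ignored
         * when the driver runs with nowayout set. */
        if (write(fd, "V", 1) < 0)
                perror("magic close");
        close(fd);
        return 0;
}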
 


What do these SCSI error messages mean?

2001-07-06 Thread Holger Kiehl

Hello

I have a SW-Raid 5 running across 6 IBM DNES-309170W disks (one disk is a hot spare)
on an AIC-7890/1 Ultra2 SCSI host adapter (onboard) under 2.2.19. I use the aic
driver that comes with 2.2.19, and Tagged Command Queueing is enabled and
set to 24. This system was running for about 2 years without any problems
until one disk had medium errors and I had to exchange it for a
DPSS-309170N of the same size.  Another thing I did was to try 2.4.5 with
the new aic driver, but I went back to 2.2.19. Since then I am getting the
following errors in my syslog when the system is under heavy disk load:

 scsi : aborting command due to timeout : pid 718083, scsi0, channel 0, id 1, lun 0 
Write (10) 00 00 c4 48 76 00 00 80 00
 (scsi0:0:1:0) SCSISIGI 0x4, SEQADDR 0x61, SSTAT0 0x0, SSTAT1 0x2
 (scsi0:0:1:0) SG_CACHEPTR 0x2c, SSTAT2 0x40, STCNT 0x5fc
 scsi : aborting command due to timeout : pid 718084, scsi0, channel 0, id 1, lun 0 
Write (10) 00 00 c4 48 f6 00 00 80 00
 scsi : aborting command due to timeout : pid 718085, scsi0, channel 0, id 1, lun 0 
Write (10) 00 00 c4 49 7e 00 00 30 00
 scsi : aborting command due to timeout : pid 718086, scsi0, channel 0, id 2, lun 0 
Write (10) 00 00 c4 47 76 00 00 80 00
 scsi : aborting command due to timeout : pid 718087, scsi0, channel 0, id 2, lun 0 
Write (10) 00 00 c4 47 f6 00 00 80 00
 scsi : aborting command due to timeout : pid 718088, scsi0, channel 0, id 2, lun 0 
Write (10) 00 00 c4 48 76 00 00 80 00
 scsi : aborting command due to timeout : pid 718089, scsi0, channel 0, id 2, lun 0 
Write (10) 00 00 c4 48 f6 00 00 80 00
 scsi : aborting command due to timeout : pid 718090, scsi0, channel 0, id 2, lun 0 
Read (10) 00 00 c4 49 76 00 00 08 00
 scsi : aborting command due to timeout : pid 718091, scsi0, channel 0, id 2, lun 0 
Write (10) 00 00 c4 49 7e 00 00 30 00
 scsi : aborting command due to timeout : pid 718092, scsi0, channel 0, id 3, lun 0 
Write (10) 00 00 c4 47 76 00 00 80 00
 scsi : aborting command due to timeout : pid 718093, scsi0, channel 0, id 3, lun 0 
Write (10) 00 00 c4 47 f6 00 00 80 00
 scsi : aborting command due to timeout : pid 718094, scsi0, channel 0, id 3, lun 0 
Write (10) 00 00 c4 48 76 00 00 80 00
 scsi : aborting command due to timeout : pid 718095, scsi0, channel 0, id 3, lun 0 
Write (10) 00 00 c4 48 f6 00 00 80 00
 scsi : aborting command due to timeout : pid 718096, scsi0, channel 0, id 3, lun 0 
Write (10) 00 00 c4 49 7e 00 00 30 00
 scsi : aborting command due to timeout : pid 718097, scsi0, channel 0, id 4, lun 0 
Write (10) 00 00 c4 47 76 00 00 80 00
 scsi : aborting command due to timeout : pid 718098, scsi0, channel 0, id 4, lun 0 
Write (10) 00 00 c4 47 f6 00 00 80 00
 scsi : aborting command due to timeout : pid 718099, scsi0, channel 0, id 4, lun 0 
Write (10) 00 00 c4 48 76 00 00 80 00
 scsi : aborting command due to timeout : pid 718100, scsi0, channel 0, id 4, lun 0 
Write (10) 00 00 c4 48 f6 00 00 80 00
 scsi : aborting command due to timeout : pid 718101, scsi0, channel 0, id 4, lun 0 
Write (10) 00 00 c4 49 7e 00 00 30 00
 scsi : aborting command due to timeout : pid 718102, scsi0, channel 0, id 2, lun 0 
Read (10) 00 00 c4 49 ae 00 00 08 00
 scsi : aborting command due to timeout : pid 718103, scsi0, channel 0, id 3, lun 0 
Read (10) 00 00 c3 76 86 00 00 20 00
 scsi : aborting command due to timeout : pid 718104, scsi0, channel 0, id 0, lun 0 
Read (10) 00 00 28 6b 76 00 00 80 00
 scsi : aborting command due to timeout : pid 718105, scsi0, channel 0, id 0, lun 0 
Read (10) 00 00 28 6b f6 00 00 80 00
 scsi : aborting command due to timeout : pid 718106, scsi0, channel 0, id 1, lun 0 
Read (10) 00 00 28 6b 76 00 00 80 00
 scsi : aborting command due to timeout : pid 718107, scsi0, channel 0, id 1, lun 0 
Read (10) 00 00 28 6b f6 00 00 40 00
 scsi : aborting command due to timeout : pid 718108, scsi0, channel 0, id 2, lun 0 
Read (10) 00 00 28 6b 76 00 00 80 00
 scsi : aborting command due to timeout : pid 718109, scsi0, channel 0, id 2, lun 0 
Read (10) 00 00 28 6c 36 00 00 40 00
 scsi : aborting command due to timeout : pid 718110, scsi0, channel 0, id 3, lun 0 
Read (10) 00 00 28 6b 4e 00 00 68 00
 scsi : aborting command due to timeout : pid 718111, scsi0, channel 0, id 3, lun 0 
Read (10) 00 00 28 6b f6 00 00 60 00
 scsi : aborting command due to timeout : pid 718112, scsi0, channel 0, id 4, lun 0 
Read (10) 00 00 28 6b 36 00 00 40 00
 scsi : aborting command due to timeout : pid 718113, scsi0, channel 0, id 4, lun 0 
Read (10) 00 00 28 6b b6 00 00 80 00
 scsi : aborting command due to timeout : pid 718114, scsi0, channel 0, id 1, lun 0 
Read (10) 00 00 c2 ed 1e 00 00 08 00
 SCSI host 0 abort (pid 718083) timed out - resetting
 SCSI bus is being reset for host 0 channel 0.
 (scsi0:0:0:0) Synchronous at 80.0 Mbyte/sec, offset 31.
 (scsi0:0:1:0) Synchronous at 80.0 Mbyte/sec, offset 31.
 (scsi0:0:3:0) Synchronous at 80.0 Mbyte/sec, offset 31.
 (scsi0:0:4:0) Synchronous at 80.0 Mbyte/sec, offset 31.
 (scsi0:0:2:0) 
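For reference (added for illustration, not part of the original post): the kernel appears to print the command name followed by the remaining CDB bytes, and in a READ(10)/WRITE(10) CDB bytes 2-5 carry the logical block address while bytes 7-8 carry the transfer length in blocks. A small C decoder sketch, using the first aborted write above as sample input:

/*
 * Decode a READ(10)/WRITE(10) CDB like the ones in the log above.
 * Bytes 2-5 hold the logical block address, bytes 7-8 the transfer
 * length in blocks.
 */
#include <stdio.h>
#include <stdint.h>

static void decode_rw10(const uint8_t cdb[10])
{
        uint32_t lba = ((uint32_t)cdb[2] << 24) | ((uint32_t)cdb[3] << 16) |
                       ((uint32_t)cdb[4] << 8)  |  (uint32_t)cdb[5];
        uint16_t len = (uint16_t)((cdb[7] << 8) | cdb[8]);

        printf("opcode 0x%02x: LBA %u, %u blocks\n", cdb[0], lba, len);
}

int main(void)
{
        /* "Write (10) 00 00 c4 48 76 00 00 80 00" from the log above,
         * with the WRITE(10) opcode byte (0x2a) prepended. */
        const uint8_t cdb[10] = { 0x2a, 0x00, 0x00, 0xc4, 0x48, 0x76,
                                  0x00, 0x00, 0x80, 0x00 };

        decode_rw10(cdb);
        return 0;
}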

VLAN in kernel?

2001-06-19 Thread Holger Kiehl

Hello

Some time ago Ben Greear has posted a patch to include VLAN support into
the 2.4 kernel. I and many others are using this patch with great success
and without any problems for a very long time. What is the reason that
this patch is not included into the kernel?

Thanks,
Holger

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [PATCH] - filesystem corruption on soft RAID5 in 2.4.0+

2001-01-23 Thread Holger Kiehl



On Mon, 22 Jan 2001, Neil Brown wrote:

>
> There have been assorted reports of filesystem corruption on raid5 in
> 2.4.0, and I have finally got a patch - see below.
> I don't know if it addresses everybody's problems, but it fixed a very
> real problem that is very reproducible.
>
> The problem is that parity can be calculated wrongly when doing a
> read-modify-write update cycle.  If you have a fully functional array, you
> won't notice this problem as the parity block is never used to return
> data.  But if you have a degraded array, you will get corruption very
> quickly.
> So I think this will solve the reported corruption with ext2fs, as I
> think they were mostly on degraded arrays.  I have no idea whether it
> will address the reiserfs problems as I don't think anybody reporting
> those problems described their array.
>
> In any case, please apply, and let me know of any further problems.
>
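To make the quoted read-modify-write description concrete, here is a toy C illustration (not the kernel raid5 code) of the parity update new_parity = old_parity xor old_data xor new_data, and of why a wrong update only bites once the parity block is actually used to reconstruct data on a degraded array:

/*
 * Toy RAID5 read-modify-write parity illustration (not kernel code).
 * A healthy array never reads the parity block back, so a bad parity
 * update only shows up when a degraded array reconstructs from it.
 */
#include <stdio.h>
#include <stdint.h>

#define NDATA 4         /* data blocks per stripe in this toy example */

static uint8_t rmw_parity(uint8_t old_parity, uint8_t old_data, uint8_t new_data)
{
        return old_parity ^ old_data ^ new_data;
}

static uint8_t reconstruct(const uint8_t data[NDATA], uint8_t parity, int missing)
{
        uint8_t v = parity;

        for (int i = 0; i < NDATA; i++)
                if (i != missing)
                        v ^= data[i];
        return v;
}

int main(void)
{
        uint8_t data[NDATA] = { 0x11, 0x22, 0x33, 0x44 };
        uint8_t parity = data[0] ^ data[1] ^ data[2] ^ data[3];
        uint8_t new_val = 0xAB;

        /* RMW update of block 2: read old data + old parity, write both back. */
        parity = rmw_parity(parity, data[2], new_val);
        data[2] = new_val;

        /* Simulate a degraded array: rebuild block 2 from the rest + parity. */
        printf("expected 0x%02x, reconstructed 0x%02x\n",
               data[2], reconstruct(data, parity, 2));
        return 0;
}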
I did test this patch with 2.4.1-pre9 for about 16 hours and I no
longer get the ext2 errors in syslog. Though I must say that both machines
I tested did not have any degraded arrays (but they do have corruption
without the patch). During my last test on one of the nodes a disk
started to get "medium errors"; however, everything worked fine: the
raid code removed the bad disk, started recalculating parity to set up
the spare disk, and everything kept on running with no interaction
and no errors in syslog. Very nice! However, forcing a check with
e2fsck -f still produces the following:

   root@florix:~# !e2fsck
   e2fsck -f /dev/md2
   e2fsck 1.19, 13-Jul-2000 for EXT2 FS 0.5b, 95/08/09
   Pass 1: Checking inodes, blocks, and sizes
   Special (device/socket/fifo) inode 3630145 has non-zero size.  Fix? yes

   Special (device/socket/fifo) inode 3630156 has non-zero size.  Fix? yes

   Special (device/socket/fifo) inode 3630176 has non-zero size.  Fix? yes

   Special (device/socket/fifo) inode 3630184 has non-zero size.  Fix? yes

   Pass 2: Checking directory structure
   Pass 3: Checking directory connectivity
   Pass 4: Checking reference counts
   Pass 5: Checking group summary information
   Block bitmap differences:  -3394 -3395 -3396 -3397 -3398 -3399 -3400 -3429 -3430 
-3431 -3432 -3433 -3434 -3435 -3466 -3467 -3468 -3469 -3470 -3471 -3472 -3477 -3478 
-3479 -3480 -3481 -3482 -3483 -3586 -3587 -3588 -3589 -3590 -3591 -3592 -3627 -3628 
-3629 -3630 -3631 -3632 -3633 -3668 -3669 -3670 -3671 -3672 -3673 -3674 -3745 -3746 
-3747 -3748 -3749 -3750 -3751 -3756 -3757 -3758 -3759 -3760 -3761 -3762 -3765 -3766 
-3767 -3768 -3769 -3770 -3771 -3840 -3841 -3842 -3843 -3844 -3845 -3846
   Fix? yes

   Free blocks count wrong for group #0 (27874, counted=27951).
   Fix? yes

   Free blocks count wrong (7802000, counted=7802077).
   Fix? yes


   /dev/md2: * FILE SYSTEM WAS MODIFIED *
   /dev/md2: 7463/4006240 files (12.7% non-contiguous), 206243/8008320 blocks


Is this something I need to worry about? Yesterday I already reported
that I sometimes only get the ones with "has non-zero size". What
is the meaning of this?

Another thing I observed in the syslog is the following:

   Jan 22 23:48:21 cube kernel: __alloc_pages: 2-order allocation failed.
   Jan 22 23:48:42 cube last message repeated 32 times
   Jan 22 23:49:54 cube last message repeated 48 times
   Jan 22 23:58:09 cube kernel: __alloc_pages: 2-order allocation failed.
   Jan 22 23:58:13 cube last message repeated 12 times
   Jan 23 00:11:08 cube kernel: __alloc_pages: 2-order allocation failed.
   Jan 23 00:11:10 cube last message repeated 43 times
   Jan 23 00:19:35 cube kernel: __alloc_pages: 2-order allocation failed.
   Jan 23 00:19:39 cube last message repeated 30 times
   Jan 23 00:40:05 cube -- MARK --
   Jan 23 00:53:36 cube kernel: __alloc_pages: 2-order allocation failed.
   Jan 23 00:53:50 cube last message repeated 16 times

This happens under a very high load (120) and is probably not raid related.
What's the meaning of this?

Thanks,
Holger

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: [PATCH] - filesystem corruption on soft RAID5 in 2.4.0+

2001-01-22 Thread Holger Kiehl



On Sun, 21 Jan 2001, Manfred Spraul wrote:

> I've attached Holger's testcase (ext2, SMP, raid5)
> boot with "mem=64M" and run the attached script.
> The script creates and deletes 9 directories with 10.000 files in each dir.
> Neil, could you run it? I don't have a raid 5 array - SMP+ext2 without
> raid5 is ok.
>
> Holger, what's your ext2 block size, and do you run with a degraded
> array?
>
No, I do not have a degraded array and the blocksize of ext2 is 4096. Here is
what /proc/mdstat looks like:

 afdbench@florix:~/testdir$ cat /proc/mdstat
 Personalities : [raid1] [raid5]
 read_ahead 1024 sectors
 md3 : active raid1 sdc1[1] sdb1[0]
   136448 blocks [2/2] [UU]

 md4 : active raid1 sde1[1] sdd1[0]
   136448 blocks [2/2] [UU]

 md0 : active raid1 sdf2[5] sde2[4] sdd2[3] sdc2[2] sdb2[1] sda2[0]
   24000 blocks [5/5] [U]

 md1 : active raid5 sdf3[5] sde3[4] sdd3[3] sdc3[2] sdb3[1] sda3[0]
   3148288 blocks level 5, 64k chunk, algorithm 0 [5/5] [U]

 md2 : active raid5 sdf4[5] sde4[4] sdd4[3] sdc4[2] sdb4[1] sda4[0]
   32033280 blocks level 5, 32k chunk, algorithm 0 [5/5] [U]

 unused devices: 

What I do have is a spare disk and I am running swap on raid1. However,
my machine at home, which experiences the same problems, does not have swap
on raid and is also not degraded.

I applied Neil's patch to 2.4.1-pre9 and reran the test, again with
filesystem corruption. I then pressed the reset button, had all parity
recalculated under 2.2.18, and rebooted again into 2.4.1-pre9 to rerun
the test. Now I no longer see any filesystem corruption in syslog;
however, forcing a check with e2fsck produces the following:

   root@florix:~# !e2fsck
   e2fsck -f /dev/md2
   e2fsck 1.19, 13-Jul-2000 for EXT2 FS 0.5b, 95/08/09
   Pass 1: Checking inodes, blocks, and sizes
   Special (device/socket/fifo) inode 3630145 has non-zero size.  Fix? yes

   Special (device/socket/fifo) inode 3630156 has non-zero size.  Fix? yes

   Pass 2: Checking directory structure
   Pass 3: Checking directory connectivity
   Pass 4: Checking reference counts
   Pass 5: Checking group summary information

   /dev/md2: * FILE SYSTEM WAS MODIFIED *
   /dev/md2: 20002/4006240 files (4.8% non-contiguous), 219556/8008320 blocks

I did this three times; two runs reported the same inodes with non-zero
size, and one test went without any problem (the first time ever under 2.4.x).
Now I am not sure whether this still is filesystem corruption and why
the corruption was so bad before the parity recalculation under
2.2.18. I do remember that the first time I ran 2.4.x with a much larger
testset, it corrupted my system so badly that I had to push the reset
button and parity was recalculated under 2.4.1-pre3.

I will now run my other testset, but this always takes 8 hours. When
this is done I will report back.

Holger

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: Serious file system corruption with RAID5+SMP and kernels above2.4.0

2001-01-20 Thread Holger Kiehl

On Sat, 20 Jan 2001, Otto Meier wrote:

> Two days ago I tried new kernels on my SMP SW RAID5 System
> and experienced serious file system corruption with kernels 2.4.1-pre8,9 as
> well as 2.4.0-ac8,9,10.
> The same error has been reported by other people on this list. With 2.4.0 release
> everything runs fine. So I stepped back to it and have had no errors since.
>
I just tried 2.4.0 and still get filesystem corruption. My system is
also SMP and SW Raid5. So far I have tried 2.4.0, 2.4.1-pre3,8 and
2.4.0-ac10 and all corrupt my filesystem. 2.2.18 is ok.

With the help of Manfred Spraul I can now reproduce this problem
within 10 minutes.

Holger

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: More filesystem corruption under 2.4.1-pre8 and SW Raid5

2001-01-19 Thread Holger Kiehl



On Fri, 19 Jan 2001, Manfred Spraul wrote:

>
> I don't see a corruption - neither with 192MB ram nor with 48 MB ram.
> SMP, no SW Raid, ext2, but only 1024 byte/file and only 12500
> files/directory.
>
>
> >
> > With 1 I also had no problem, my next step was 5.
> >
> 1 files need ~180MB, that fits into the cache.
> 5 files need ~900MB, that doesn't fit into the cache.
>
> I'd try 1 files, but now with "mem=64m"
>
You are right! I first tried with 2 files and 256MB and it was ok.
Then I tried with 1 files and "mem=64m" and I get the corruption.

So if I conclude correctly, since we both have SMP + ext2 and you do not have
SW raid while I do, it is definitely a SW raid bug?

Holger

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



PROBLEM: More filesystem corruption with 2.4.1-pre3 and SW raid5

2001-01-15 Thread Holger Kiehl

Hello

Doing further tests I have experienced more filesystem corruption.
This time on another node, but also with SMP and SW raid5. The machine
has run the same test several times under 2.2.18, 2.2.17, 2.2.14 and
2.2.12 with no problems. This was the first time the test was run under
2.4.1 and gave me filesystem corruption. I observed the same thing on
my machine at home.

The test I am doing copies/links thousands of files around and deletes
them again. The test starts off with 58 processes copying 600 files (SMALL),
then 135 processes copy around 9000 files (MEDIUM), and in the last
test 325 processes copy 8 files (BIG). Each of the three tests (SMALL,
MEDIUM, BIG) is further divided into one test where the files get transmitted
via FTP (localhost) and another where the files are just linked
from one directory to another. The corruption always starts when I come
to the linking test. The link rate is about 2000 files/s. A rough sketch
of this kind of load follows; after that, some of what syslog reported:
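
This is not the actual test program, only a minimal sketch of such a
copy/link/delete load (process counts, file counts, sizes and directory
names are placeholders, not the real figures):

#!/usr/bin/env python3
# Minimal stress-test sketch: many processes create files, copy and
# hard-link them into a second directory, then delete everything again.
import os
import shutil
import multiprocessing

SRC = "stress-src"        # placeholder working directories
DST = "stress-dst"
NUM_PROCS = 58            # placeholder process count
FILES_PER_PROC = 100      # placeholder file count per process
FILE_SIZE = 16 * 1024     # placeholder file size in bytes

def worker(proc_id):
    # create, copy and hard-link the files ...
    for i in range(FILES_PER_PROC):
        name = "p%d-f%d" % (proc_id, i)
        src = os.path.join(SRC, name)
        with open(src, "wb") as f:
            f.write(os.urandom(FILE_SIZE))
        shutil.copy(src, os.path.join(DST, name + ".copy"))
        os.link(src, os.path.join(DST, name + ".link"))   # the linking phase
    # ... then delete them again
    for i in range(FILES_PER_PROC):
        name = "p%d-f%d" % (proc_id, i)
        os.unlink(os.path.join(SRC, name))
        os.unlink(os.path.join(DST, name + ".copy"))
        os.unlink(os.path.join(DST, name + ".link"))

if __name__ == "__main__":
    for d in (SRC, DST):
        os.makedirs(d, exist_ok=True)
    with multiprocessing.Pool(NUM_PROCS) as pool:
        pool.map(worker, range(NUM_PROCS))
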

   Jan 13 17:09:03 florix kernel: EXT2-fs warning (device md(9,2)): ext2_unlink: 
Deleting nonexistent file (1881249), 0
   Jan 13 17:09:03 florix kernel: EXT2-fs warning (device md(9,2)): ext2_unlink: 
Deleting nonexistent file (1881250), 0
   Jan 13 17:09:03 florix kernel: EXT2-fs warning (device md(9,2)): ext2_unlink: 
Deleting nonexistent file (1881251), 0
   .
   .
   .
   Jan 13 17:19:56 florix kernel: EXT2-fs error (device md(9,2)): ext2_free_blocks: 
bit already cleared for block 6688150
   Jan 13 17:19:57 florix kernel: EXT2-fs warning (device md(9,2)): ext2_unlink: 
Deleting nonexistent file (3338561), 0
   Jan 13 17:19:57 florix kernel: EXT2-fs warning (device md(9,2)): ext2_unlink: 
Deleting nonexistent file (3338562), 0
   Jan 13 17:19:57 florix kernel: EXT2-fs warning (device md(9,2)): ext2_unlink: 
Deleting nonexistent file (3338563), 0
   .
   .
   .
   Jan 13 17:20:00 florix kernel: EXT2-fs warning (device md(9,2)): ext2_unlink: 
Deleting nonexistent file (3338647), 0
   Jan 13 17:20:00 florix kernel: EXT2-fs error (device md(9,2)): ext2_free_blocks: 
bit already cleared for block 6688139
   Jan 13 17:20:00 florix kernel: EXT2-fs error (device md(9,2)): ext2_free_blocks: 
bit already cleared for block 6688136
   Jan 13 17:20:00 florix kernel: EXT2-fs error (device md(9,2)): ext2_free_blocks: 
bit already cleared for block 6688182
   Jan 13 17:26:34 florix kernel: EXT2-fs warning (device md(9,2)): ext2_unlink: 
Deleting nonexistent file (3361022), 0
   Jan 13 17:26:34 florix kernel: EXT2-fs warning (device md(9,2)): ext2_unlink: 
Deleting nonexistent file (3361023), 0
   Jan 13 17:26:34 florix kernel: EXT2-fs warning (device md(9,2)): ext2_unlink: 
Deleting nonexistent file (3361024), 0
   .
   .
   .
   Jan 13 17:26:35 florix kernel: EXT2-fs warning (device md(9,2)): ext2_unlink: 
Deleting nonexistent file (3361023), 0
   Jan 13 17:26:35 florix kernel: EXT2-fs warning (device md(9,2)): ext2_unlink: 
Deleting nonexistent file (3361024), 0
   Jan 13 17:29:20 florix kernel: EXT2-fs error (device md(9,2)): ext2_free_blocks: 
bit already cleared for block 918960
   Jan 13 17:29:20 florix kernel: EXT2-fs error (device md(9,2)): ext2_free_blocks: 
bit already cleared for block 918961
   Jan 13 17:29:20 florix kernel: EXT2-fs error (device md(9,2)): ext2_free_blocks: 
bit already cleared for block 918962
   .
   .
   .
   Jan 13 17:30:57 florix kernel: EXT2-fs error (device md(9,2)): ext2_free_blocks: 
bit already cleared for block 3808052
   Jan 13 17:30:57 florix kernel: EXT2-fs error (device md(9,2)): ext2_free_blocks: 
bit already cleared for block 3808053
   Jan 13 17:30:57 florix kernel: EXT2-fs error (device md(9,2)): ext2_free_blocks: 
bit already cleared for block 3808054
   Jan 13 17:32:56 florix kernel: EXT2-fs error (device md(9,2)): ext2_readdir: bad 
entry in directory #2894349: rec_len % 4 != 0 - offset=0, inode=270105152, 
rec_len=1397, name_len=39
   Jan 13 17:32:56 florix kernel: EXT2-fs warning (device md(9,2)): empty_dir: bad 
directory (dir #2894349) - no `.' or `..'
   Jan 13 17:37:22 florix kernel: EXT2-fs warning (device md(9,2)): ext2_unlink: 
Deleting nonexistent file (1940635), 0
   Jan 13 17:37:22 florix kernel: EXT2-fs warning (device md(9,2)): ext2_unlink: 
Deleting nonexistent file (1940636), 0
   Jan 13 17:37:22 florix kernel: EXT2-fs warning (device md(9,2)): ext2_unlink: 
Deleting nonexistent file (1940637), 0
   Jan 13 17:37:22 florix kernel: EXT2-fs warning (device md(9,2)): ext2_unlink: 
Deleting nonexistent file (1940638), 0
   .
   .
   .
Jan 13 19:34:27 florix kernel: EXT2-fs warning (device md(9,2)): ext2_unlink: 
Deleting nonexistent file (1933469), 0
Jan 13 19:34:27 florix kernel: EXT2-fs warning (device md(9,2)): ext2_unlink: 
Deleting nonexistent file (1933471), 0
Jan 13 19:34:27 florix kernel: EXT2-fs warning (device md(9,2)): ext2_unlink: 
Deleting nonexistent file (1933472), 0


At this point I was not able 

PROBLEM: Filesystem corruption with 2.4.1-pre3 and raid5

2001-01-14 Thread Holger Kiehl

Hello

Doing some tests where lots of small files (and some large ones) get copied
around, I experienced filesystem corruption with 2.4.1-pre3.

The system has an ASUS P2B-DS (onboard Adaptec controller) with two P2-350s,
256MB (one module) of PC-100 222 SDRAM with ECC, with 4 SCSI disks and one IDE
disk put together as one big SW Raid5 disk, running SuSE 6.4 with the following:
Linux cube 2.4.1-pre3 #3 SMP Sun Jan 14 14:19:02 CET 2001 i686 unknown
Kernel modules     2.3.24
Gnu C              2.95.2
Gnu Make           3.78.1
Binutils           2.9.5.0.24
Linux C Library    x   1 root root  4061504 Mar 11  2000 /lib/libc.so.6
Dynamic linker     ldd (GNU libc) 2.1.3
Procps             2.0.6
Mount              2.10r
Net-tools          1.54
Kbd                0.99
Sh-utils           2.0
Modules Loaded

I know my modutils are not up to date, but all relevant things (SCSI,
filesystem, raid) were compiled in.
Here are some messages from syslog:

Jan 14 18:50:00 cube kernel: EXT2-fs warning (device md(9,1)): ext2_unlink: 
Deleting nonexistent file (613512), 0
Jan 14 18:56:19 cube kernel: EXT2-fs warning (device md(9,1)): ext2_unlink: 
Deleting nonexistent file (613533), 0
Jan 14 18:56:20 cube kernel: EXT2-fs warning (device md(9,1)): ext2_unlink: 
Deleting nonexistent file (613510), 0
Jan 14 18:57:14 cube kernel: attempt to access beyond end of device
Jan 14 18:57:14 cube kernel: 09:01: rw=1, want=1753106892, limit=8449536
Jan 14 18:57:14 cube kernel: attempt to access beyond end of device
Jan 14 18:57:14 cube kernel: 09:01: rw=1, want=1635361196, limit=8449536
.
.
.
Jan 14 18:57:14 cube kernel: attempt to access beyond end of device
Jan 14 18:57:14 cube kernel: 09:01: rw=1, want=127799040, limit=8449536
Jan 14 18:57:14 cube kernel: attempt to access beyond end of device
Jan 14 18:57:14 cube kernel: 09:01: rw=1, want=1004451972, limit=8449536
Jan 14 19:09:05 cube -- MARK --
Jan 14 19:29:05 cube -- MARK --
Jan 14 19:32:55 cube kernel: EXT2-fs warning (device md(9,1)): ext2_unlink: 
Deleting nonexistent file (145947), 0
Jan 14 19:32:55 cube kernel: EXT2-fs warning (device md(9,1)): ext2_unlink: 
Deleting nonexistent file (145948), 0
Jan 14 19:32:55 cube kernel: EXT2-fs warning (device md(9,1)): ext2_unlink: 
Deleting nonexistent file (145949), 0
.
.
.
Jan 14 19:33:18 cube kernel: EXT2-fs warning (device md(9,1)): ext2_unlink: 
Deleting nonexistent file (145945), 0
Jan 14 19:33:18 cube kernel: EXT2-fs warning (device md(9,1)): ext2_unlink: 
Deleting nonexistent file (145946), 0
Jan 14 19:49:06 cube -- MARK --
Jan 14 19:53:36 cube kernel: __alloc_pages: 2-order allocation failed.
Jan 14 19:53:39 cube last message repeated 8 times
Jan 14 20:09:06 cube -- MARK --
Jan 14 20:10:52 cube kernel: EXT2-fs error (device md(9,1)): ext2_readdir: bad 
entry in directory #929061: rec_len is smaller than minimal - offset=4056, inode=0, 
rec_len=0, name_len=0
Jan 14 20:10:52 cube kernel: EXT2-fs error (device md(9,1)): empty_dir: bad entry 
in directory #929061: rec_len is smaller than minimal - offset=4056, inode=0, 
rec_len=0, name_len=0
Jan 14 20:30:20 cube -- MARK --
Jan 14 20:50:24 cube -- MARK --
Jan 14 21:10:06 cube kernel: EXT2-fs error (device md(9,1)): ext2_free_blocks: bit 
already cleared for block 1402395
Jan 14 21:10:06 cube kernel: EXT2-fs error (device md(9,1)): ext2_free_blocks: bit 
already cleared for block 1438368
Jan 14 21:11:57 cube kernel: EXT2-fs error (device md(9,1)): ext2_free_blocks: bit 
already cleared for block 1439021
Jan 14 21:11:57 cube kernel: EXT2-fs error (device md(9,1)): ext2_free_blocks: bit 
already cleared for block 1435690
Jan 14 21:27:01 cube kernel: EXT2-fs warning (device md(9,1)): ext2_unlink: 
Deleting nonexistent file (698429), 0
.
.
.
Jan 14 21:27:03 cube kernel: EXT2-fs warning (device md(9,1)): ext2_unlink: 
Deleting nonexistent file (698429), 0
Jan 14 21:30:02 cube nscd: 175: cannot stat() file `/etc/group': No such file or 
directory
Jan 14 21:35:38 cube /usr/sbin/gpm[113]: oops() invoked from gpm.c(508)
Jan 14 21:35:38 cube /usr/sbin/gpm[113]: get_shift_state: Inappropriate ioctl for 
device

At this point I could still log into the system.
I noticed after killing all processes with SysRQ+i that something (I assume
the kernel) was eating my memory:

ps aux

USER   PID %CPU %MEM   VSZ  RSS TTY  STAT START   TIME COMMAND
root 1  0.0  0.0   344  200 ?S14:48   0:09 init
root 2  0.0  0.0 00 ?SW   14:48   0:00 [keventd]
root 4  0.0  0.0 00 ?SW   14:48   0:23 [kswapd]
root 5  0.0  0.0 00 ?SW   14:48   0:03 [kreclaimd]
root 6  0.7  0.0 00 ?SW   14:48   2:59 [bdflush]
root 7  0.3  

Why is LINK_MAX so low?

2000-12-18 Thread Holger Kiehl

Hello

Why is LINK_MAX in linux only 127? The values for other operating
systems are as follows:

   solaris  32767
   hpux     32767
   irix     3

In reality LINK_MAX for ext2 is 32000, so why is this only 127?
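
For what it's worth, the effective per-filesystem limit can also be queried
at run time through pathconf(_PC_LINK_MAX) instead of relying on the
compile-time constant; a small sketch (the paths are arbitrary examples):

import os

# Ask each filesystem for its real hard-link limit via POSIX pathconf().
for path in ("/", "/tmp"):
    try:
        print(path, os.pathconf(path, "PC_LINK_MAX"))
    except (OSError, ValueError) as err:
        print(path, "PC_LINK_MAX not available:", err)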

Please cc to me since I am not on this list.

Thanks,
Holger

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/


