Re: [4.11-rc1 block]: Boot failure due to memory corruption.

2017-03-09 Thread Jan Kara
Hello,


On Thu 09-03-17 15:25:23, Tetsuo Handa wrote:
> I noticed that 4.11-rc1 crashes upon boot.

Thanks for the report. It should be fixed by the patches I posted yesterday:

https://www.mail-archive.com/linux-block@vger.kernel.org/msg05566.html

Honza


> 
> 
> [5.358848] Fusion MPT base driver 3.04.20
> [5.360468] Copyright (c) 1999-2008 LSI Corporation
> [5.377993] e1000: Intel(R) PRO/1000 Network Driver - version 7.3.21-k8-NAPI
> [5.380697] e1000: Copyright (c) 1999-2006 Intel Corporation.
> [5.383907] Fusion MPT SPI Host driver 3.04.20
> [5.384772] scsi host0: ata_piix
> [5.389732] scsi host1: ata_piix
> [5.391435] ata1: PATA max UDMA/33 cmd 0x1f0 ctl 0x3f6 bmdma 0x1060 irq 14
> [5.400844] ata2: PATA max UDMA/33 cmd 0x170 ctl 0x376 bmdma 0x1068 irq 15
> [5.408349] mptbase: ioc0: Initiating bringup
> [5.446375] ioc0: LSI53C1030 B0: Capabilities={Initiator}
> [5.535645] scsi host2: ioc0: LSI53C1030 B0, FwRev=01032920h, Ports=1, MaxQ=128, IRQ=17
> [5.575721] ata2.00: ATAPI: VMware Virtual IDE CDROM Drive, 0001, max UDMA/33
> [5.583595] ata2.00: configured for UDMA/33
> [5.592101] scsi 2:0:0:0: Direct-Access VMware,  VMware Virtual S 1.0  PQ: 0 ANSI: 2
> [5.595206] scsi target2:0:0: Beginning Domain Validation
> [5.598405] scsi 1:0:0:0: CD-ROM NECVMWar VMware IDE CDR10 1.00 PQ: 0 ANSI: 5
> [5.605254] [drm] DMA map mode: Using physical TTM page addresses.
> [5.605571] scsi target2:0:0: Domain Validation skipping write tests
> [5.605573] scsi target2:0:0: Ending Domain Validation
> [5.605696] scsi target2:0:0: FAST-40 WIDE SCSI 80.0 MB/s ST (25 ns, offset 127)
> [5.625342] BUG: unable to handle kernel paging request at 88006b443e58
> [5.625349] IP: rb_erase+0x301/0x350
> [5.625350] PGD 31ae067 
> [5.625350] PUD 31b1067 
> [5.625351] PMD 7fda6067 
> [5.625351] PTE 80006b443060
> [5.625352] 
> [5.625353] Oops: 0002 [#1] SMP DEBUG_PAGEALLOC
> [5.625354] Modules linked in: serio_raw vmwgfx(+) drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm mptspi drm ata_piix scsi_transport_spi mptscsih e1000(+) i2c_core mptbase libata
> [5.625366] CPU: 3 PID: 9 Comm: rcuos/0 Not tainted 4.11.0-rc1 #112
> [5.625367] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
> [5.625368] task: 880074970240 task.stack: c936c000
> [5.625369] RIP: 0010:rb_erase+0x301/0x350
> [5.625370] RSP: 0018:c936fe18 EFLAGS: 00010046
> [5.625371] RAX:  RBX: 88006ae742c0 RCX: 
> [5.625372] RDX: 0001 RSI: 88006b443e58 RDI: 88006ae742e0
> [5.625372] RBP: c936fe18 R08:  R09: 0001
> [5.625373] R10:  R11:  R12: 0246
> [5.625373] R13: 88006b440800 R14: 81359c30 R15: 88006b7c77a0
> [5.625374] FS:  () GS:88007580() knlGS:
> [5.625375] CS:  0010 DS:  ES:  CR0: 80050033
> [5.625376] CR2: 88006b443e58 CR3: 6a6c7000 CR4: 001406e0
> [5.625408] Call Trace:
> [5.625411]  wb_congested_put+0x6a/0xb0
> [5.625414]  __blkg_release_rcu+0xe3/0x1c0
> [5.625418]  rcu_nocb_kthread+0x1c8/0x570
> [5.625419]  ? rcu_nocb_kthread+0xf8/0x570
> [5.625424]  kthread+0x10a/0x140
> [5.625425]  ? rcu_all_qs+0x90/0x90
> [5.625427]  ? kthread_create_on_node+0x60/0x60
> [5.625429]  ret_from_fork+0x31/0x40
> [5.625432] Code: 01 4c 89 07 49 89 c8 eb 9a 48 89 16 48 89 fa e9 e7 fe ff ff 48 89 51 10 48 89 fa e9 db fe ff ff 4c 89 06 5d c3 4c 89 42 10 5d c3 <4c> 89 06 e9 b1 fd ff ff 83 e2 01 0f 85 e1 fd ff ff 5d c3 4c 89
> [5.625453] RIP: rb_erase+0x301/0x350 RSP: c936fe18
> [5.625453] CR2: 88006b443e58
> [5.625502] ---[ end trace 0415a22853a9a611 ]---
> [5.625503] Kernel panic - not syncing: Fatal exception in interrupt
> [6.708465] Shutting down cpus with NMI
> [6.708610] Kernel Offset: disabled
> [6.789338] ---[ end Kernel panic - not syncing: Fatal exception in interrupt
> 
> 
> Judging from the timing, I suspect that this problem is memory corruption
> caused by commit 165a5e22fafb127 ("block: Move bdi_unregister() to
> del_gendisk()").
> 
> 
> # bad: [a3b4924b027f9a4b95ce89a914c1e0459e76f18a] Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
> # good: [15dd03811d99dcf828f4eeb2c2b6a02ddc1201c7] scsi: megaraid_sas: NVME Interface detection and prop settings
> # good: [857de6e00778738dc3d61f75acbac35bdc48e533] 

[4.11-rc1 block]: Boot failure due to memory corruption.

2017-03-08 Thread Tetsuo Handa
Hello.

I noticed that 4.11-rc1 crashes upon boot.


[5.358848] Fusion MPT base driver 3.04.20
[5.360468] Copyright (c) 1999-2008 LSI Corporation
[5.377993] e1000: Intel(R) PRO/1000 Network Driver - version 7.3.21-k8-NAPI
[5.380697] e1000: Copyright (c) 1999-2006 Intel Corporation.
[5.383907] Fusion MPT SPI Host driver 3.04.20
[5.384772] scsi host0: ata_piix
[5.389732] scsi host1: ata_piix
[5.391435] ata1: PATA max UDMA/33 cmd 0x1f0 ctl 0x3f6 bmdma 0x1060 irq 14
[5.400844] ata2: PATA max UDMA/33 cmd 0x170 ctl 0x376 bmdma 0x1068 irq 15
[5.408349] mptbase: ioc0: Initiating bringup
[5.446375] ioc0: LSI53C1030 B0: Capabilities={Initiator}
[5.535645] scsi host2: ioc0: LSI53C1030 B0, FwRev=01032920h, Ports=1, MaxQ=128, IRQ=17
[5.575721] ata2.00: ATAPI: VMware Virtual IDE CDROM Drive, 0001, max UDMA/33
[5.583595] ata2.00: configured for UDMA/33
[5.592101] scsi 2:0:0:0: Direct-Access VMware,  VMware Virtual S 1.0  PQ: 0 ANSI: 2
[5.595206] scsi target2:0:0: Beginning Domain Validation
[5.598405] scsi 1:0:0:0: CD-ROM NECVMWar VMware IDE CDR10 1.00 PQ: 0 ANSI: 5
[5.605254] [drm] DMA map mode: Using physical TTM page addresses.
[5.605571] scsi target2:0:0: Domain Validation skipping write tests
[5.605573] scsi target2:0:0: Ending Domain Validation
[5.605696] scsi target2:0:0: FAST-40 WIDE SCSI 80.0 MB/s ST (25 ns, offset 127)
[5.625342] BUG: unable to handle kernel paging request at 88006b443e58
[5.625349] IP: rb_erase+0x301/0x350
[5.625350] PGD 31ae067 
[5.625350] PUD 31b1067 
[5.625351] PMD 7fda6067 
[5.625351] PTE 80006b443060
[5.625352] 
[5.625353] Oops: 0002 [#1] SMP DEBUG_PAGEALLOC
[5.625354] Modules linked in: serio_raw vmwgfx(+) drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm mptspi drm ata_piix scsi_transport_spi mptscsih e1000(+) i2c_core mptbase libata
[5.625366] CPU: 3 PID: 9 Comm: rcuos/0 Not tainted 4.11.0-rc1 #112
[5.625367] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
[5.625368] task: 880074970240 task.stack: c936c000
[5.625369] RIP: 0010:rb_erase+0x301/0x350
[5.625370] RSP: 0018:c936fe18 EFLAGS: 00010046
[5.625371] RAX:  RBX: 88006ae742c0 RCX: 
[5.625372] RDX: 0001 RSI: 88006b443e58 RDI: 88006ae742e0
[5.625372] RBP: c936fe18 R08:  R09: 0001
[5.625373] R10:  R11:  R12: 0246
[5.625373] R13: 88006b440800 R14: 81359c30 R15: 88006b7c77a0
[5.625374] FS:  () GS:88007580() knlGS:
[5.625375] CS:  0010 DS:  ES:  CR0: 80050033
[5.625376] CR2: 88006b443e58 CR3: 6a6c7000 CR4: 001406e0
[5.625408] Call Trace:
[5.625411]  wb_congested_put+0x6a/0xb0
[5.625414]  __blkg_release_rcu+0xe3/0x1c0
[5.625418]  rcu_nocb_kthread+0x1c8/0x570
[5.625419]  ? rcu_nocb_kthread+0xf8/0x570
[5.625424]  kthread+0x10a/0x140
[5.625425]  ? rcu_all_qs+0x90/0x90
[5.625427]  ? kthread_create_on_node+0x60/0x60
[5.625429]  ret_from_fork+0x31/0x40
[5.625432] Code: 01 4c 89 07 49 89 c8 eb 9a 48 89 16 48 89 fa e9 e7 fe ff ff 48 89 51 10 48 89 fa e9 db fe ff ff 4c 89 06 5d c3 4c 89 42 10 5d c3 <4c> 89 06 e9 b1 fd ff ff 83 e2 01 0f 85 e1 fd ff ff 5d c3 4c 89
[5.625453] RIP: rb_erase+0x301/0x350 RSP: c936fe18
[5.625453] CR2: 88006b443e58
[5.625502] ---[ end trace 0415a22853a9a611 ]---
[5.625503] Kernel panic - not syncing: Fatal exception in interrupt
[6.708465] Shutting down cpus with NMI
[6.708610] Kernel Offset: disabled
[6.789338] ---[ end Kernel panic - not syncing: Fatal exception in interrupt


Judging from the timing, I suspect that this problem is memory corruption
caused by commit 165a5e22fafb127 ("block: Move bdi_unregister() to
del_gendisk()").
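
As an aside for reading the trace above: the value after "Oops:" is the x86 page-fault error code, printed in hexadecimal. The following is a minimal Python sketch (not kernel code) that decodes its low bits. The names PF_BITS and decode_oops are made up for illustration, and the bit meanings are assumed from the x86 page-fault error code layout.

```python
# Illustrative decoder for the hexadecimal code printed after "Oops:".
# PF_BITS and decode_oops are hypothetical names used only in this sketch.
# Each entry: (bit mask, meaning if the bit is set, meaning if it is clear).
PF_BITS = [
    (0x01, "protection violation", "page not present"),
    (0x02, "write access", "read access"),
    (0x04, "user mode", "kernel mode"),
    (0x08, "reserved bit set in page table", None),
    (0x10, "instruction fetch", None),
]


def decode_oops(code):
    """Return a list of human-readable descriptions for the error code."""
    descriptions = []
    for mask, if_set, if_clear in PF_BITS:
        if code & mask:
            descriptions.append(if_set)
        elif if_clear is not None:
            descriptions.append(if_clear)
    return descriptions


# The log line reads "Oops: 0002", so parse the field as base 16.
print(decode_oops(int("0002", 16)))
# prints ['page not present', 'write access', 'kernel mode']
```

For 0002 this reads as a kernel-mode write to a page that is not present. That would be consistent with DEBUG_PAGEALLOC having unmapped an already-freed page, although the trace by itself does not prove a use-after-free.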


# bad: [a3b4924b027f9a4b95ce89a914c1e0459e76f18a] Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
# good: [15dd03811d99dcf828f4eeb2c2b6a02ddc1201c7] scsi: megaraid_sas: NVME Interface detection and prop settings
# good: [857de6e00778738dc3d61f75acbac35bdc48e533] scsi: use 'scsi_device_from_queue()' for scsi_dh
# good: [825c6abbc316f496cd2b66e1fa72892cf4b49a9f] scsi: lpfc: use proper format string for dma_addr_t
# good: [54d7989f476ca57fc3c5cc71524c480ccb74c481] Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
# good: [41dc529a4602ac737020f423f84686a81de38e6d] qla2xxx: Improve RSCN handling in driver
# good: [821fd6f6cb6500cd04a6c7e8f701f9b311a5c2b3] Merge branch 'for-next' of