Re: 3.12-rc7 regression - network panic from ipv6
> Signed-off-by: Steffen Klassert

Works fine, thanks!

Tested-by: Meelis Roos

-- 
Meelis Roos (mr...@linux.ee)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: bisected regression: 3c59x corrupts packets in 3.17-rc5
> Shit, you're right, sorry about that. It's odd, I'm running it here, and it's
> not causing problems, but that's obviously wrong. Meelis, please add the above
> fix to your test and confirm that it solves the problem. If you could keep the
> previous patch in place too that would be great, as we should probably add the
> dma error checking anyway.
>
> [PATCH] 3c59x: Fix bad offset spec in skb_frag_dma_map

Tested 2 variants: only this patch (backported to the old state) and both
patches together. Both work fine.

-- 
Meelis Roos (mr...@linux.ee)
Re: [PATCH 0/3] Fix several sparc64 THP bugs.
> Meelis and Aaro, I've found and fixed several THP bugs for sparc64
> over the last week or so.
>
> I cannot 100% account for the exit_mmap() WARN_ON that you two have
> been able to trigger, however I'd like you both to test the changes
> nonetheless.
>
> They are against 3.15 but they should apply cleanly all the way back
> to 3.13.
>
> Thanks in advance for testing.

Tried it on the Netra X1, Ultra 2 and V100 machines that were online (applied
the patches and enabled THP, defaulting to always). The Ultra 2 did not boot
up (will check on Monday). The Netra X1 performed a simple dist-upgrade fine
with this kernel. The V100 boots up fine, but as soon as I start aptitude, it
just hangs with nothing on the console (tried it twice).

-- 
Meelis Roos (mr...@linux.ee)
Re: sparc64 WARNING: at mm/mmap.c:2757 exit_mmap+0x13c/0x160()
> > Just for the archives, I got one of these again with 3.14:
>
> Meelis and Aaro, thanks again for all of your reports.
>
> After poring over a lot of the data and auditing some code I'm
> suspecting it's a problem with transparent huge pages.
>
> One thing you two can do to help me further confirm this is to run
> with THP disabled for a while and see if you still get the log
> messages.

I have since turned off CONFIG_TRANSPARENT_HUGEPAGE on 3 of the 4 servers
that had this problem (actually most of my sparc64 machines), and the 4th has

CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE=y
CONFIG_TRANSPARENT_HUGEPAGE=y
# CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS is not set
CONFIG_TRANSPARENT_HUGEPAGE_MADVISE=y
# CONFIG_HUGETLBFS is not set
# CONFIG_HUGETLB_PAGE is not set

and it has not had this problem since then either. All 4 machines have been
running through most -rc's of every kernel.

-- 
Meelis Roos
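When comparing machines like this, it can help to confirm the THP mode that is actually in effect at runtime, since a build-time choice such as CONFIG_TRANSPARENT_HUGEPAGE_MADVISE=y only sets the initial value. The kernel exposes the live setting in /sys/kernel/mm/transparent_hugepage/enabled, with the active mode in brackets, e.g. "always [madvise] never". The sketch below parses that format; the helper names are hypothetical and error handling is kept minimal.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Copy the bracketed (currently active) mode out of a THP sysfs line such as
 * "always [madvise] never" into out. Returns 0 on success, -1 if no
 * bracketed mode is present or it does not fit in out. */
static int thp_active_mode(const char *line, char *out, size_t outlen)
{
    const char *lb = strchr(line, '[');
    const char *rb = lb ? strchr(lb, ']') : NULL;
    size_t n;

    assert(out != NULL && outlen > 0);
    if (!lb || !rb)
        return -1;
    n = (size_t)(rb - lb - 1);          /* length of text between brackets */
    if (n + 1 > outlen)
        return -1;
    memcpy(out, lb + 1, n);
    out[n] = '\0';
    return 0;
}

/* Hypothetical helper: read the live setting from the standard sysfs knob. */
static int thp_read_enabled(char *out, size_t outlen)
{
    char line[128];
    FILE *f = fopen("/sys/kernel/mm/transparent_hugepage/enabled", "r");
    int ret = -1;

    if (!f)
        return -1;                      /* no THP support or no sysfs */
    if (fgets(line, sizeof line, f))
        ret = thp_active_mode(line, out, outlen);
    fclose(f);
    return ret;
}
```

On the 4th machine above one would expect thp_read_enabled() to report "madvise" unless the setting was changed at boot or at runtime.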
Re: sparc64 WARNING: at mm/mmap.c:2757 exit_mmap+0x13c/0x160()
This is today's fresh git with 3.15.0-rc6-00190-g1ee1cea on V210, THP enabled
and always on. Got this and a segfault in an apt-spawned xz.

[ 142.599575] ------------[ cut here ]------------
[ 142.660349] WARNING: CPU: 1 PID: 2237 at mm/mmap.c:2741 exit_mmap+0x140/0x160()
[ 142.756483] Modules linked in: ipv6 tg3 hwmon ptp pps_core
[ 142.830269] CPU: 1 PID: 2237 Comm: aptitude Not tainted 3.15.0-rc6-00190-g1ee1cea #93
[ 142.933226] Call Trace:
[ 142.965358] [0045a12c] warn_slowpath_common+0x4c/0x80
[ 143.042074] [004e7a40] exit_mmap+0x140/0x160
[ 143.108410] [004586a0] mmput.part.60+0x20/0xe0
[ 143.177030] [0045af3c] exit_mm+0x11c/0x180
[ 143.241071] [0045c920] do_exit+0x240/0x340
[ 143.305134] [0045cb48] do_group_exit+0x28/0xc0
[ 143.373753] [00468ac8] get_signal_to_deliver+0x1c8/0x3a0
[ 143.453819] [00449334] do_signal32+0x14/0x220
[ 143.521292] [0042d0e0] do_signal+0x2c0/0x520
[ 143.587624] [0042db40] do_notify_resume+0x40/0x60
[ 143.659683] [00404b04] __handle_signal+0xc/0x2c
[ 143.729448] ---[ end trace b34008751438e7e6 ]---
[ 143.790182] BUG: Bad rss-counter state mm:fc000d9d3660 idx:1 val:1

-- 
Meelis Roos (mr...@linux.ee)
Re: unaligned accesses in SLAB etc.
> From: David Miller
> Date: Sat, 11 Oct 2014 22:15:10 -0400 (EDT)
>
> > I'm getting tons of the following on sparc64:
> >
> > [603965.383447] Kernel unaligned access at TPC[546b58] free_block+0x98/0x1a0
> > [603965.396987] Kernel unaligned access at TPC[546b60] free_block+0xa0/0x1a0
> > [603965.410523] Kernel unaligned access at TPC[546b58] free_block+0x98/0x1a0
>
> In all of the cases, the address is 4-byte aligned but not 8-byte
> aligned. And they are vmalloc addresses.
>
> Which made me suspect the percpu commit:
>
> commit bf0dea23a9c094ae869a88bb694fbe966671bf6d
> Author: Joonsoo Kim
> Date: Thu Oct 9 15:26:27 2014 -0700
>
>     mm/slab: use percpu allocator for cpu cache
>
> And indeed, reverting this commit fixes the problem.

I tested Joonsoo Kim's fix and it gets rid of the kernel unaligned access
messages, yes. But the instability on UltraSparc II era machines still
remains: occasional Bus Errors during kernel compilation, and messages like
this:

sh[11771]: segfault at ffd6a4d1 ip f7cc5714 (rpc f7cc562c) sp ffd69d90 error 30002 in libc-2.19.so[f7c44000+16a000]

-- 
Meelis Roos (mr...@linux.ee)
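David Miller's observation that every faulting address is 4-byte aligned but not 8-byte aligned can be checked mechanically. On sparc64 an 8-byte load or store must be 8-byte aligned, so a 64-bit access at such an address traps; the kernel's unaligned-access handler then emulates it and logs the "Kernel unaligned access" lines quoted above. The sketch below expresses the condition in standard C; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if addr is a multiple of align; align must be a power of two. */
static bool is_aligned(uintptr_t addr, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr & (align - 1)) == 0;
}

/* The pattern from the report: fine for a 4-byte access, but an 8-byte
 * access at this address is misaligned and would trap on sparc64. */
static bool four_not_eight_aligned(uintptr_t addr)
{
    return is_aligned(addr, 4) && !is_aligned(addr, 8);
}
```

For example, any address whose last hex digit is 4 or c passes the 4-byte test and fails the 8-byte one.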
Re: unaligned accesses in SLAB etc.
> > I'd like to know whether your other problem is related to commit
> > bf0dea23a9c0 ("mm/slab: use percpu allocator for cpu cache"). So,
> > if the commit is reverted, is your other problem also gone
> > completely?
>
> The other problem has been present forever.

Umm? I am afraid I have been describing it badly. This random SIGBUS+SIGSEGV
problem is new; I have not seen it before. I have been able to do kernel
compiles on sparc64 for years (modulo specific bugs in specific
configurations), and 3.17 + the start/end swap patch also seems stable on most
machines. With yesterday's git + the align patch, it dies with SIGBUS multiple
times during compilation, so it's a new regression for me. Will try reverting
that commit tomorrow.

The only other sparc64 problems I am currently seeing: V210/V440 die during
bootup if the kernel is compiled with gcc 4.9, and V480 dies with FATAL
exceptions during bootup since the previous kernel release. Maybe also the
exit_mmap warning; I do not know if it has been fixed, I see it rarely.

-- 
Meelis Roos (mr...@linux.ee)
Re: 4.6-rc*: Kernel unaligned access at pci_bus_read_config_dword+0x64/0x80
> >> bisected to git commit 63e3027
> >
> > That's a merge commit that adds 100 different commits, and this happened
> > way back in March.
> >
> > Please find the exact dword access in:
> >
> > drivers/pci/pcie/portdrv_{core,pci,bus}.c
> >
> > that triggers this unaligned access so we can debug this further.
>
> Meanwhile I figured it out, this should fix the bug:

Yes, it does for me too, thank you!

-- 
Meelis Roos (mr...@linux.ee)
Re: [PATCH v2] rhashtable: fix for resize events during table walk
> If rhashtable_walk_next detects a resize operation in progress, it jumps
> to the new table and continues walking that one. But it fails to drop
> the reference to its current item, leading it to continue traversing
> the new table's bucket into which the current item is sorted, and
> after reaching that bucket's end continues traversing the new table's
> second bucket instead of the first one, thereby potentially missing
> items.
>
> This fixes the rhashtable runtime test for me. Bug probably introduced
> by Herbert Xu's patch eddee5ba ("rhashtable: Fix walker behaviour during
> rehash") although not explicitly tested.
>
> Fixes: eddee5ba ("rhashtable: Fix walker behaviour during rehash")
> Signed-off-by: Phil Sutter

Yes, this fixes the error, thank you.

The new problem with the test, "soft lockup - CPU#0 stuck for 22s!", is still
there on a 360 MHz UltraSparc IIi. I understand it is harmless, but is there
some easy way to make the test avoid the NMI watchdog?

[ 58.374173] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swapper:1]
[ 58.374293] Modules linked in:
[ 58.374387] irq event stamp: 144
[ 58.374461] hardirqs last enabled at (143): [<00404b1c>] rtrap_xcall+0x18/0x20
[ 58.374621] hardirqs last disabled at (144): [<00426b28>] sys_call_table+0x5ac/0x744
[ 58.374788] softirqs last enabled at (142): [<0045f5fc>] __do_softirq+0x4fc/0x680
[ 58.374958] softirqs last disabled at (135): [<0042be28>] do_softirq_own_stack+0x28/0x40
[ 58.375148] CPU: 0 PID: 1 Comm: swapper Not tainted 4.2.0-rc2-00077-gf760b87 #20
[ 58.375248] task: f8001f09ef60 ti: f8001f0fc000 task.ti: f8001f0fc000
[ 58.375348] TSTATE: 004480001601 TPC: 0049663c TNPC: 00496640 Y: Not tainted
[ 58.375497] TPC:
[ 58.375579] g0: 00b1d000 g1: 0002 g2: 00a88000 g3: 007f
[ 58.375699] g4: f8001f09ef60 g5: g6: f8001f0fc000 g7: 2f23003c7b80
[ 58.375817] o0: o1: 0002 o2: 0620 o3: f8001f09f560
[ 58.375937] o4: f8001f09ef60 o5: 0002 sp: f8001f0ff041 ret_pc: 0049662c
[ 58.376069] RPC:
[ 58.376152] l0: f8001f09ef60 l1: 0189bc00 l2: 00b1d000 l3: 0028
[ 58.376272] l4: f8001f09f538 l5: 0008 l6: l7: 0014
[ 58.376388] i0: 018d1428 i1: 01318d18 i2: 05f8 i3:
[ 58.376506] i4: i5: 0001 i6: f8001f0ff0f1 i7: 007022d8
[ 58.376643] I7:
[ 58.376715] Call Trace:
[ 58.376798] [007022d8] lockdep_rht_mutex_is_held+0x18/0x40
[ 58.376917] [00b7a6ac] test_rht_lookup.constprop.10+0x32c/0x4ac
[ 58.377029] [00b7afd0] test_rhashtable.constprop.8+0x7a4/0x1100
[ 58.377138] [00b7ba00] test_rht_init+0xd4/0x148
[ 58.377240] [00426e2c] do_one_initcall+0xec/0x1e0
[ 58.377351] [00b58b60] kernel_init_freeable+0x114/0x1c4
[ 58.377469] [0091c1ec] kernel_init+0xc/0x100
[ 58.377577] [00405fe4] ret_from_fork+0x1c/0x2c
[ 58.377663] [] (null)

-- 
Meelis Roos (mr...@linux.ee)
Re: 4.13.0-rc4 sparc64: can't allocate MSI-X affinity masks for 2 vectors
> I think with this patch from -rc6 the symptoms should be cured:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c005390374957baacbc38eef96ea360559510aa7
>
> if that theory is right.

The result with 4.13-rc6 is positive but mixed: the message about MSI-X
affinity masks is still there, but the rest of the detection works and the
driver is loaded successfully:

[ 29.924282] qla2xxx [:00:00.0]-0005: : QLogic Fibre Channel HBA Driver: 10.00.00.00-k.
[ 29.924710] qla2xxx [:10:00.0]-001d: : Found an ISP2432 irq 21 iobase 0x00c100d0.
[ 29.925581] qla2xxx :10:00.0: can't allocate MSI-X affinity masks for 2 vectors
[ 30.483422] scsi host1: qla2xxx
[ 35.495031] qla2xxx [:10:00.0]-00fb:1: QLogic QLE2462 - SG-(X)PCIE2FC-QF4, Sun StorageTek 4 Gb FC Enterprise PCI-Express Dual Channel H.
[ 35.495274] qla2xxx [:10:00.0]-00fc:1: ISP2432: PCIe (2.5GT/s x4) @ :10:00.0 hdma- host#=1 fw=7.03.00 (9496).
[ 35.495615] qla2xxx [:10:00.1]-001d: : Found an ISP2432 irq 22 iobase 0x00c100d04000.
[ 35.496409] qla2xxx :10:00.1: can't allocate MSI-X affinity masks for 2 vectors
[ 35.985355] scsi host2: qla2xxx
[ 40.996991] qla2xxx [:10:00.1]-00fb:2: QLogic QLE2462 - SG-(X)PCIE2FC-QF4, Sun StorageTek 4 Gb FC Enterprise PCI-Express Dual Channel H.
[ 40.997251] qla2xxx [:10:00.1]-00fc:2: ISP2432: PCIe (2.5GT/s x4) @ :10:00.1 hdma- host#=2 fw=7.03.00 (9496).
[ 51.880945] qla2xxx [:10:00.0]-8038:1: Cable is unplugged...
[ 57.402900] qla2xxx [:10:00.1]-8038:2: Cable is unplugged...

With Dave Miller's patch on top of 4.13-rc6, I see the following before both
MSI-X messages:

irq_create_affinity_masks: nvecs[2] affd->pre_vectors[2] affd->post_vectors[0]

-- 
Meelis Roos (mr...@linux.ee)
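The debug line shows nvecs equal to pre_vectors. Assuming irq_create_affinity_masks() keeps the pre_vectors and post_vectors out of affinity spreading and spreads only what is left, nothing remains to spread here: 2 - 2 - 0 = 0. The helper below is a hypothetical restatement of that subtraction for checking the numbers, not the kernel function itself.

```c
#include <assert.h>

/* Vectors left for affinity spreading after reserving the pre/post vectors.
 * Assumption: mirrors the subtraction described above; not kernel code. */
static int spreadable_vectors(int nvecs, int pre_vectors, int post_vectors)
{
    int affv = nvecs - pre_vectors - post_vectors;

    assert(nvecs >= 0 && pre_vectors >= 0 && post_vectors >= 0);
    return affv > 0 ? affv : 0;     /* never report a negative count */
}
```

With the values from the log, spreadable_vectors(2, 2, 0) yields 0, i.e. no vector is left to receive a spread affinity mask.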