Re: 3.12-rc7 regression - network panic from ipv6

2013-10-30 Thread mroos
> Signed-off-by: Steffen Klassert 

Works fine, thanks!

Tested-by: Meelis Roos 

-- 
Meelis Roos (mr...@linux.ee)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: bisected regression: 3c59x corrupts packets in 3.17-rc5

2014-09-17 Thread mroos
> Shit, you're right, sorry about that.  It's odd, I'm running it here, and it's
> not causing problems, but that's obviously wrong.  Meelis, please add the above fix
> to your test and confirm that it solves the problem.  If you could keep the
> previous patch in place too that would be great, as we should probably add the
> dma error checking anyway.
> 
> 
> [PATCH] 3c59x: Fix bad offset spec in skb_frag_dma_map

Tested 2 variants: only this patch (backported to old state) and both 
patches together.

Both work fine.

-- 
Meelis Roos (mr...@linux.ee)


Re: [PATCH 0/3] Fix several sparc64 THP bugs.

2014-04-26 Thread mroos
> Meelis and Aaro, I've found and fixed several THP bugs for sparc64
> over the last week or so.
> 
> I cannot 100% account for the exit_mmap() WARN_ON that you two have
> been able to trigger, however I'd like you both to test the changes
> nonetheless.
> 
> They are against 3.15 but they should apply cleanly all the way back
> to 3.13
> 
> Thanks in advance for testing.

Tried it on Netra X1, Ultra 2 and V100 that were online (applied the 
patches and enabled THP, defaulting to always).

Ultra 2 did not boot up (will see on Monday).

Netra X1 performed a simple dist-upgrade fine with this kernel.

V100 boots up fine but as soon as I start aptitude, it just hangs with 
nothing on console (tried it twice).

-- 
Meelis Roos (mr...@linux.ee)


Re: sparc64 WARNING: at mm/mmap.c:2757 exit_mmap+0x13c/0x160()

2014-04-16 Thread mroos
> > Just for the archives, I got one of these again with 3.14:
> 
> Meelis and Aaro, thanks again for all of your reports.
> 
> After poring over a lot of the data and auditing some code I'm
> suspecting it's a problem with transparent huge pages.
> 
> One thing you two can do to help me further confirm this is to run
> with THP disabled for a while and see if you still get the log
> messages.

I have since turned off CONFIG_TRANSPARENT_HUGEPAGE on 3 of 4 servers
that had this problem (actually most of my sparc64 machines) and the 4th
has

CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE=y
CONFIG_TRANSPARENT_HUGEPAGE=y
# CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS is not set
CONFIG_TRANSPARENT_HUGEPAGE_MADVISE=y
# CONFIG_HUGETLBFS is not set
# CONFIG_HUGETLB_PAGE is not set

and also has not had this problem since then. All 4 machines have been 
running through most -rc's of every kernel.
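
For readers comparing their own configuration with the fragment above, here is a minimal sketch of how such a .config excerpt could be read to find the compiled-in THP default. The function name `thp_mode` and the returned labels are made up for illustration; only the CONFIG_* symbol names come from the message.

```python
def thp_mode(config_text):
    """Classify a .config fragment as 'disabled', 'always', 'madvise' or 'unknown'.

    Hypothetical helper for illustration; it only looks at the three
    CONFIG_TRANSPARENT_HUGEPAGE* symbols quoted in the message.
    """
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        # "# CONFIG_FOO is not set" lines are comments, so the symbol is off.
        if line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        if value == "y":
            enabled.add(name)
    if "CONFIG_TRANSPARENT_HUGEPAGE" not in enabled:
        return "disabled"
    if "CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS" in enabled:
        return "always"
    if "CONFIG_TRANSPARENT_HUGEPAGE_MADVISE" in enabled:
        return "madvise"
    return "unknown"


# The THP-related lines reported for the 4th server.
FOURTH_SERVER = """\
CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE=y
CONFIG_TRANSPARENT_HUGEPAGE=y
# CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS is not set
CONFIG_TRANSPARENT_HUGEPAGE_MADVISE=y
"""

print(thp_mode(FOURTH_SERVER))
```

Under these assumptions the 4th server is built with THP present but applied only to regions that request it via madvise, which is a different setup from the "always" default used in the later tests.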

-- 
Meelis Roos 


Re: sparc64 WARNING: at mm/mmap.c:2757 exit_mmap+0x13c/0x160()

2014-05-24 Thread mroos
This is today's fresh git with 3.15.0-rc6-00190-g1ee1cea on V210, THP 
enabled & always on. Got this and a segfault on apt-spawned xz.

[  142.599575] ------------[ cut here ]------------
[  142.660349] WARNING: CPU: 1 PID: 2237 at mm/mmap.c:2741 exit_mmap+0x140/0x160()
[  142.756483] Modules linked in: ipv6 tg3 hwmon ptp pps_core
[  142.830269] CPU: 1 PID: 2237 Comm: aptitude Not tainted 3.15.0-rc6-00190-g1ee1cea #93
[  142.933226] Call Trace:
[  142.965358]  [0045a12c] warn_slowpath_common+0x4c/0x80
[  143.042074]  [004e7a40] exit_mmap+0x140/0x160
[  143.108410]  [004586a0] mmput.part.60+0x20/0xe0
[  143.177030]  [0045af3c] exit_mm+0x11c/0x180
[  143.241071]  [0045c920] do_exit+0x240/0x340
[  143.305134]  [0045cb48] do_group_exit+0x28/0xc0
[  143.373753]  [00468ac8] get_signal_to_deliver+0x1c8/0x3a0
[  143.453819]  [00449334] do_signal32+0x14/0x220
[  143.521292]  [0042d0e0] do_signal+0x2c0/0x520
[  143.587624]  [0042db40] do_notify_resume+0x40/0x60
[  143.659683]  [00404b04] __handle_signal+0xc/0x2c
[  143.729448] ---[ end trace b34008751438e7e6 ]---
[  143.790182] BUG: Bad rss-counter state mm:fc000d9d3660 idx:1 val:1
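
When several such reports need to be compared, the call chain can be pulled out of the log mechanically. The following is a rough sketch, assuming the trace lines keep the "[timestamp]  [address] symbol+0xoff/0xsize" shape seen above; the function name `call_trace_symbols` and the regular expression are illustrative, not part of any kernel tooling.

```python
import re

# Matches lines such as "[  143.042074]  [004e7a40] exit_mmap+0x140/0x160".
# The optional "<" and ">" allow for addresses printed as [<004e7a40>].
TRACE_LINE = re.compile(
    r"^\[\s*\d+\.\d+\]\s+\[<?[0-9a-f]+>?\]\s+([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def call_trace_symbols(log_text):
    """Return the symbol names of all call-trace lines, in log order."""
    symbols = []
    for line in log_text.splitlines():
        match = TRACE_LINE.match(line.strip())
        if match:
            symbols.append(match.group(1))
    return symbols


# A few lines copied from the report above.
SAMPLE = """\
[  142.933226] Call Trace:
[  142.965358]  [0045a12c] warn_slowpath_common+0x4c/0x80
[  143.042074]  [004e7a40] exit_mmap+0x140/0x160
[  143.108410]  [004586a0] mmput.part.60+0x20/0xe0
[  143.729448] ---[ end trace b34008751438e7e6 ]---
"""

print(call_trace_symbols(SAMPLE))
```

Header and trailer lines such as "Call Trace:" or "end trace" do not match the pattern and are skipped, so only the frames themselves are returned.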


-- 
Meelis Roos (mr...@linux.ee)


Re: unaligned accesses in SLAB etc.

2014-10-13 Thread mroos
> From: David Miller 
> Date: Sat, 11 Oct 2014 22:15:10 -0400 (EDT)
> 
> > 
> > I'm getting tons of the following on sparc64:
> > 
> > [603965.383447] Kernel unaligned access at TPC[546b58] free_block+0x98/0x1a0
> > [603965.396987] Kernel unaligned access at TPC[546b60] free_block+0xa0/0x1a0
> > [603965.410523] Kernel unaligned access at TPC[546b58] free_block+0x98/0x1a0

> In all of the cases, the address is 4-byte aligned but not 8-byte
> aligned.  And they are vmalloc addresses.
> 
> Which made me suspect the percpu commit:
> 
> 
> commit bf0dea23a9c094ae869a88bb694fbe966671bf6d
> Author: Joonsoo Kim 
> Date:   Thu Oct 9 15:26:27 2014 -0700
> 
> mm/slab: use percpu allocator for cpu cache
> 
> 
> And indeed, reverting this commit fixes the problem.

I tested Joonsoo Kim's fix and it gets rid of the kernel unaligned 
access messages, yes.

But the instability on UltraSparc II era machines still remains: 
occasional Bus Errors during kernel compilation, and messages like this:

sh[11771]: segfault at ffd6a4d1 ip f7cc5714 (rpc f7cc562c) sp ffd69d90 error 30002 in libc-2.19.so[f7c44000+16a000]
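
As a small illustration of the "4-byte aligned but not 8-byte aligned" observation quoted above, the check can be written as follows. This is only a sketch: the function name `largest_alignment` is made up, and the addresses are hypothetical values, not ones taken from the logs.

```python
def largest_alignment(addr, limit=16):
    """Return the largest power of two (capped at limit) that divides addr."""
    align = 1
    # Keep doubling while the address is still a multiple of the next power of two.
    while align < limit and addr % (align * 2) == 0:
        align *= 2
    return align


# Hypothetical addresses, chosen only to show the two cases.
print(largest_alignment(0x100004))  # multiple of 4 but not of 8
print(largest_alignment(0x100008))  # multiple of 8 but not of 16
```

An 8-byte load or store through a pointer of the first kind is what produces a "Kernel unaligned access" message on sparc64, which requires natural alignment for such accesses.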

-- 
Meelis Roos (mr...@linux.ee)


Re: unaligned accesses in SLAB etc.

2014-10-14 Thread mroos
> > I'd like to know whether your other problem is related to commit
> > bf0dea23a9c0 ("mm/slab: use percpu allocator for cpu cache").  So,
> > if the commit is reverted, is your other problem also gone
> > completely?
> 
> The other problem has been present forever.

Umm? I am afraid I have been describing it badly. This random 
SIGBUS+SIGSEGV problem is new - I have not seen it before.

I have been able to do kernel compiles for years on sparc64 (modulo 
specific bugs in specific configurations) and 3.17 + start/end swap 
patch also seems stable on most machines. With yesterday's git + align 
patch, it dies with SIGBUS multiple times during compilation so it's a 
new regression for me.

Will try reverting that commit tomorrow.

The only other current sparc64 problems I am seeing: V210/V440 die 
during bootup if compiled with gcc 4.9, and V480 dies with FATAL 
exceptions during bootup since the previous kernel release. Maybe also 
the exit_mmap warning - I do not know if it has been fixed, I see it 
rarely.

-- 
Meelis Roos (mr...@linux.ee)


Re: 4.6-rc*: Kernel unaligned access at pci_bus_read_config_dword+0x64/0x80

2016-06-19 Thread mroos
> >> bisected to git commit 63e3027
> > 
> > That's a merge commit that adds 100 different commits, and this happened
> > way back in March.
> > 
> > Please find the exact dword access in:
> > 
> > drivers/pci/pcie/portdrv_{core,pci,bus}.c
> > 
> > that triggers this unaligned access so we can debug this further.
> 
> Meanwhile I figured it out, this should fix the bug:

Yes it does for me too - thank you!

-- 
Meelis Roos (mr...@linux.ee)


Re: [PATCH v2] rhashtable: fix for resize events during table walk

2015-07-14 Thread mroos
> If rhashtable_walk_next detects a resize operation in progress, it jumps
> to the new table and continues walking that one. But it fails to drop
> the reference to its current item, leading it to continue traversing
> the new table's bucket into which the current item is sorted, and
> after reaching that bucket's end it continues traversing the new table's
> second bucket instead of the first one, thereby potentially missing
> items.
> 
> This fixes the rhashtable runtime test for me. Bug probably introduced
> by Herbert Xu's patch eddee5ba ("rhashtable: Fix walker behaviour during
> rehash") although not explicitly tested.
> 
> Fixes: eddee5ba ("rhashtable: Fix walker behaviour during rehash")
> Signed-off-by: Phil Sutter 

Yes, this fixes the error, thank you.

The new problem with the test (soft lockup - CPU#0 stuck for 22s!) is 
still there on a 360 MHz UltraSparc IIi. I understand it is harmless, but 
is there some easy way to make the test avoid triggering the NMI watchdog?

[   58.374173] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swapper:1]
[   58.374293] Modules linked in:
[   58.374387] irq event stamp: 144
[   58.374461] hardirqs last  enabled at (143): [<00404b1c>] rtrap_xcall+0x18/0x20
[   58.374621] hardirqs last disabled at (144): [<00426b28>] sys_call_table+0x5ac/0x744
[   58.374788] softirqs last  enabled at (142): [<0045f5fc>] __do_softirq+0x4fc/0x680
[   58.374958] softirqs last disabled at (135): [<0042be28>] do_softirq_own_stack+0x28/0x40
[   58.375148] CPU: 0 PID: 1 Comm: swapper Not tainted 4.2.0-rc2-00077-gf760b87 #20
[   58.375248] task: f8001f09ef60 ti: f8001f0fc000 task.ti: f8001f0fc000
[   58.375348] TSTATE: 004480001601 TPC: 0049663c TNPC: 00496640 Y: Not tainted
[   58.375497] TPC: 
[   58.375579] g0: 00b1d000 g1: 0002 g2: 00a88000 g3: 007f
[   58.375699] g4: f8001f09ef60 g5:  g6: f8001f0fc000 g7: 2f23003c7b80
[   58.375817] o0:  o1: 0002 o2: 0620 o3: f8001f09f560
[   58.375937] o4: f8001f09ef60 o5: 0002 sp: f8001f0ff041 ret_pc: 0049662c
[   58.376069] RPC: 
[   58.376152] l0: f8001f09ef60 l1: 0189bc00 l2: 00b1d000 l3: 0028
[   58.376272] l4: f8001f09f538 l5: 0008 l6:  l7: 0014
[   58.376388] i0: 018d1428 i1: 01318d18 i2: 05f8 i3: 
[   58.376506] i4:  i5: 0001 i6: f8001f0ff0f1 i7: 007022d8
[   58.376643] I7: 
[   58.376715] Call Trace:
[   58.376798]  [007022d8] lockdep_rht_mutex_is_held+0x18/0x40
[   58.376917]  [00b7a6ac] test_rht_lookup.constprop.10+0x32c/0x4ac
[   58.377029]  [00b7afd0] test_rhashtable.constprop.8+0x7a4/0x1100
[   58.377138]  [00b7ba00] test_rht_init+0xd4/0x148
[   58.377240]  [00426e2c] do_one_initcall+0xec/0x1e0
[   58.377351]  [00b58b60] kernel_init_freeable+0x114/0x1c4
[   58.377469]  [0091c1ec] kernel_init+0xc/0x100
[   58.377577]  [00405fe4] ret_from_fork+0x1c/0x2c
[   58.377663]  []   (null)


-- 
Meelis Roos (mr...@linux.ee)


Re: 4.13.0-rc4 sparc64: can't allocate MSI-X affinity masks for 2 vectors

2017-08-21 Thread mroos
> I think with this patch from -rc6 the symptoms should be cured:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c005390374957baacbc38eef96ea360559510aa7
> 
> if that theory is right.

The result with 4.13-rc6 is positive but mixed: the message about MSI-X 
affinity masks is still there, but the rest of the detection works and the 
driver is loaded successfully:

[   29.924282] qla2xxx [:00:00.0]-0005: : QLogic Fibre Channel HBA Driver: 10.00.00.00-k.
[   29.924710] qla2xxx [:10:00.0]-001d: : Found an ISP2432 irq 21 iobase 0x00c100d0.
[   29.925581] qla2xxx :10:00.0: can't allocate MSI-X affinity masks for 2 vectors
[   30.483422] scsi host1: qla2xxx
[   35.495031] qla2xxx [:10:00.0]-00fb:1: QLogic QLE2462 - SG-(X)PCIE2FC-QF4, Sun StorageTek 4 Gb FC Enterprise PCI-Express Dual Channel H.
[   35.495274] qla2xxx [:10:00.0]-00fc:1: ISP2432: PCIe (2.5GT/s x4) @ :10:00.0 hdma- host#=1 fw=7.03.00 (9496).
[   35.495615] qla2xxx [:10:00.1]-001d: : Found an ISP2432 irq 22 iobase 0x00c100d04000.
[   35.496409] qla2xxx :10:00.1: can't allocate MSI-X affinity masks for 2 vectors
[   35.985355] scsi host2: qla2xxx
[   40.996991] qla2xxx [:10:00.1]-00fb:2: QLogic QLE2462 - SG-(X)PCIE2FC-QF4, Sun StorageTek 4 Gb FC Enterprise PCI-Express Dual Channel H.
[   40.997251] qla2xxx [:10:00.1]-00fc:2: ISP2432: PCIe (2.5GT/s x4) @ :10:00.1 hdma- host#=2 fw=7.03.00 (9496).
[   51.880945] qla2xxx [:10:00.0]-8038:1: Cable is unplugged...
[   57.402900] qla2xxx [:10:00.1]-8038:2: Cable is unplugged...

With Dave Miller's patch on top of 4.13-rc6, I see the following before 
both MSI-X messages:

irq_create_affinity_masks: nvecs[2] affd->pre_vectors[2] affd->post_vectors[0]
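
One possible reading of that debug line, sketched as arithmetic: if the pre_vectors and post_vectors are excluded from affinity spreading (an assumption about the allocator's intent, not something confirmed in this thread), then the number of vectors left to spread is nvecs minus both. The helper name `spreadable_vectors` is hypothetical.

```python
def spreadable_vectors(nvecs, pre_vectors, post_vectors):
    """Vectors left for affinity spreading once reserved ones are set aside.

    Assumes pre/post vectors are excluded from spreading; clamps at zero.
    """
    return max(nvecs - pre_vectors - post_vectors, 0)


# Values from the debug line: nvecs[2] pre_vectors[2] post_vectors[0].
print(spreadable_vectors(2, 2, 0))
```

With the reported values nothing is left to spread, which may be why the affinity-mask message still appears even though the driver loads and the devices are detected.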

-- 
Meelis Roos (mr...@linux.ee)