Memory corruption, if it is happening, IMHO shouldn't be hardware-related: almost all of these boxes, except the H61M-based box from the 1st log, have worked for a long time with uptimes of more than a year, and only the software on them was changed; the H61M-based box ran memtest86 for tens of hours without any errors. If the crashes were caused by hardware, the boxes would have crashed much earlier.

On rare occasions I've seen 'zram decompression error' messages on different servers (in this case I got such a message on the H61M-based box).

Also, other people who use accel-ppp as BRAS software are seeing various kernel panics/bugs/oopses on recent kernels.

I'll try to apply these patches, and I'll also try switching back to the kernels that were stable on some of the boxes.

21.11.2015 01:13, Alexander Duyck wrote:
On 11/20/2015 05:58 AM, Andrew wrote:
Hi all.

Today some BRASes running the 4.1.12 kernel crashed.

Here are the crash traces: http://pastebin.com/p68hNS8R
http://pastebin.com/36ieRAM2 http://pastebin.com/3BRTVEB6

On the 3.2 kernel the same hardware works OK; the trouble was noticed after
the kernel upgrade.

What additional info is needed?

Looking over the traces there seem to be two areas called out.

The first is the fib_trie resize BUG_ON that was triggered due to the parent and child not being associated. I think that might be due to memory corruption as I cannot find any spots where we are resizing without correctly setting up the parent-child relationship of the nodes first.

The other spot that is showing up is ppp_shutdown_interface and its related path. It looks like there are a couple of patches you could try back-porting to see if they resolve the issue. If they do, then perhaps they should be considered candidates for stable:

8cb775bc0a3 ("ppp: fix device unregistration upon netns deletion")
58a89ecaca5 ("ppp: fix lockdep splat in ppp_dev_uninit()")

- Alex

