On 2015-06-30 22:33, Christian Ruppert wrote:
On 2015-06-30 21:20, Rustad, Mark D wrote:
Christian,

On Jun 30, 2015, at 1:58 AM, Christian Ruppert <id...@qasl.de> wrote:

bad news. It didn't work either. :(

That is too bad.

The system just did a reset tonight and there's nothing useful.
What I did was:
I removed the console= parameter and added the earlyprintk= you mentioned. I verified it's working by redirecting an "h" to the sysrq-trigger, and that's all I got:
[ 308.812492] SysRq : HELP : loglevel(0-9) reboot(b) crash(c) terminate-all-tasks(e) memory-full-oom-kill(f) kill-all-tasks(i) thaw-filesystems(j) sak(k) show-backtrace-all-active-cpus(l) show-memory-usage(m) nice-all-RT-tasks(n) poweroff(o) show-registers(p) show-all-timers(q) unraw(r) sync(s) show-task-states(t) unmount(u) show-blocked-tasks(w) dump-ftrace-buffer(z)
[4early console in decompress_kernel

Decompressing Linux... Parsing ELF... done.
Booting the kernel.
...

So basically still nothing :/
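
For reference, the sysrq check described above (writing "h" to the trigger so the kernel prints its SysRq help line on the console) could be sketched roughly as follows. This is only an illustration: the function name `send_sysrq` is hypothetical, the path is the usual procfs location, and writing to it requires root.

```python
from pathlib import Path

# Usual procfs location of the SysRq trigger; writing to it requires root.
SYSRQ_TRIGGER = Path("/proc/sysrq-trigger")


def send_sysrq(key: str, trigger: Path = SYSRQ_TRIGGER) -> None:
    """Write one SysRq command character (e.g. "h" for help) to the trigger.

    `send_sysrq` is a hypothetical helper name used for illustration only.
    """
    if len(key) != 1:
        raise ValueError("a SysRq command is a single character")
    trigger.write_text(key)


# Example (as root): send_sysrq("h")
# The help line should then show up on whatever console is active.
```

If the help line appears on the serial or earlyprintk console, the console path itself works, which is all this check establishes.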

Could you send the full log that was captured via the earlyprintk,
just in case I can notice something that is reported there.

See the attached log but it's basically just from the newly booted kernel


Someone mentioned netconsole, but I doubt it will be any better if even console= or earlyprintk= didn't catch anything.

I agree. It is incredibly unlikely that netconsole can catch anything
that earlyprintk can't.

Do you have any more ideas by chance?

One thing that comes to mind is that some systems will automatically
reset when any unrecoverable hardware error occurs. I have had systems
set up that way in the past and when such an error occurs, an
immediate reset is the result. Have you noticed any BIOS settings
related to that? If so, could you change them to SMIs or something? Or
is there a different instance of that hardware that you can run this
on?

See below


In my last mail I summarised our setup and I'm willing to provide as much information as I can to get this solved but right now I have no more ideas.

I think detailed information on your hardware and BIOS settings, along
with whatever log you do get via earlyprintk might help. It may be
possible that a software error could trigger an uncorrectable error,
but it isn't real common. It sure doesn't behave like a typical kernel
panic kind of issue. Oh, and do check any error log that your BIOS
might be holding for you.

We tried Supermicro 5018D-MTF (E3-1281v3), 5017C-MTF (E3-1220L IIRC)
and a Workstation PC (i5-4460) with an Asus mainboard (H97M-E) and
it's the same everywhere. All systems have 32GB RAM, and the two
Supermicro systems even have ECC. And we only have issues in combination with the
mentioned X520 NIC AND the SYNPROXY iptables extension.
mcelog is empty. The 5018D-MTF Event log has nothing either. I checked
for watchdog related settings in the BIOS but that looked good so far.
Also causing a test kernel panic resulted in a proper dump as well as
a valid kernel dump file. I can check the BIOS tomorrow and/or even
take some pictures of each page/tab in case it might help.
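
As a complement to mcelog on the ECC machines, the kernel's EDAC counters can also be read directly. The sketch below assumes an EDAC driver is loaded and exposes the usual `ce_count` / `ue_count` files under `/sys/devices/system/edac/mc/`; the function name `read_edac_counts` is hypothetical, and an empty result only means no memory controller was found there.

```python
from pathlib import Path

# Usual sysfs location of EDAC memory-controller entries (assumes an EDAC
# driver is loaded; on systems without one this directory may not exist).
EDAC_ROOT = Path("/sys/devices/system/edac/mc")


def read_edac_counts(root: Path = EDAC_ROOT) -> dict:
    """Return {controller: {"ce_count": n, "ue_count": n}} for each mcN found.

    `read_edac_counts` is a hypothetical helper name used for illustration.
    """
    counts = {}
    if not root.is_dir():
        return counts
    for mc in sorted(root.glob("mc[0-9]*")):
        entry = {}
        for name in ("ce_count", "ue_count"):  # corrected / uncorrected errors
            counter = mc / name
            if counter.is_file():
                entry[name] = int(counter.read_text().strip())
        counts[mc.name] = entry
    return counts


# Example: print(read_edac_counts())
```

Non-zero `ue_count` values would point towards the uncorrectable hardware error scenario discussed above; all zeros would be consistent with the empty mcelog.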

So I've got some more. I attached a tarball that contains IPMI screenshots of every BIOS tab/page of one of those 5018D-MTF systems. It also contains a dmesg as well as a very verbose "lspci -nnvvvxxxx".

By the way, did I mention that we're doing bonding/LACP? But that shouldn't matter, as we only have those issues with X520 NICs AND SYNPROXY. We tried some different setups (just 1GE NICs, different mainboard, completely different hardware etc.) and it really seems to be related to those two parts.
Please let me know if you need any more information.




--
Mark Rustad, Networking Division, Intel Corporation

--
Regards,
Christian Ruppert
_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel® Ethernet, visit
http://communities.intel.com/community/wired
