[CentOS] vmcore on 5.4

2010-04-23 Thread My LinuxHAList
Information: 5.4 kernel (2.6.18-164.el5).

I have a vmcore (from kdump). If the developers are interested, let me know
where I can upload the vmcore file.

I used the crash command to do a backtrace.

I have managed to get machines running later 5.4 and 5.5 kernels to panic the same way.
Machines with Broadcom or Intel NICs panic the same way.

This is an NFS client whose NFS server is restarted several times;
the mount is NFSv3 with options defaults,noatime.
The client was busy writing to NFS-mounted space while the NFS
server was restarting.
So far, I have not managed to panic the machines when I mount with the
udp option.
The bad news is that NFSv4 is strictly TCP, if I were to go down that route.
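
Since the panic seems to depend on whether the mount uses TCP or UDP, it helps to confirm which transport each NFS mount actually ended up with. The following is only a minimal Python sketch: it assumes the kernel reports the transport in the options column of /proc/mounts as proto=tcp or proto=udp (or as a bare tcp/udp option), and the sample text, host name and addresses in it are made up for illustration.

```python
def nfs_transports(mounts_text):
    """Map each NFS mount point to 'tcp', 'udp', or 'unknown'."""
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts columns: device, mount point, fs type, options, dump, pass
        if len(fields) < 4 or fields[2] not in ("nfs", "nfs4"):
            continue
        proto = "unknown"
        for opt in fields[3].split(","):
            if opt.startswith("proto="):
                proto = opt[len("proto="):]
            elif opt in ("tcp", "udp"):
                proto = opt
        result[fields[1]] = proto
    return result


# Hypothetical sample in /proc/mounts format (not taken from the affected machines).
SAMPLE = """\
/dev/sda1 / ext3 rw,data=ordered 0 0
filer:/export /mnt/data nfs rw,noatime,vers=3,proto=tcp,addr=192.0.2.10 0 0
filer:/scratch /mnt/scratch nfs rw,noatime,vers=3,proto=udp,addr=192.0.2.10 0 0
"""

if __name__ == "__main__":
    for mount_point, proto in sorted(nfs_transports(SAMPLE).items()):
        print(mount_point, proto)
```

On a live client the same function could be fed the contents of /proc/mounts instead of the sample string.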

From the backtrace, it seems the crash is TCP-related.  I'll be trying
a couple of Linux TCP settings changes.
It's possible that the issue is with TCP in general (not NFS).
I would like to enlist the community's help in further understanding this
TCP issue and finding potential work-arounds.

crash> sys
  KERNEL: vmlinux
DUMPFILE: vmcore
CPUS: 4
DATE: Tue Apr 20 15:04:09 2010
  UPTIME: 18:55:25
LOAD AVERAGE: 0.13, 0.09, 0.03
   TASKS: 340
 RELEASE: 2.6.18-164.el5
 VERSION: #1 SMP Thu Sep 3 03:28:30 EDT 2009
 MACHINE: x86_64  (2660 Mhz)
  MEMORY: 23.6 GB
   PANIC: Oops:  [1] SMP  (check log for details)
crash> bt -a
PID: 0  TASK: 802ffae0  CPU: 0   COMMAND: swapper
 #0 [8043ef20] crash_nmi_callback at 8007a3bf
 #1 [8043ef40] do_nmi at 8006585a
 #2 [8043ef50] nmi at 80064ebf
[exception RIP: acpi_processor_idle+579]
RIP: 8019765e  RSP: 803f1f48  RFLAGS: 0093
RAX: 0073111a  RBX: 0073111a  RCX: 0808
RDX: 0815  RSI: 0003  RDI: 
RBP: 81063e480100   R8: 803f   R9: 804b5e2c
R10: 0046  R11: 0046  R12: 
R13: 81063e48  R14:   R15: 
ORIG_RAX:   CS: 0010  SS: 0018
--- exception stack ---
 #3 [803f1f48] acpi_processor_idle at 8019765e
 #4 [803f1f90] cpu_idle at 8004939e
PID: 0  TASK: 810115f11100  CPU: 1   COMMAND: swapper
 #0 [810115f38f20] crash_nmi_callback at 8007a3bf
 #1 [810115f38f40] do_nmi at 8006585a
 #2 [810115f38f50] nmi at 80064ebf
[exception RIP: acpi_processor_idle+579]
RIP: 8019765e  RSP: 810115f2fea8  RFLAGS: 0093
RAX: 00731145  RBX: 00731145  RCX: 0808
RDX: 0815  RSI: 0003  RDI: 
RBP: 81063f173900   R8: 810115f2e000   R9: 804b5e2c
R10: 0046  R11: 0046  R12: 00ff
R13: 81063f173800  R14: 0100  R15: 803ea280
ORIG_RAX:   CS: 0010  SS: 0018
--- exception stack ---
 #3 [810115f2fea8] acpi_processor_idle at 8019765e
 #4 [810115f2fef0] cpu_idle at 8004939e
PID: 0  TASK: 810115f20080  CPU: 2   COMMAND: swapper
 #0 [810115f6bbc0] crash_kexec at 800ac5b9
 #1 [810115f6bc80] __die at 80065127
 #2 [810115f6bcc0] do_page_fault at 80066da7
 #3 [810115f6bdb0] error_exit at 8005dde9
[exception RIP: pskb_copy+307]
RIP: 8022486b  RSP: 810115f6be60  RFLAGS: 00010282
RAX: 81062cd5f540  RBX: 81062cac3980  RCX: 81046fb1e550
RDX:   RSI: 81062cd5f550  RDI: 0004
RBP: 810466f54a80   R8: 081f02b4   R9: 
R10: 81062cac3980  R11: 00c8  R12: 0220
R13: 810466f54a80  R14: 0002  R15: 803ea2a0
ORIG_RAX:   CS: 0010  SS: 0018
 #4 [810115f6be78] tcp_transmit_skb at 800217b7
 #5 [810115f6bec8] tcp_retransmit_skb at 80250ccd
 #6 [810115f6bf08] tcp_write_timer at 80252652
 #7 [810115f6bf28] run_timer_softirq at 800968be
 #8 [810115f6bf58] __do_softirq at 8001235a
 #9 [810115f6bf88] call_softirq at 8005e2fc
#10 [810115f6bfa0] do_softirq at 8006cb14
#11 [810115f6bfb0] apic_timer_interrupt at 8005dc8e
--- IRQ stack ---
#12 [810115f67df8] apic_timer_interrupt at 8005dc8e
[exception RIP: acpi_processor_idle+628]
RIP: 8019768f  RSP: 810115f67ea8  RFLAGS: 0282
RAX: 810115f67fd8  RBX: 81063f173100  RCX: 80184973
RDX: 81063f173000  RSI: 0082  RDI: 804b5e2c
RBP: 810115f67ee8   R8: 810115f66000   R9: 810115f67ecc
R10: 0046  R11: 810115f67ee8  R12: 81063f6e1180
R13: 10008040  R14: 81063f6e1180  R15: 81063f6e1180
ORIG_RAX: ff10  CS: 0010  SS: 
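
The reading above (that the crash is TCP-related) can be checked mechanically from the bt -a text. What follows is a rough Python sketch, not a crash feature: it assumes the output has the line shapes shown in this post, treats any task whose frames include crash_kexec or __die as the CPU that took the oops (the other CPUs only show crash_nmi_callback), and the helper names are invented here. The sample is a shortened excerpt of the backtrace above.

```python
import re

TASK_RE = re.compile(r"^PID:\s*(\d+)\s+TASK:\s*(\S+)\s+CPU:\s*(\d+)\s+COMMAND:\s*(.+)$")
FRAME_RE = re.compile(r"^#(\d+)\s+\[\w+\]\s+(\S+)\s+at\s+")

# Assumption: these frames only appear on the CPU that actually oopsed.
PANIC_MARKERS = {"crash_kexec", "__die"}


def split_tasks(bt_text):
    """Group numbered frames under the PID/TASK/CPU header that precedes them."""
    tasks = []
    for raw in bt_text.splitlines():
        line = raw.strip()
        header = TASK_RE.match(line)
        if header:
            tasks.append({"cpu": int(header.group(3)),
                          "command": header.group(4).strip().strip('"'),
                          "frames": []})
            continue
        frame = FRAME_RE.match(line)
        if frame and tasks:
            tasks[-1]["frames"].append(frame.group(2))
    return tasks


def panicking_tasks(tasks):
    """Return the tasks whose stack contains one of the panic marker frames."""
    return [t for t in tasks if PANIC_MARKERS & set(t["frames"])]


def tcp_frames(task):
    """Frames whose symbol name starts with tcp_, in the order crash printed them."""
    return [f for f in task["frames"] if f.startswith("tcp_")]


SAMPLE = """\
PID: 0  TASK: 810115f11100  CPU: 1   COMMAND: swapper
 #0 [810115f38f20] crash_nmi_callback at 8007a3bf
 #1 [810115f38f40] do_nmi at 8006585a
 #2 [810115f38f50] nmi at 80064ebf
 #3 [810115f2fea8] acpi_processor_idle at 8019765e
 #4 [810115f2fef0] cpu_idle at 8004939e
PID: 0  TASK: 810115f20080  CPU: 2   COMMAND: swapper
 #0 [810115f6bbc0] crash_kexec at 800ac5b9
 #1 [810115f6bc80] __die at 80065127
 #2 [810115f6bcc0] do_page_fault at 80066da7
 #3 [810115f6bdb0] error_exit at 8005dde9
 #4 [810115f6be78] tcp_transmit_skb at 800217b7
 #5 [810115f6bec8] tcp_retransmit_skb at 80250ccd
 #6 [810115f6bf08] tcp_write_timer at 80252652
 #7 [810115f6bf28] run_timer_softirq at 800968be
"""

if __name__ == "__main__":
    for task in panicking_tasks(split_tasks(SAMPLE)):
        print("CPU", task["cpu"], "TCP frames:", ", ".join(tcp_frames(task)))
```

Note that the faulting function itself (pskb_copy) appears only in the exception RIP line, not as a numbered frame, so this sketch reports the TCP callers leading up to it rather than the faulting instruction.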

Re: [CentOS] vmcore

2008-10-01 Thread NiftyClusters T Mitchell
On Tue, Sep 30, 2008 at 2:50 AM, Mag Gam [EMAIL PROTECTED] wrote:
> I would like to analyze a kernel vmcore. Are there any docs you can
> recommend for me to read to understand the process?

The kernel is a massive C program.
Download the source that matches your kernel and build yourself a set of tags
(ctags or gtags, whichever you like).

When it blew up, you should have some points in the stack with symbol names.
Walk through the call stack and look at the input and output of each
function with the source in front of you.
See where that (and a good search on the web) takes you:

   http://www.linuxsymposium.org/2006/kdump_slides.pdf
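
To get from a symbol name in the stack to a line of source, one common approach is to turn the symbol+offset that crash prints into a gdb expression and run it against a vmlinux built with debug info. The Python below is just a small sketch of that conversion: it assumes the offset in a line of the form [exception RIP: symbol+offset] is decimal, and example_func and the helper name are made up for illustration.

```python
import re

# Matches the "[exception RIP: symbol+offset]" lines printed by crash's bt.
RIP_RE = re.compile(r"\[exception RIP: ([\w.]+)\+(\d+)\]")


def gdb_list_expr(rip_line):
    """Turn 'symbol+<decimal offset>' into a gdb 'list *(symbol+0x..)' command."""
    match = RIP_RE.search(rip_line)
    if match is None:
        return None
    symbol, offset = match.group(1), int(match.group(2))
    return "list *(%s+0x%x)" % (symbol, offset)


if __name__ == "__main__":
    # Hypothetical sample line; prints: list *(example_func+0x133)
    print(gdb_list_expr("[exception RIP: example_func+307]"))
```

The resulting command can be pasted into gdb started on the matching debug vmlinux; whether it resolves to a useful source line depends on having the debuginfo for exactly the kernel that crashed.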

If you made any source changes, look for errors in your code ;-)

If you are running stock mainline code, file a bug. If you find the source
of the bug first, add what you think the error is to the report.


-- 
NiftyCluster
T o m   M i t c h e l l
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


[CentOS] vmcore

2008-09-30 Thread Mag Gam
I would like to analyze a kernel vmcore. Are there any docs you can
recommend for me to read to understand the process?

TIA