Hal Rosenstock wrote:
Or perhaps something crashed and didn't clean up properly. Does this occur
immediately after a boot?
After a fresh reboot of the machines on the switch, I get the log at
http://www.cs.rutgers.edu/~bohra/osm-v2.log
The opensm process does not crash but hangs. The state of the port never
changes.
Now there is an oops in dmesg:
Oct 28 13:52:13 hora-3 OpenSM[5168]: OpenSM Rev:openib-1.1.0
Oct 28 13:52:14 hora-3 kernel: Unable to handle kernel paging request at virtual address 09000010
Oct 28 13:52:14 hora-3 kernel: printing eip:
Oct 28 13:52:14 hora-3 kernel: f883f12d
Oct 28 13:52:14 hora-3 kernel: *pde = 00000000
Oct 28 13:52:14 hora-3 kernel: Oops: 0000 [#1]
Oct 28 13:52:14 hora-3 kernel: SMP
Oct 28 13:52:14 hora-3 kernel: Modules linked in: ib_uverbs ib_umad ipv6 i2c_dev i2c_core sunrpc dm_mod video button battery ac uhci_hcd hw_random ib_mthca ib_mad ib_core e1000 floppy
Oct 28 13:52:14 hora-3 kernel: CPU: 1
Oct 28 13:52:14 hora-3 kernel: EIP: 0060:[<f883f12d>] Not tainted VLI
Oct 28 13:52:14 hora-3 kernel: EFLAGS: 00010286 (2.6.13bohra)
Oct 28 13:52:14 hora-3 kernel: EIP is at ib_post_send_mad+0x1c/0x1b1 [ib_mad]
Oct 28 13:52:14 hora-3 kernel: eax: 09000000 ebx: c1a7d900 ecx: c1a7d918 edx: 00000000
Oct 28 13:52:14 hora-3 kernel: esi: c1a7d918 edi: f6571f68 ebp: f6571efc esp: f6571ed8
Oct 28 13:52:14 hora-3 kernel: ds: 007b es: 007b ss: 0068
Oct 28 13:52:14 hora-3 kernel: Process opensm (pid: 5224, threadinfo=f6570000 task=f7dfb020)
Oct 28 13:52:14 hora-3 kernel: Stack: f883ef5a 00000000 c1a7d800 080bd018 f6571efc 00000000 f6a42900 a0f684f6
Oct 28 13:52:14 hora-3 kernel: f6571f68 f6571f74 f88f1728 00000000 00000018 000000e8 000000d0 f6a42948
Oct 28 13:52:14 hora-3 kernel: f68bda24 00000000 00000009 a0f684f6 00000009 c1a7d918 00000000 00000100
Oct 28 13:52:14 hora-3 kernel: Call Trace:
Oct 28 13:52:14 hora-3 kernel: [<c0104848>] show_stack+0x7c/0x92
Oct 28 13:52:14 hora-3 kernel: [<c01049c9>] show_registers+0x152/0x1ca
Oct 28 13:52:14 hora-3 kernel: [<c0104bcd>] die+0xf4/0x16f
Oct 28 13:52:14 hora-3 kernel: [<c011885c>] do_page_fault+0x463/0x649
Oct 28 13:52:14 hora-3 kernel: [<c01044bb>] error_code+0x4f/0x54
Oct 28 13:52:14 hora-3 kernel: [<f88f1728>] ib_umad_write+0x2d0/0x30e [ib_umad]
Oct 28 13:52:14 hora-3 kernel: [<c015d69b>] vfs_write+0x155/0x15a
Oct 28 13:52:14 hora-3 kernel: [<c015d741>] sys_write+0x3d/0x64
Oct 28 13:52:14 hora-3 kernel: [<c01038d3>] sysenter_past_esp+0x54/0x75
Oct 28 13:52:14 hora-3 kernel: Code: e8 d8 63 af c7 89 d8 83 c4 0c 5b 5e 5f 5d c3 55 89 e5 57 56 89 c6 53 83 ec 18 85 f6 89 55 f0 0f 84 ff 00 00 00 8b 46 08 8d 5e e8 <8b> 50 10 8b 7b 14 85 d2 0f 84 7c 01 00 00 8b 4e 18 85 c9 74 0b
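A quick sanity check on the oops, offered as a sketch rather than a diagnosis: the byte marked <8b> in the Code: line starts the three-byte instruction 8b 50 10, which on i386 decodes as a load from 0x10(%eax) into %edx. With eax = 09000000 that lands exactly on the faulting address 09000010. The instruction two before it, 8b 46 08, loads eax from 8(%esi), so the bad pointer appears to come from whatever object esi points at. The helper name decode_mov_disp8 below is made up for illustration and handles only this one instruction form.

```python
# Register names in i386 ModRM encoding order.
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

# Values copied from the oops above.
FAULT_ADDR = 0x09000010
REGS = {"eax": 0x09000000, "ebx": 0xC1A7D900, "esi": 0xC1A7D918}


def decode_mov_disp8(insn):
    """Decode `8b /r disp8` (mov r32, [base + disp8]); hypothetical helper.

    Returns (dest, base, disp). Only mod=01 without a SIB byte is handled.
    """
    opcode, modrm, disp = insn
    if opcode != 0x8B or (modrm >> 6) != 0b01 or (modrm & 7) == 4:
        raise ValueError("not a simple mov r32, disp8(r32)")
    if disp >= 0x80:  # disp8 is sign-extended
        disp -= 0x100
    return REG32[(modrm >> 3) & 7], REG32[modrm & 7], disp


# Faulting instruction: the bytes starting at the <8b> marker.
dest, base, disp = decode_mov_disp8(bytes.fromhex("8b5010"))
effective = (REGS[base] + disp) & 0xFFFFFFFF
print(f"mov {disp:#x}(%{base}),%{dest} reads {effective:08x}")
assert effective == FAULT_ADDR

# Two instructions earlier: where eax was loaded from.
print(decode_mov_disp8(bytes.fromhex("8b4608")))
```

Which field sits at offset 8 of that object would have to be checked against the ib_mad source for this tree; the sketch does not establish that.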
Thanks
Aniruddha
________________________________
From: [EMAIL PROTECTED] on behalf of Sean Hefty
Sent: Fri 10/28/2005 12:01 PM
To: Aniruddha Bohra
Cc: openib-general@openib.org
Subject: Re: [openib-general] OpenSM crash with today's trunk
Aniruddha Bohra wrote:
Oh well, I guess this is a different bug. Is there an oops or
anything in your kernel log, or is this just a userspace crash?
This is what I see:
Oct 27 22:03:34 hora-3 OpenSM[7995]: OpenSM Rev:openib-1.1.0
Oct 27 22:03:34 hora-3 kernel: ib_mad: Method 1 already in use
Oct 27 22:03:34 hora-3 OpenSM[7995]: Exiting SM
Is this useful?
Is there any chance opensm is already running on the system? It sounds like
something has already registered to receive the same MADs that opensm wants to
receive.
- Sean
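One way to answer the "already running" question from userspace is to scan /proc for a process whose command name is opensm. The following is a minimal sketch under that assumption; the function name find_processes and its proc_root parameter are illustrative only. It reads /proc/<pid>/stat, where the command name appears in parentheses (truncated by the kernel to 15 characters, which does not affect "opensm").

```python
import os


def find_processes(name, proc_root="/proc"):
    """Return the sorted PIDs whose command name in <pid>/stat equals name."""
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as meminfo
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                stat = f.read()
        except OSError:
            continue  # process exited, or entry is not readable
        # The command name sits between the first "(" and the last ")".
        comm = stat[stat.find("(") + 1:stat.rfind(")")]
        if comm == name:
            pids.append(int(entry))
    return sorted(pids)


if __name__ == "__main__":
    print(find_processes("opensm"))
```

An empty list only shows that no opensm process is alive; it cannot tell whether something else has registered for the same MADs.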
_______________________________________________
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general
To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general