Re: [openib-general] [HELP] Encounter Kernel Panic when Add MellanoxHCA Supporting on 2.6.9 Kernel
Hmm, I can't make it happen here unfortunately... I just get the following on 2.6.12-rc5 with CONFIG_DEBUG_SPINLOCK:

  [ 26.001979] ib_mthca: Mellanox InfiniBand HCA driver v0.06-pre (November 8, 2004)
  [ 26.026622] ib_mthca: Initializing Mellanox Technologies MT23108 InfiniHost (:04:00.0)
  [ 27.326318] ib_mthca :04:00.0: HCA FW version 3.0.1 is old (3.3.2 is current).
  [ 27.351207] ib_mthca :04:00.0: If you have problems, try updating your HCA FW.

but it actually seems to work (well enough that the port goes to active).

 - R.
_______________________________________________
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general
To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
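A minimal sketch, assuming one wants to spot the firmware warning quoted above in a saved boot log. The regular expression is modeled only on the single "HCA FW version ... is old" line in this message, and the helper names (`parse_version`, `find_old_firmware`) are hypothetical, not part of ib_mthca or any OpenIB tool.

```python
import re

# Modeled on the one warning line quoted in this thread; other driver
# versions may word the message differently (assumption).
FW_LINE = re.compile(
    r"HCA FW version (\d+(?:\.\d+)*) is old \((\d+(?:\.\d+)*) is current\)"
)


def parse_version(text):
    """Turn '3.0.1' into (3, 0, 1) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def find_old_firmware(log_text):
    """Return (installed, current) version tuples for each warning found."""
    return [
        (parse_version(m.group(1)), parse_version(m.group(2)))
        for m in FW_LINE.finditer(log_text)
    ]


sample = "[ 27.326318] ib_mthca :04:00.0: HCA FW version 3.0.1 is old (3.3.2 is current)."
for installed, current in find_old_firmware(sample):
    if installed < current:
        print("installed firmware", installed, "is older than", current)
```

Comparing tuples of integers rather than raw strings keeps a version such as 3.10.0 from sorting below 3.3.2.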
RE: [openib-general] [HELP] Encounter Kernel Panic when Add MellanoxHCA Supporting on 2.6.9 Kernel
Dear Woody & Roland,

I downloaded the newest firmware and updated the system with it. Now it works: the kernel boots and the module loads successfully.

Thanks for your great help. It is very useful to me.

Thanks
- Lenber

-----Original Message-----
From: Roland Dreier [mailto:[EMAIL PROTECTED]]
Sent: May 26, 2005 7:30
To: Woodruff, Robert J
Cc: Cong, Lenber; openib-general@openib.org
Subject: Re: [openib-general] [HELP] Encounter Kernel Panic when Add MellanoxHCA Supporting on 2.6.9 Kernel

Robert> Unfortunately not, I did not have CONFIG_DEBUG_SPINLOCK
Robert> set and I did not save the old firmware before loading in
Robert> the new firmware.

I'll try to build an old fw image and see if I can reproduce it here.

 - R.
Re: [openib-general] [HELP] Encounter Kernel Panic when Add MellanoxHCA Supporting on 2.6.9 Kernel
Robert> Unfortunately not, I did not have CONFIG_DEBUG_SPINLOCK
Robert> set and I did not save the old firmware before loading in
Robert> the new firmware.

I'll try to build an old fw image and see if I can reproduce it here.

 - R.
RE: [openib-general] [HELP] Encounter Kernel Panic when Add MellanoxHCA Supporting on 2.6.9 Kernel
Bob> Ok, I was able to reproduce this error on an IA32 system,
Bob> running the redhat 2.6.9-5.EL (UP kernel) with the IB patches
Bob> applied. It turned out to be a problem with the HCA card that
Bob> had older firmware, 3.0.1.

Roland> Did you get any kind of stack dump or traceback?
>We really shouldn't panic on downrev FW, so I'd like to get to the
>bottom of this.
> - R.

Unfortunately not, I did not have CONFIG_DEBUG_SPINLOCK set and I did not save the old firmware before loading in the new firmware. Perhaps Lenber can get the traceback info before he updates his card. That would be helpful as I agree it is not desirable to panic on cards with old firmware.

woody
Re: [openib-general] [HELP] Encounter Kernel Panic when Add MellanoxHCA Supporting on 2.6.9 Kernel
Bob> Ok, I was able to reproduce this error on an IA32 system,
Bob> running the redhat 2.6.9-5.EL (UP kernel) with the IB patches
Bob> applied. It turned out to be a problem with the HCA card that
Bob> had older firmware, 3.0.1.

Did you get any kind of stack dump or traceback? We really shouldn't panic on downrev FW, so I'd like to get to the bottom of this.

 - R.
RE: [openib-general] [HELP] Encounter Kernel Panic when Add MellanoxHCA Supporting on 2.6.9 Kernel
Lenber> Can I assume it is a problem with the HCA card? Or is the
Lenber> issue related to the SMP platform? So strange..

Roland> It's possible it's the HCA but I'm not sure what could be wrong. With
Roland> CONFIG_DEBUG_SPINLOCK can you get more of the traceback? The BUG()
Roland> should be producing a full stack trace.

Ok, I was able to reproduce this error on an IA32 system, running the redhat 2.6.9-5.EL (UP kernel) with the IB patches applied. It turned out to be a problem with the HCA card that had older firmware, 3.0.1. I updated the firmware to 3.3.2 and the system booted OK and everything seems to work fine.

woody
Re: [openib-general] [HELP] Encounter Kernel Panic when Add MellanoxHCA Supporting on 2.6.9 Kernel
Lenber> Can I assume it is a problem with the HCA card? Or is the
Lenber> issue related to the SMP platform? So strange..

It's possible it's the HCA but I'm not sure what could be wrong. With CONFIG_DEBUG_SPINLOCK can you get more of the traceback? The BUG() should be producing a full stack trace.

Thanks,
  Roland
RE: [openib-general] [HELP] Encounter Kernel Panic when Add MellanoxHCA Supporting on 2.6.9 Kernel
I tried the patches (2.6.12-to-2.6.9, not the svn backport) on an EM64T desktop (without an HCA card). The kernel installed successfully.

I still can't boot the kernel on the Xeon SMP server, even with the new patches (svn backport). The same error was encountered. Then I disabled the option CONFIG_DEBUG_SPINLOCK. The error message disappeared, but the kernel still can't be booted.

Can I assume it is a problem with the HCA card? Or is the issue related to the SMP platform? So strange..

Thanks
- Lenber

-----Original Message-----
From: Woodruff, Robert J
Sent: May 25, 2005 6:34
To: Cong, Lenber; openib-general@openib.org
Cc: 'Roland Dreier'
Subject: RE: [openib-general] [HELP] Encounter Kernel Panic when Add MellanoxHCA Supporting on 2.6.9 Kernel

Roland wrote,
>I just tried the latest svn on 2.6.11 with CONFIG_DEBUG_SPINLOCK
>turned on, and I didn't see any problems. The message
>driver/infiniband/hw/mthca/mthca_allocator.c: 46: spin_is_locked on
>uninitialized spinlock: f70f7dac
>is coming from CHECK_LOCK, which is turned on with
>CONFIG_DEBUG_SPINLOCK. However there should be more traceback
>information printed to the console as well... did that get dumped as
>well?

Bob> Roland, has anything been fixed since the 2.6.12 drop in
Bob> mthca that could account for this panic ?

>Not that I know of...
> - R.

I just installed the

  infiniband-backport-2.6.12-to-2.6.9-kernel-fixups-01.diff
  infiniband-backport-2.6.12-to-2.6.9-openib-drivers-02.diff
  infiniband-backport-2.6.12-to-2.6.9-openib-fixups-03.diff

backport patches on a couple of old 900 MHz IA32 Xeon boxes and was able to build the kernel, load IPoIB and ping another node. I used the Redhat configuration file /boot/config-2.6.9-5.ELsmp, did a make oldconfig and selected modules for all of the infiniband drivers. Then I built and installed the kernel with no problems.

Maybe it is the platform (I have seen problems in the past with the BIOS on some platforms being able to map the Mellanox H/W correctly) or could bad Mellanox H/W cause this ? Do you have any other platforms that you could try it on ?

woody
RE: [openib-general] [HELP] Encounter Kernel Panic when Add MellanoxHCA Supporting on 2.6.9 Kernel
Roland wrote,
>I just tried the latest svn on 2.6.11 with CONFIG_DEBUG_SPINLOCK
>turned on, and I didn't see any problems. The message
>driver/infiniband/hw/mthca/mthca_allocator.c: 46: spin_is_locked on
>uninitialized spinlock: f70f7dac
>is coming from CHECK_LOCK, which is turned on with
>CONFIG_DEBUG_SPINLOCK. However there should be more traceback
>information printed to the console as well... did that get dumped as
>well?

Bob> Roland, has anything been fixed since the 2.6.12 drop in
Bob> mthca that could account for this panic ?

>Not that I know of...
> - R.

I just installed the

  infiniband-backport-2.6.12-to-2.6.9-kernel-fixups-01.diff
  infiniband-backport-2.6.12-to-2.6.9-openib-drivers-02.diff
  infiniband-backport-2.6.12-to-2.6.9-openib-fixups-03.diff

backport patches on a couple of old 900 MHz IA32 Xeon boxes and was able to build the kernel, load IPoIB and ping another node. I used the Redhat configuration file /boot/config-2.6.9-5.ELsmp, did a make oldconfig and selected modules for all of the infiniband drivers. Then I built and installed the kernel with no problems.

Maybe it is the platform (I have seen problems in the past with the BIOS on some platforms being able to map the Mellanox H/W correctly) or could bad Mellanox H/W cause this ? Do you have any other platforms that you could try it on ?

woody
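The build steps described in this message could be laid out roughly as follows. This is a sketch under stated assumptions, not the procedure woody actually ran: it assumes the three patches apply in the order of their numeric suffix, that they use a `-p1` strip level, and that the Red Hat config is copied to `.config` before `make oldconfig`. The helper only builds the command list for review and executes nothing; `sequence_number` and `build_commands` are hypothetical names.

```python
import re

# File names and config path as listed in the message; order deliberately shuffled.
PATCHES = [
    "infiniband-backport-2.6.12-to-2.6.9-openib-fixups-03.diff",
    "infiniband-backport-2.6.12-to-2.6.9-kernel-fixups-01.diff",
    "infiniband-backport-2.6.12-to-2.6.9-openib-drivers-02.diff",
]
CONFIG = "/boot/config-2.6.9-5.ELsmp"


def sequence_number(name):
    """Extract the trailing NN from a name ending in '-NN.diff'."""
    match = re.search(r"-(\d+)\.diff$", name)
    if match is None:
        raise ValueError(f"no sequence number in {name!r}")
    return int(match.group(1))


def build_commands(patches, config, strip_level=1):
    """Return the commands to run from the kernel source tree, in order.

    strip_level=1 is an assumption; check the paths inside the diffs.
    """
    ordered = sorted(patches, key=sequence_number)
    commands = [["patch", f"-p{strip_level}", "-i", name] for name in ordered]
    commands.append(["cp", config, ".config"])  # assumed step, not stated in the thread
    commands.append(["make", "oldconfig"])
    return commands


for command in build_commands(PATCHES, CONFIG):
    print(" ".join(command))
```

Printing the commands first makes it easy to check the patch order before touching the source tree.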
Re: [openib-general] [HELP] Encounter Kernel Panic when Add MellanoxHCA Supporting on 2.6.9 Kernel
I just tried the latest svn on 2.6.11 with CONFIG_DEBUG_SPINLOCK turned on, and I didn't see any problems. The message

  driver/infiniband/hw/mthca/mthca_allocator.c: 46: spin_is_locked on uninitialized spinlock: f70f7dac

is coming from CHECK_LOCK, which is turned on with CONFIG_DEBUG_SPINLOCK. However there should be more traceback information printed to the console as well... did that get dumped as well?

Bob> Roland, has anything been fixed since the 2.6.12 drop in
Bob> mthca that could account for this panic ?

Not that I know of...

 - R.
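For readers unfamiliar with this kind of check, here is a toy model of how a debug build can notice an uninitialized lock. It is an illustration only, not the kernel's CHECK_LOCK code: it assumes the debug lock carries a marker field that the init routine stamps with a known value, and the constant, class and function names below are all hypothetical.

```python
# Toy model only: hypothetical marker value, not taken from the kernel source.
MAGIC = 0x5AFE10C4


class DebugLock:
    """Stands in for a lock structure sitting in never-initialized memory."""

    def __init__(self):
        self.magic = 0        # whatever happened to be in memory
        self.locked = False


def lock_init(lock):
    """Stamp the marker so later checks can tell the lock was set up."""
    lock.magic = MAGIC
    lock.locked = False


def is_locked(lock, where):
    """Refuse to inspect a lock whose marker was never stamped."""
    if lock.magic != MAGIC:
        raise RuntimeError(f"{where}: spin_is_locked on uninitialized spinlock")
    return lock.locked


lock = DebugLock()
try:
    is_locked(lock, "mthca_allocator.c: 46")
except RuntimeError as err:
    print(err)

lock_init(lock)
print(is_locked(lock, "mthca_allocator.c: 46"))
```

In this model the check fires whenever the lock is inspected before its init routine runs, which is why a driver error path that skips initialization would only show up with the debug option enabled.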
RE: [openib-general] [HELP] Encounter Kernel Panic when Add MellanoxHCA Supporting on 2.6.9 Kernel
Lenber wrote,
> Here is the error message of rebooting: "
> Starting udev:
> Initializing hardware ... Storage, network, audio Kernel Panic - not
> syncing driver/infiniband/hw/mthca/mthca_allocator.c: 46: spin_is_locked on
> uninitialized spinlock: f70f7dac

These patches were based on the code that is in 2.6.12-rc. I have to admit I did not try them on IA32, I only tested on Itanium and EM64T. I can also try to set up some IA32 machines today or I can send you newer patches based on SVN2425 that I have tested on EM64T on Redhat EL4.0.

Roland, has anything been fixed since the 2.6.12 drop in mthca that could account for this panic ?

woody