Re: [openib-general] [HELP] Encounter Kernel Panic when Add MellanoxHCA Supporting on 2.6.9 Kernel

2005-05-26 Thread Roland Dreier
Hmm, I can't make it happen here unfortunately... I just get the
following on 2.6.12-rc5 with CONFIG_DEBUG_SPINLOCK:

[   26.001979] ib_mthca: Mellanox InfiniBand HCA driver v0.06-pre (November 8, 2004)
[   26.026622] ib_mthca: Initializing Mellanox Technologies MT23108 InfiniHost (:04:00.0)
[   27.326318] ib_mthca :04:00.0: HCA FW version 3.0.1 is old (3.3.2 is current).
[   27.351207] ib_mthca :04:00.0: If you have problems, try updating your HCA FW.

but it actually seems to work (well enough that the port goes to active).
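
The "is old" warning in the log above amounts to a version comparison. A minimal sketch of that idea, assuming a major.minor.subminor version packed into one 64-bit integer so that ordinary integer comparison orders versions correctly. The names demo_fw_ver and demo_fw_is_old are hypothetical and are not taken from the driver source.

```c
#include <stdint.h>

/*
 * Hypothetical helper: pack major.minor.subminor into a single 64-bit
 * value. Because the major number sits in the highest bits, comparing
 * two packed values as integers compares the versions field by field.
 */
uint64_t demo_fw_ver(uint32_t major, uint16_t minor, uint16_t subminor)
{
    return ((uint64_t)major << 32) |
           ((uint64_t)minor << 16) |
           (uint64_t)subminor;
}

/* Returns nonzero when the running firmware is older than the latest known one. */
int demo_fw_is_old(uint64_t running, uint64_t latest)
{
    return running < latest;
}
```

With the versions from the log, demo_fw_is_old(demo_fw_ver(3, 0, 1), demo_fw_ver(3, 3, 2)) is nonzero, which corresponds to a warning rather than a failure.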

 - R.
___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


RE: [openib-general] [HELP] Encounter Kernel Panic when Add MellanoxHCA Supporting on 2.6.9 Kernel

2005-05-26 Thread Cong, Lenber
Dear Woody & Roland
I downloaded the newest firmware and updated the system with it.
Now it works. 
Kernel can be booted and module can be loaded successfully.
Thanks for your great help.
It is very useful to me.

Thanks - Lenber
-----Original Message-----
From: Roland Dreier [mailto:[EMAIL PROTECTED]]
Sent: May 26, 2005 7:30
To: Woodruff, Robert J
Cc: Cong, Lenber; openib-general@openib.org
Subject: Re: [openib-general] [HELP] Encounter Kernel Panic when Add MellanoxHCA Supporting on 2.6.9 Kernel

Robert> Unfortunately not, I did not have CONFIG_DEBUG_SPINLOCK
Robert> set and I did not save the old firmware before loading in
Robert> the new firmware.

I'll try to build an old fw image and see if I can reproduce it here.

 - R.


Re: [openib-general] [HELP] Encounter Kernel Panic when Add MellanoxHCA Supporting on 2.6.9 Kernel

2005-05-25 Thread Roland Dreier
Robert> Unfortunately not, I did not have CONFIG_DEBUG_SPINLOCK
Robert> set and I did not save the old firmware before loading in
Robert> the new firmware.

I'll try to build an old fw image and see if I can reproduce it here.

 - R.


RE: [openib-general] [HELP] Encounter Kernel Panic when Add MellanoxHCA Supporting on 2.6.9 Kernel

2005-05-25 Thread Woodruff, Robert J
 
Bob> Ok, I was able to reproduce this error on an IA32 system,
Bob> running the redhat 2.6.9-5.EL (UP kernel) with the IB patches
Bob> applied. It turned out to be a problem with the HCA card that
Bob> had older firmware, 3.0.1.

Roland> Did you get any kind of stack dump or traceback?

Roland> We really shouldn't panic on downrev FW, so I'd like to get to the
Roland> bottom of this.

Roland>  - R.

Unfortunately not, I did not have CONFIG_DEBUG_SPINLOCK set and 
I did not save the old firmware before loading in the new firmware.

Perhaps Lenber can get the traceback info before he updates his card.
That would be helpful as I agree it is not desirable to panic
on cards with old firmware.

woody





Re: [openib-general] [HELP] Encounter Kernel Panic when Add MellanoxHCA Supporting on 2.6.9 Kernel

2005-05-25 Thread Roland Dreier
Bob> Ok, I was able to reproduce this error on an IA32 system,
Bob> running the redhat 2.6.9-5.EL (UP kernel) with the IB patches
Bob> applied. It turned out to be a problem with the HCA card that
Bob> had older firmware, 3.0.1.

Did you get any kind of stack dump or traceback?

We really shouldn't panic on downrev FW, so I'd like to get to the
bottom of this.

 - R.


RE: [openib-general] [HELP] Encounter Kernel Panic when Add MellanoxHCA Supporting on 2.6.9 Kernel

2005-05-25 Thread Bob Woodruff
 

Lenber> Can I assume it is a problem with the HCA card? Or is the issue
Lenber> related to the SMP platform? So strange...

Roland> It's possible it's the HCA but I'm not sure what could be wrong.
Roland> With CONFIG_DEBUG_SPINLOCK can you get more of the traceback?
Roland> The BUG() should be producing a full stack trace.

Ok, I was able to reproduce this error on an IA32 system, running the 
redhat 2.6.9-5.EL (UP kernel) with the IB patches applied. It turned out
to be a problem with the HCA card that had older firmware, 3.0.1.

I updated the firmware to 3.3.2 and the system booted OK and everything 
seems to work fine. 

woody



Re: [openib-general] [HELP] Encounter Kernel Panic when Add MellanoxHCA Supporting on 2.6.9 Kernel

2005-05-25 Thread Roland Dreier
Lenber> Can I assume it is a problem with the HCA card? Or is the issue
Lenber> related to the SMP platform? So strange...

It's possible it's the HCA but I'm not sure what could be wrong.  With
CONFIG_DEBUG_SPINLOCK can you get more of the traceback?  The BUG()
should be producing a full stack trace.

Thanks,
  Roland


RE: [openib-general] [HELP] Encounter Kernel Panic when Add MellanoxHCA Supporting on 2.6.9 Kernel

2005-05-25 Thread Cong, Lenber
I tried the patches (2.6.12-to-2.6.9, not the svn backport) on an EM64T desktop
(without an HCA card). The kernel installed successfully.

I still can't boot the kernel on the Xeon SMP server, even with the new patches
(svn backport). I hit the same error.

Then I disabled the CONFIG_DEBUG_SPINLOCK option.
The error message disappeared, but the kernel still won't boot.

Can I assume it is a problem with the HCA card? Or is the issue related to the
SMP platform? So strange...

Thanks - Lenber

-----Original Message-----
From: Woodruff, Robert J
Sent: May 25, 2005 6:34
To: Cong, Lenber; openib-general@openib.org
Cc: 'Roland Dreier'
Subject: RE: [openib-general] [HELP] Encounter Kernel Panic when Add MellanoxHCA Supporting on 2.6.9 Kernel

Roland wrote,  
>I just tried the latest svn on 2.6.11 with CONFIG_DEBUG_SPINLOCK
>turned on, and I didn't see any problems.  The message

>driver/infiniband/hw/mthca/mthca_allocator.c: 46: spin_is_locked on 
>uninitialized spinlock: f70f7dac

>is coming from CHECK_LOCK, which is turned on with
>CONFIG_DEBUG_SPINLOCK.  However there should be more traceback
>information printed to the console as well... did that get dumped as
>well?

Bob> Roland, has anything been fixed since the 2.6.12 drop in
Bob> mthca that could account for this panic ?

>Not that I know of...

> - R.

I just installed the 

infiniband-backport-2.6.12-to-2.6.9-kernel-fixups-01.diff   
infiniband-backport-2.6.12-to-2.6.9-openib-drivers-02.diff  
infiniband-backport-2.6.12-to-2.6.9-openib-fixups-03.diff  

backport patches on a couple of old 900MHz IA32 Xeon boxes
and was able to build the kernel, load IPoIB and ping another node.
I used the Red Hat configuration file /boot/config-2.6.9-5.ELsmp,
ran make oldconfig and selected modules for all of the InfiniBand drivers.
Then I built and installed the kernel with no problems.

Maybe it is the platform (I have seen problems in the past with
the BIOS on some platforms not being able to map the Mellanox H/W correctly),
or could bad Mellanox H/W cause this?

Do you have any other platforms that you could try it on?

woody




RE: [openib-general] [HELP] Encounter Kernel Panic when Add MellanoxHCA Supporting on 2.6.9 Kernel

2005-05-24 Thread Bob Woodruff
Roland wrote,  
>I just tried the latest svn on 2.6.11 with CONFIG_DEBUG_SPINLOCK
>turned on, and I didn't see any problems.  The message

>driver/infiniband/hw/mthca/mthca_allocator.c: 46: spin_is_locked on 
>uninitialized spinlock: f70f7dac

>is coming from CHECK_LOCK, which is turned on with
>CONFIG_DEBUG_SPINLOCK.  However there should be more traceback
>information printed to the console as well... did that get dumped as
>well?

Bob> Roland, has anything been fixed since the 2.6.12 drop in
Bob> mthca that could account for this panic ?

>Not that I know of...

> - R.

I just installed the 

infiniband-backport-2.6.12-to-2.6.9-kernel-fixups-01.diff   
infiniband-backport-2.6.12-to-2.6.9-openib-drivers-02.diff  
infiniband-backport-2.6.12-to-2.6.9-openib-fixups-03.diff  

backport patches on a couple of old 900MHz IA32 Xeon boxes
and was able to build the kernel, load IPoIB and ping another node.
I used the Red Hat configuration file /boot/config-2.6.9-5.ELsmp,
ran make oldconfig and selected modules for all of the InfiniBand drivers.
Then I built and installed the kernel with no problems.

Maybe it is the platform (I have seen problems in the past with
the BIOS on some platforms not being able to map the Mellanox H/W correctly),
or could bad Mellanox H/W cause this?

Do you have any other platforms that you could try it on?

woody




Re: [openib-general] [HELP] Encounter Kernel Panic when Add MellanoxHCA Supporting on 2.6.9 Kernel

2005-05-24 Thread Roland Dreier
I just tried the latest svn on 2.6.11 with CONFIG_DEBUG_SPINLOCK
turned on, and I didn't see any problems.  The message

driver/infiniband/hw/mthca/mthca_allocator.c: 46: spin_is_locked on 
uninitialized spinlock: f70f7dac

is coming from CHECK_LOCK, which is turned on with
CONFIG_DEBUG_SPINLOCK.  However there should be more traceback
information printed to the console as well... did that get dumped as
well?
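
A minimal user-space sketch of the kind of check that produces a message like this, assuming the debug build stores a magic value in each lock when it is initialized and complains when that value is missing. All names here (demo_lock, DEMO_LOCK_MAGIC, demo_lock_init, demo_lock_is_initialized) are hypothetical and are not copied from the kernel source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical marker written into a lock only by demo_lock_init(). */
#define DEMO_LOCK_MAGIC 0xdead4eadu

struct demo_lock {
    unsigned int magic;   /* equals DEMO_LOCK_MAGIC once initialized */
    bool locked;          /* toy lock state, unused by the check itself */
};

/* Initialize the lock and stamp it with the magic value. */
void demo_lock_init(struct demo_lock *lock)
{
    assert(lock != NULL);
    lock->magic = DEMO_LOCK_MAGIC;
    lock->locked = false;
}

/*
 * Debug-style sanity check: a lock that was never passed through
 * demo_lock_init() will (almost certainly) not carry the magic value,
 * so using it can be reported instead of silently misbehaving.
 */
bool demo_lock_is_initialized(const struct demo_lock *lock)
{
    assert(lock != NULL);
    return lock->magic == DEMO_LOCK_MAGIC;
}
```

In this sketch a zero-filled struct demo_lock fails the check until demo_lock_init() has been called on it, which is the situation the "uninitialized spinlock" message describes.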

Bob> Roland, has anything been fixed since the 2.6.12 drop in
Bob> mthca that could account for this panic ?

Not that I know of...

 - R.


RE: [openib-general] [HELP] Encounter Kernel Panic when Add MellanoxHCA Supporting on 2.6.9 Kernel

2005-05-24 Thread Bob Woodruff
Lenber wrote,
> Here is the error message of rebooting:

> Starting udev:
> Initializing hardware ... Storage, network, audio Kernel Panic - not
> syncing   driver/infiniband/hw/mthca/mthca_allocator.c: 46: spin_is_locked on
> uninitialized spinlock: f70f7dac

These patches were based on the code that is in 2.6.12-rc. I have to admit
I did not try them on IA32, I only tested on Itanium and EM64T. 
I can also try to set up some IA32 machines today or I can send you
newer patches based on SVN2425 that I have tested on EM64T on Redhat EL4.0. 

Roland, has anything been fixed since the 2.6.12 drop in mthca that could
account for this panic ?

woody


