[openib-general] IB initialization

2006-04-14 Thread Chris Worley
I installed the SuSE 10 OpenIB RC2 RPMS.

The installation went well, but I'm stuck at the startup.

As an IBGD user, I'm used to an init file in /etc/init.d... but there was none.

From the wiki, I was able to glean:

Make the udev file:

# cat  /etc/udev/rules.d/40-infiniband.rules
KERNEL="umad*", NAME="infiniband/%k"
KERNEL="issm*", NAME="infiniband/%k"

Load some modules:

modprobe ib_ucm
modprobe ib_cm
modprobe ib_uverbs
modprobe ib_umad

And make sure udev is running, and start the opensm.
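
A minimal sketch of the module-loading step above, assuming only the four modules named in this message. The MODPROBE override is a hypothetical convenience for dry runs and is not part of any OpenIB package:

```shell
#!/bin/sh
# Sketch only: load the modules named in this thread, one at a time.
# MODPROBE is a hypothetical override so the function can be dry-run
# (for example MODPROBE=echo); it defaults to /sbin/modprobe.
load_ib_modules() {
  for mod in ib_ucm ib_cm ib_uverbs ib_umad; do
    # Left unquoted on purpose so an override such as "echo" works.
    if ! ${MODPROBE:-/sbin/modprobe} "$mod"; then
      echo "failed to load $mod" >&2
      return 1
    fi
  done
}
```

Run as root, calling load_ib_modules loads each module and stops at the first failure; with MODPROBE=echo it only prints the module names.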

I've done this on all nodes, and ibstat shows I have a link up and
running on every node.  Opensm doesn't show any scanning.  It's been
hung all night at:

# opensm --console
-
OpenSM Rev:openib-1.2.0
Based on OpenIB svn Exported revision
Command Line Arguments:
 Enabling OpenSM interactive console
 Log File: /var/log/osm.log
-
OpenSM Rev:openib-1.2.0 OpenIB svn Exported revision

Using default guid 0x2c9020020c3ce

OpenSM Console

$ Entering MASTER state

SUBNET UP

IPoIB isn't up.  ibv_rc_pingpong doesn't work.  Neither does ibv_devinfo.

Is there a definitive guide on the initialization of the drivers and fabric?

Also, is there an MVAPICH2 for SuSE 10 RPM?

Thanks,

Chris
___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


Re: [openib-general] IB initialization

2006-04-14 Thread Hal Rosenstock
Hi Chris,

On Fri, 2006-04-14 at 10:19, Chris Worley wrote:
> I installed the SuSE 10 OpenIB RC2 RPMS.
> 
> The installation went well, but I'm stuck at the startup.
> 
> As an IBGD user, I'm used to an init file in /etc/init.d... but there was none.
> 
> From the wiki, I was able to glean:
> 
> Make the udev file:
> 
> # cat  /etc/udev/rules.d/40-infiniband.rules
> KERNEL="umad*", NAME="infiniband/%k"
> KERNEL="issm*", NAME="infiniband/%k"
> 
> Install some modules:
> 
> modprobe ib_ucm
> modprobe ib_cm
> modprobe ib_uverbs
> modprobe ib_umad
> 
> And make sure udev is running, and start the opensm.
> 
> I've done this on all nodes, and ibstat shows I have a link up and
> running on every node.  Opensm doesn't show any scanning.  It's been
> hung all night at:
> 
> # opensm --console
> -
> OpenSM Rev:openib-1.2.0
> Based on OpenIB svn Exported revision
> Command Line Arguments:
>  Enabling OpenSM interactive console
>  Log File: /var/log/osm.log
> -
> OpenSM Rev:openib-1.2.0 OpenIB svn Exported revision
> 
> Using default guid 0x2c9020020c3ce
> 
> OpenSM Console
> 
> $ Entering MASTER state
> 
> SUBNET UP

Looks like everything is fine from the OpenSM standpoint.

I see no indication that OpenSM is hung. You are in the console.

Also, why do you say OpenSM isn't scanning ?

What is in /var/log/osm.log ? Any errors ?

If you want more verbose messages start OpenSM with -V.
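
One way to check the log for errors, as a rough sketch. It assumes that OpenSM error lines contain the token "ERR"; that default pattern and the function name are illustrative assumptions, so adjust them to whatever your build actually writes:

```shell
#!/bin/sh
# Sketch only: show lines that look like errors in an OpenSM log.
# Assumption: error lines contain the token "ERR".
osm_log_errors() {
  # $1: log file (defaults to the path shown in the console banner)
  # $2: pattern to search for (hypothetical default)
  grep -n -- "${2:-ERR}" "${1:-/var/log/osm.log}"
}
```

The function prints matching lines with their line numbers and returns non-zero when nothing matches, so it can also be used in a conditional.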

-- Hal

> IPoIB isn't up.  ibv_rc_pingpong doesn't work.  Neither does ibv_devinfo.
> 
> Is there a definitive guide on the initialization of the drivers and fabric?
> 
> Also, is there an MVAPICH2 for SuSE 10 RPM?
> 
> Thanks,
> 
> Chris


Re: [openib-general] IB initialization

2006-04-14 Thread Chris Worley
Hal,

You're correct... the results of the scans are in /var/log/osm.log.  I
was expecting the --console mode to show more.

In looking at the /var/log/osm.log I'm seeing a lot of:

Reporting Generic Notice type:4 num:144

For different GUIDs.  Is there a place to look these up?

I still don't have IPoIB running, and ibv_devinfo says I'm not set up
correctly either (it couldn't open a device).

Thanks,

Chris


[openib-general] [PATCH] OpenSM/osm_sa_mcmember_record.c::__osm_mcmr_rcv_respond: Fix MTU, rate, and PLL selectors

2006-04-14 Thread Hal Rosenstock
OpenSM/osm_sa_mcmember_record.c::__osm_mcmr_rcv_respond: Fix MTU, rate,
and PLL selectors to be exactly

Signed-off-by: Hal Rosenstock [EMAIL PROTECTED]

---
Note this patch has been applied to both trunk and 1.0 branch.

Index: opensm/osm_sa_mcmember_record.c
===
--- opensm/osm_sa_mcmember_record.c (revision 6466)
+++ opensm/osm_sa_mcmember_record.c (working copy)
@@ -548,8 +548,11 @@ __osm_mcmr_rcv_respond(
   *p_resp_mcmember_rec = *p_mcmember_rec;
 
   /* Fill in the mtu, rate, and packet lifetime selectors */
+  p_resp_mcmember_rec->mtu &= 0x3f;
   p_resp_mcmember_rec->mtu |= 2<<6; /* exactly */
+  p_resp_mcmember_rec->rate &= 0x3f;
   p_resp_mcmember_rec->rate |=  2<<6; /* exactly */
+  p_resp_mcmember_rec->pkt_life &= 0x3f;
   p_resp_mcmember_rec->pkt_life |= 2<<6; /* exactly */
 
   status = osm_vendor_send(
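
The intent of the patch can be illustrated with plain shell arithmetic. This is only a sketch, assuming the usual encoding in which the top two bits of each of these bytes hold the selector and the low six bits hold the value, with selector 2 meaning "exactly". The function name is hypothetical and is not part of OpenSM:

```shell
#!/bin/sh
# Sketch only: clear the two selector bits, then set selector 2 ("exactly").
# Assumes an 8-bit field: bits 7-6 = selector, bits 5-0 = value.
set_selector_exactly() {
  # $1: current field value, decimal or 0x-prefixed hex
  printf '0x%02x\n' $(( ($1 & 0x3f) | (2 << 6) ))
}
```

For a field that arrives with selector bits already set, for example 0xc4, OR-ing in the new selector without masking first would leave the old selector bits in place; masking with 0x3f first gives 0x84, the same result as for a clean 0x04.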





Re: [openib-general] IB initialization

2006-04-14 Thread Hal Rosenstock
Hi again Chris,

On Fri, 2006-04-14 at 10:39, Chris Worley wrote:
> Hal,
> 
> You're correct... the results of the scans are in /var/log/osm.log.  I
> was expecting the --console mode to show more.
> 
> In looking at the /var/log/osm.log I'm seeing a lot of:
> 
> Reporting Generic Notice type:4 num:144
> 
> For different GUIDs.

What's a lot ? One for each GUID ? What's the capability mask indicated ?

> Is there a place to look these up?

Yes, the IBA spec (volume 1). Trap 144 indicates that the capability
mask at the indicated LID has changed.
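
To see how many of these notices each LID produced, something like the following could be run against the log. It is a sketch that assumes the "Reporting Generic Notice" line format quoted in this thread; the function name is made up for illustration:

```shell
#!/bin/sh
# Sketch only: count trap 144 reports per LID.
# Reads an OpenSM log on stdin and prints "<count> <LID>" per LID.
count_trap144_by_lid() {
  grep 'Reporting Generic Notice type:4 num:144' |
    sed -n 's/.*from LID:\(0x[0-9A-Fa-f]*\).*/\1/p' |
    sort | uniq -c | awk '{print $1, $2}'
}
```

Usage would be along the lines of: count_trap144_by_lid < /var/log/osm.log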

> I still don't have IPoIB running, and ibv_devinfo says I'm not setup
> right either (couldn't open a device).

I'm not sure why not.

-- Hal



Re: [openib-general] IB initialization

2006-04-14 Thread Chris Worley
Hal,

It looks like 1 per GUID.  I don't see a capability mask.  An example is:

Apr 14 07:28:18 879428 [40602960] - __osm_trap_rcv_process_request: Received Generic Notice type:0x04 num:144 Producer:1 from LID:0x0007 TID:0x0001
Apr 14 07:28:18 879513 [40602960] - osm_report_notice: Reporting Generic Notice type:4 num:144 from LID:0x0007 GID:0xfe800000,0x0002c9020020c3b6

Thanks,

Chris


Re: [openib-general][patch review] srp: fmr implementation,

2006-04-14 Thread Vu Pham

Roland Dreier wrote:

> Hmm, it's clearly a use-after-free bug.  Based on
> 
> ip is at srp_reconnect_target+0x2b1/0x5c0 [ib_srp]
> 
> can you guess where it is in the SRP driver or what it's accessing?
> 
> Also this is happening because the connection is being reconnected,
> because SCSI commands are timing out.  Do you have any idea why this
> is happening?  What does the target see when this happens?


It crashed while clearing the request queue, i.e. in:

    list_for_each_entry(req, &target->req_queue, list) {
        req->scmnd->result = DID_RESET << 16;
        req->scmnd->scsi_done(req->scmnd);
    }

The SCSI command was probably already freed through abort; however,
it is still in the request queue.


Vu


Re: [openib-general] IB initialization

2006-04-14 Thread Hal Rosenstock
Hi again Chris,

On Fri, 2006-04-14 at 11:29, Chris Worley wrote:
> Hal,
> 
> It looks like 1 per GUID.  I don't see a capability mask.  An example is:
> 
> Apr 14 07:28:18 879428 [40602960] - __osm_trap_rcv_process_request: Received Generic Notice type:0x04 num:144 Producer:1 from LID:0x0007 TID:0x0001
> Apr 14 07:28:18 879513 [40602960] - osm_report_notice: Reporting Generic Notice type:4 num:144 from LID:0x0007 GID:0xfe800000,0x0002c9020020c3b6

Are you running with verbose (-V) ? You only see that extra info then.

Just out of curiosity, how big is your subnet and what is the topology ?

-- Hal



Re: [openib-general][patch review] srp: fmr implementation,

2006-04-14 Thread Roland Dreier
Hmm, I don't understand what could be going on.  srp_send_tsk_mgmt()
currently has:

    if (req->cmd_done) {
        srp_remove_req(target, req, req_index);
        scmnd->scsi_done(scmnd);
    } else if (!req->tsk_status) {
        srp_remove_req(target, req, req_index);
        scmnd->result = DID_ABORT << 16;
        ret = SUCCESS;
    }

and otherwise it returns FAILED.  So in both cases where it finishes
the command, it removes it from the list of pending requests.

Are you absolutely sure you saw the crash with a patched driver that
has that code in srp_send_tsk_mgmt()?

 - R.


Re: [openib-general] IB initialization

2006-04-14 Thread Chris Worley
Hal,

Note that I got an /etc/init.d/openibd script that's getting
everything running (I still don't have IPoIB or MVAPICH2... but I can
live without both).

Now, I'm running Opensm with -V, and it looks as I expected.

This cluster is simple: 9 nodes in one switch.

Thanks,

Chris

Re: [openib-general] Re: Compile problems with core code and pathscale for svn6462 and linux-2.6.17-rc1

2006-04-14 Thread Sean Hefty

Matt Leininger wrote:

>   Ok.  So the current state is that the mainline devel branch will be
> broken for a while?


The trunk is always supposed to work, let alone compile.  This needs to be fixed
quickly, or the offending code moved to a branch.


- Sean


Re: [openib-general] Re: Compile problems with core code and pathscale for svn6462 and linux-2.6.17-rc1

2006-04-14 Thread Bryan O'Sullivan
On Fri, 2006-04-14 at 09:19 -0700, Sean Hefty wrote:
> Matt Leininger wrote:
> >   Ok.  So the current state is that the mainline devel branch will be
> > broken for a while?
> 
> The trunk is always supposed to work, let alone compile.  This needs to be
> fixed quickly, or the offending code moved to a branch.

There is nothing that needs to be fixed.  Matt was just not using the
right combination of bits when he was trying to compile the world.

b



Re: [openib-general][patch review] srp: fmr implementation,

2006-04-14 Thread Vu Pham

Roland Dreier wrote:

> Hmm, I don't understand what could be going on.  srp_send_tsk_mgmt()
> currently has:
> 
>     if (req->cmd_done) {
>         srp_remove_req(target, req, req_index);
>         scmnd->scsi_done(scmnd);
>     } else if (!req->tsk_status) {
>         srp_remove_req(target, req, req_index);
>         scmnd->result = DID_ABORT << 16;
>         ret = SUCCESS;
>     }
> 
> and otherwise it returns FAILED.  So in both cases where it finishes
> the command, it removes it from the list of pending requests.
> 
> Are you absolutely sure you saw the crash with a patched driver that
> has that code in srp_send_tsk_mgmt()?


I'm sure that I patched srp driver revision 6036. It has the 
above code in srp_send_tsk_mgmt()


I don't have time to work on this today. I'll get back with 
more debug details on Monday


Thanks,
Vu


Re: [openib-general] Trying to compile mvapich RHEL4U3 for ib.

2006-04-14 Thread Roger Heflin

Sayantan Sur wrote:

> Hello Roger,
> 
> I'm just CC-ing this to openib-general for the community.
> 
> Thanks for giving us access. I have verified that the
> `ibv_get_device_list' verb is indeed *missing* from the OpenIB install.
> I'm afraid that given this Redhat rpm, it is difficult to get mvapich to
> work (without patching it).
> 
> As Roland and others have indicated, perhaps the best way is for you to
> upgrade to at least the 1.0 branch. That should be the most stable OpenIB
> release yet.
> 
> https://openib.org/svn/gen2/branches/1.0/src/userspace/
> 
> You should be able to keep the kernel stuff intact and just upgrade the
> user level support (management, libibverbs, libmthca). You may skip
> upgrading management, however it'll be best to upgrade it too, lest you
> face any OpenSM issues.
> 
> Thanks,
> Sayantan.



I now have the machines running RHEL4U3 + kernel.org 2.6.16.5 + the
OpenIB 1.0 userspace. The RPM spec files worked for the OpenIB tools,
which made things pretty simple, and I have a reasonable set of RPMs
and tar files to carry out the kernel+userspace update.

I have succeeded in getting OpenMPI to compile and execute HPL under
raw IB, and so far I am getting reasonable results and no corruption.

Mvapich compiles but appears to not have made the mpirun version for
Infiniband, and yells about that when attempting to start HPL. I have
not yet looked at that in detail to see what the nature of the failure
is.

  Roger


Re: [openib-general] Trying to compile mvapich RHEL4U3 for ib.

2006-04-14 Thread Sayantan Sur
Hi Roger,

> Mvapich compiles but appears to not have made the mpirun version for
> Infiniband, and yells about that when attempting to start HPL, I have
> not yet looked at that in detail to see what the nature of the failure
> is.

Thanks for reporting this. In fact, just today we fixed this in the
MVAPICH trunk. This problem was reported by another user on
mvapich-discuss.

http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/2006-April/98.html

If this was the error you got, we'll be glad if you could just `svn up'
your tree and give it a shot.

Please let us know if this worked for you.

Thanks,
Sayantan.

 

-- 
http://www.cse.ohio-state.edu/~surs


RE: [openib-general] IB initialization

2006-04-14 Thread Bob Woodruff
Chris wrote, 
> As an IBGD user, I'm used to an init file in /etc/init.d... but there was
> none.
> 
> From the wiki, I was able to glean:
> 
> Make the udev file:
> 
> # cat  /etc/udev/rules.d/40-infiniband.rules
> KERNEL="umad*", NAME="infiniband/%k"
> KERNEL="issm*", NAME="infiniband/%k"
> 
> Install some modules:
> 
> modprobe ib_ucm
> modprobe ib_cm
> modprobe ib_uverbs
> modprobe ib_umad
> 
> 
> Is there a definitive guide on the initialization of the drivers and
> fabric?

FYI to anyone else trying to get things loaded and running:
here is an init.d startup script that I use to load and start the IB
drivers.
You can use it and/or edit it to load the drivers that you want.
My script makes the dev nodes manually, but if you have udev, you can use
that instead.

#!/bin/sh
#
# ib : A script to control openib.org kernel module start
#


# Set variables
module1=ib_mthca
module2=ib_mad
module3=ib_sa
module4=ib_ipoib
module5=ib_uverbs
module6=ib_umad
module7=ib_cm 
module8=ib_ucm 
module9=ib_sdp
module10=ib_srp
module11=rdma_cm
module12=rdma_ucm
# module13=kdapl   # deprecated
module14=iscsi_tcp
module15=ib_iser
device=infiniband
mode=666


# Set default module parameters
det_max_pages_percent=0
det_retry_time=0
det_window_size=0



usage()
{
  echo "Usage: $0 {start|stop|restart|reload} [module_parameters]"
}



verify_root_privilege()
{
  # id -u is used instead of $UID, which plain sh does not guarantee
  if [ "$(id -u)" != 0 ]; then
    echo "You must be root to modify module state"
    exit 1
  fi
}



start()
{
  verify_root_privilege

  kernel_ver=$(uname -r)
  module1_path=/lib/modules/$kernel_ver/kernel/drivers/infiniband/hw/mthca/$module1.ko
  module2_path=/lib/modules/$kernel_ver/kernel/drivers/infiniband/core/$module2.ko
  module3_path=/lib/modules/$kernel_ver/kernel/drivers/infiniband/core/$module3.ko
  module4_path=/lib/modules/$kernel_ver/kernel/drivers/infiniband/ulp/ipoib/$module4.ko
  module5_path=/lib/modules/$kernel_ver/kernel/drivers/infiniband/core/$module5.ko
  module6_path=/lib/modules/$kernel_ver/kernel/drivers/infiniband/core/$module6.ko
  module7_path=/lib/modules/$kernel_ver/kernel/drivers/infiniband/core/$module7.ko
  module8_path=/lib/modules/$kernel_ver/kernel/drivers/infiniband/core/$module8.ko
  module9_path=/lib/modules/$kernel_ver/kernel/drivers/infiniband/ulp/sdp/$module9.ko
  module10_path=/lib/modules/$kernel_ver/kernel/drivers/infiniband/ulp/srp/$module10.ko
  module11_path=/lib/modules/$kernel_ver/kernel/drivers/infiniband/core/$module11.ko
  module12_path=/lib/modules/$kernel_ver/kernel/drivers/infiniband/core/$module12.ko
#  module13_path=/lib/modules/$kernel_ver/kernel/drivers/infiniband/ulp/kdapl/$module13.ko
  module14_path=/lib/modules/$kernel_ver/kernel/drivers/scsi/$module14.ko
  module15_path=/lib/modules/$kernel_ver/kernel/drivers/infiniband/ulp/iser/$module15.ko

  # Load each module only if its .ko file exists for the running kernel.
  if test -e "$module1_path"; then
    echo "Loading $module1"
    sudo /sbin/modprobe "$module1" "$@"
  else
    echo "Module $module1 not found ($module1_path does not exist)!"
  fi

  if test -e "$module2_path"; then
    echo "Loading $module2"
    sudo /sbin/modprobe "$module2" "$@"
  else
    echo "Module $module2 not found ($module2_path does not exist)!"
  fi

  if test -e "$module3_path"; then
    echo "Loading $module3"
    sudo /sbin/modprobe "$module3" "$@"
  else
    echo "Module $module3 not found ($module3_path does not exist)!"
  fi

  if test -e "$module4_path"; then
    echo "Loading $module4"
    sudo /sbin/modprobe "$module4" "$@"
  else
    echo "Module $module4 not found ($module4_path does not exist)!"
  fi

  if test -e "$module5_path"; then
    echo "Loading $module5"
    sudo /sbin/modprobe "$module5" "$@"
  else
    echo "Module $module5 not found ($module5_path does not exist)!"
  fi

  if test -e "$module6_path"; then
    echo "Loading $module6"
    sudo /sbin/modprobe "$module6" "$@"
  else
    echo "Module $module6 not found ($module6_path does not exist)!"
  fi

  if test -e "$module7_path"; then
    echo "Loading $module7"
    sudo /sbin/modprobe "$module7" "$@"
  else
    echo "Module $module7 not found ($module7_path does not exist)!"
  fi

  if test -e "$module8_path"; then
    echo "Loading $module8"
    sudo /sbin/modprobe "$module8" "$@"
  else
    echo "Module $module8 not found ($module8_path does not exist)!"
  fi

  if test -e "$module9_path"; then
    echo "Loading $module9"
    sudo /sbin/modprobe "$module9" "$@"
  else
    echo "Module $module9 not found ($module9_path does not exist)!"
  fi

  if test -e "$module10_path"; then
    echo "Loading $module10"
    sudo /sbin/modprobe "$module10" "$@"
  else
    echo "Module $module10 not found ($module10_path does not exist)!"
  fi

  if test -e "$module11_path"; then
    echo "Loading $module11"
    sudo /sbin/modprobe "$module11" "$@"
  else
    echo "Module $module11 not found ($module11_path does not exist)!"
  fi

  if test -e "$module12_path"; then
    echo "Loading $module12"
    sudo /sbin/modprobe "$module12" "$@"
  else
    echo "Module $module12 not found ($module12_path does not exist)!"
  fi
#  if test -e $module13_path; then
#echo Loading $module13
#sudo /sbin/modprobe $module13  $@
#  else
#

Re: [openib-general] Trying to compile mvapich RHEL4U3 for ib.

2006-04-14 Thread Roger Heflin

Sayantan Sur wrote:

> Hi Roger,
> 
> > Mvapich compiles but appears to not have made the mpirun version for
> > Infiniband, and yells about that when attempting to start HPL, I have
> > not yet looked at that in detail to see what the nature of the failure
> > is.
> 
> Thanks for reporting this. In fact, just today we fixed this in the
> MVAPICH trunk. This problem was reported by another user on
> mvapich-discuss.
> 
> http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/2006-April/98.html
> 
> If this was the error you got, we'll be glad if you could just `svn up'
> your tree and give it a shot.
> 
> Please let us know if this worked for you.
> 
> Thanks,
> Sayantan.


Yep, that is what I saw.

I will try the newer version Monday.

   Roger