Re: [openib-general] Trying to compile mvapich RHEL4U3 for ib.

2006-04-14 Thread Roger Heflin

Sayantan Sur wrote:

Hi Roger,


Mvapich compiles but appears to not have made the mpirun version for
Infiniband, and yells about that when attempting to start HPL, I have
not yet looked at that in detail to see what the nature of the failure
is.


Thanks for reporting this. In fact, just today we have fixed this in the
MVAPICH trunk. This problem was reported by another user on
mvapich-discuss.

http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/2006-April/98.html

If this was the error you got, we'll be glad if you could just `svn up'
your tree and give it a shot.

Please let us know if this worked for you.

Thanks,
Sayantan.


Yeap, that is what I saw.

I will try the newer version Monday.

   Roger
___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


RE: [openib-general] IB initialization

2006-04-14 Thread Bob Woodruff
Chris wrote, 
>As an IBGD user, I'm used to an init file in /etc/init.d... but there was none.

>From the wiki, I was able to glean:

>Make the udev file:

># cat > /etc/udev/rules.d/40-infiniband.rules
>KERNEL="umad*", NAME="infiniband/%k"
>KERNEL="issm*", NAME="infiniband/%k"

>Install some modules:

>modprobe ib_ucm
>modprobe ib_cm
>modprobe ib_uverbs
>modprobe ib_umad


>Is there a definitive guide on the initialization of the drivers and fabric?
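
As a quick sanity check on the manual steps quoted above, a small helper can report which of the listed modules are not actually loaded. This is only a sketch: the function name missing_modules and the MODULES_FILE override are made up for illustration, and it assumes the usual /proc/modules layout with the module name in the first field.

```shell
#!/bin/sh
# Sketch: print each expected module that is absent from a module list.
# MODULES_FILE is a hypothetical override (defaults to /proc/modules).

missing_modules() {
  modules_file=${MODULES_FILE:-/proc/modules}
  for m in "$@"; do
    # Match on the first field only, so a name that merely appears in
    # another module's dependency list does not count as loaded.
    if ! awk -v name="$m" '$1 == name { found = 1 } END { exit !found }' "$modules_file"; then
      echo "$m"
    fi
  done
}
```

For example, `missing_modules ib_ucm ib_cm ib_uverbs ib_umad` prints nothing when all four modules are loaded.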

FYI to anyone else trying to get things loaded and running.
Here is an init.d startup script that I use to load and start the IB
drivers.
You can use it and/or edit it to load the drivers that you want. 
My script makes the dev nodes manually, but if you have udev, you can use
that instead.

#!/bin/sh
#
# ib : A script to control openib.org kernel module start
#


# Set variables
module1=ib_mthca
module2=ib_mad
module3=ib_sa
module4=ib_ipoib
module5=ib_uverbs
module6=ib_umad
module7=ib_cm 
module8=ib_ucm 
module9=ib_sdp
module10=ib_srp
module11=rdma_cm
module12=rdma_ucm
# module13=kdapl   deprecated
module14=iscsi_tcp
module15=ib_iser
device=infiniband
mode=666


# Set default module parameters
det_max_pages_percent=0
det_retry_time=0
det_window_size=0



usage()
{
  echo "Usage: $0 {start|stop|restart|reload} [module_parameters]"
}



verify_root_privilege()
{
  # $UID is a bash extension; id -u also works under a plain POSIX sh
  if [ "$(id -u)" -ne 0 ]; then
    echo "You must be root to modify module state"
    exit 1
  fi
}



start()
{
  verify_root_privilege

  kernel_ver=$(uname -r)
 
  module1_path=/lib/modules/$kernel_ver/kernel/drivers/infiniband/hw/mthca/$module1.ko
  module2_path=/lib/modules/$kernel_ver/kernel/drivers/infiniband/core/$module2.ko
  module3_path=/lib/modules/$kernel_ver/kernel/drivers/infiniband/core/$module3.ko
  module4_path=/lib/modules/$kernel_ver/kernel/drivers/infiniband/ulp/ipoib/$module4.ko
  module5_path=/lib/modules/$kernel_ver/kernel/drivers/infiniband/core/$module5.ko
  module6_path=/lib/modules/$kernel_ver/kernel/drivers/infiniband/core/$module6.ko
  module7_path=/lib/modules/$kernel_ver/kernel/drivers/infiniband/core/$module7.ko
  module8_path=/lib/modules/$kernel_ver/kernel/drivers/infiniband/core/$module8.ko
  module9_path=/lib/modules/$kernel_ver/kernel/drivers/infiniband/ulp/sdp/$module9.ko
  module10_path=/lib/modules/$kernel_ver/kernel/drivers/infiniband/ulp/srp/$module10.ko
  module11_path=/lib/modules/$kernel_ver/kernel/drivers/infiniband/core/$module11.ko
  module12_path=/lib/modules/$kernel_ver/kernel/drivers/infiniband/core/$module12.ko
  # module13_path=/lib/modules/$kernel_ver/kernel/drivers/infiniband/ulp/kdapl/$module13.ko
  module14_path=/lib/modules/$kernel_ver/kernel/drivers/scsi/$module14.ko
  module15_path=/lib/modules/$kernel_ver/kernel/drivers/infiniband/ulp/iser/$module15.ko

  if test -e $module1_path; then
    echo "Loading $module1"
    sudo /sbin/modprobe $module1 "$@"
  else
    echo "Module $module1 not found ($module1_path does not exist)!"
  fi
  if test -e $module2_path; then
    echo "Loading $module2"
    sudo /sbin/modprobe $module2 "$@"
  else
    echo "Module $module2 not found ($module2_path does not exist)!"
  fi

  if test -e $module3_path; then
    echo "Loading $module3"
    sudo /sbin/modprobe $module3 "$@"
  else
    echo "Module $module3 not found ($module3_path does not exist)!"
  fi

  if test -e $module4_path; then
    echo "Loading $module4"
    sudo /sbin/modprobe $module4 "$@"
  else
    echo "Module $module4 not found ($module4_path does not exist)!"
  fi

  if test -e $module5_path; then
    echo "Loading $module5"
    sudo /sbin/modprobe $module5 "$@"
  else
    echo "Module $module5 not found ($module5_path does not exist)!"
  fi

  if test -e $module6_path; then
    echo "Loading $module6"
    sudo /sbin/modprobe $module6 "$@"
  else
    echo "Module $module6 not found ($module6_path does not exist)!"
  fi

  if test -e $module7_path; then
    echo "Loading $module7"
    sudo /sbin/modprobe $module7 "$@"
  else
    echo "Module $module7 not found ($module7_path does not exist)!"
  fi

  if test -e $module8_path; then
    echo "Loading $module8"
    sudo /sbin/modprobe $module8 "$@"
  else
    echo "Module $module8 not found ($module8_path does not exist)!"
  fi

  if test -e $module9_path; then
    echo "Loading $module9"
    sudo /sbin/modprobe $module9 "$@"
  else
    echo "Module $module9 not found ($module9_path does not exist)!"
  fi
  if test -e $module10_path; then
    echo "Loading $module10"
    sudo /sbin/modprobe $module10 "$@"
  else
    echo "Module $module10 not found ($module10_path does not exist)!"
  fi
  if test -e $module11_path; then
    echo "Loading $module11"
    sudo /sbin/modprobe $module11 "$@"
  else
    echo "Module $module11 not found ($module11_path does not exist)!"
  fi
  if test -e $module12_path; then
    echo "Loading $module12"
    sudo /sbin/modprobe $module12 "$@"
  else
    echo "Module $module12 not found ($module12_path does not exist)!"
  fi
#  if test -e $module13_path; then
#echo
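
The per-module blocks above all follow the same pattern, so they could be collapsed into a loop. A minimal sketch, not the script as posted: MODULE_LIST, MODULE_ROOT, MODPROBE and load_modules are hypothetical names, only the first six modules are listed, and the sudo call is dropped on the assumption that the script already runs as root.

```shell
#!/bin/sh
# Sketch: load a list of modules in order, skipping any whose .ko is missing.
# MODPROBE and MODULE_ROOT are hypothetical overrides for dry runs.

MODPROBE=${MODPROBE:-/sbin/modprobe}
MODULE_ROOT=${MODULE_ROOT:-/lib/modules/$(uname -r)/kernel/drivers}

# "module:subdirectory" pairs, in load order (subset of the script above)
MODULE_LIST="ib_mthca:infiniband/hw/mthca
ib_mad:infiniband/core
ib_sa:infiniband/core
ib_ipoib:infiniband/ulp/ipoib
ib_uverbs:infiniband/core
ib_umad:infiniband/core"

load_modules() {
  for entry in $MODULE_LIST; do
    name=${entry%%:*}   # text before the first colon
    dir=${entry#*:}     # text after the first colon
    path=$MODULE_ROOT/$dir/$name.ko
    if [ -e "$path" ]; then
      echo "Loading $name"
      "$MODPROBE" "$name" "$@"
    else
      echo "Module $name not found ($path does not exist)!"
    fi
  done
}
```

Setting MODPROBE=echo before calling load_modules gives a dry run that only prints what would be loaded.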

Re: [openib-general] Trying to compile mvapich RHEL4U3 for ib.

2006-04-14 Thread Sayantan Sur
Hi Roger,

> Mvapich compiles but appears to not have made the mpirun version for
> Infiniband, and yells about that when attempting to start HPL, I have
> not yet looked at that in detail to see what the nature of the failure
> is.

Thanks for reporting this. In fact, just today we have fixed this in the
MVAPICH trunk. This problem was reported by another user on
mvapich-discuss.

http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/2006-April/98.html

If this was the error you got, we'll be glad if you could just `svn up'
your tree and give it a shot.

Please let us know if this worked for you.

Thanks,
Sayantan.

> 
>   Roger

-- 
http://www.cse.ohio-state.edu/~surs


Re: [openib-general] Trying to compile mvapich RHEL4U3 for ib.

2006-04-14 Thread Roger Heflin

Sayantan Sur wrote:

Hello Roger,

I'm just CC-ing this to openib-general for the community.

Thanks for giving us access. I have verified that the
`ibv_get_device_list' verb is indeed *missing* from the OpenIB install.
I'm afraid that given this Redhat rpm, it is difficult to get mvapich to
work (without patching it).

As Roland and others have indicated, perhaps the best way is for you to
upgrade to at least the 1.0 branch. That should be the most stable OpenIB
release yet.

https://openib.org/svn/gen2/branches/1.0/src/userspace/

You should be able to keep the kernel stuff intact and just upgrade the
user level support (management, libibverbs, libmthca). You may skip
upgrading management, however it'll be best to upgrade it too, lest you
face any OpenSM issues.

Thanks,
Sayantan.



I now have the machines running RHEL4U3 + kernel.org 2.6.16.5 + the
OpenIB 1.0 userspace. The RPM spec files worked for the OpenIB tools,
which made things pretty simple, and I now have a reasonable set of
rpms and tar files to carry out the kernel+userspace update.

I have succeeded in getting OpenMPI to compile and execute HPL under
raw IB, and so far I am getting reasonable results and no corruption.

Mvapich compiles but appears to not have made the mpirun version for
Infiniband, and yells about that when attempting to start HPL, I have
not yet looked at that in detail to see what the nature of the failure
is.

  Roger


Re: [openib-general][patch review] srp: fmr implementation,

2006-04-14 Thread Vu Pham

Roland Dreier wrote:

Hmm, I don't understand what could be going on.  srp_send_tsk_mgmt()
currently has:

    if (req->cmd_done) {
        srp_remove_req(target, req, req_index);
        scmnd->scsi_done(scmnd);
    } else if (!req->tsk_status) {
        srp_remove_req(target, req, req_index);
        scmnd->result = DID_ABORT << 16;
        ret = SUCCESS;
    }

and otherwise it returns FAILED.  So in both cases where it finishes
the command, it removes it from the list of pending requests.

Are you absolutely sure you saw the crash with a patched driver that
has that code in srp_send_tsk_mgmt()?


I'm sure that I patched srp driver revision 6036. It has the 
above code in srp_send_tsk_mgmt().


I don't have time to work on this today. I'll get back with 
more debug details on Monday.


Thanks,
Vu


Re: [openib-general] IB initialization

2006-04-14 Thread Hal Rosenstock
Chris,

On Fri, 2006-04-14 at 12:05, Chris Worley wrote:
> Hal,
> 
> Note that I got an /etc/init.d/openibd script that's getting
> everything running (I still don't have IPoIB or MVAPICH2... but I can
> live without both).
> 
> Now, I'm running Opensm with -V, and it looks as I expected.

So what's the cap mask change being indicated ?

Are you sure there's no embedded SM running on the switch ?

-- Hal

> 
> This cluster is simple: 9 nodes in one switch.
> 
> Thanks,
> 
> Chris
> On 14 Apr 2006 11:38:21 -0400, Hal Rosenstock <[EMAIL PROTECTED]> wrote:
> > Hi again Chris,
> >
> > On Fri, 2006-04-14 at 11:29, Chris Worley wrote:
> > > Hal,
> > >
> > > It looks like 1 per GUID.  I don't see a capability mask.  An example is:
> > >
> > > Apr 14 07:28:18 879428 [40602960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x04 num:144 Producer:1 from LID:0x0007 TID:0x0001
> > > Apr 14 07:28:18 879513 [40602960] -> osm_report_notice: Reporting Generic Notice type:4 num:144 from LID:0x0007 GID:0xfe800000,0x0002c9020020c3b6
> >
> > Are you running with verbose (-V) ? You only see that extra info then.
> >
> > Just out of curiosity, how big is your subnet and what is the topology?
> >
> > -- Hal
> >
> > > Thanks,
> > >
> > > Chris
> > > On 14 Apr 2006 10:55:00 -0400, Hal Rosenstock <[EMAIL PROTECTED]> wrote:
> > > > Hi again Chris,
> > > >
> > > > On Fri, 2006-04-14 at 10:39, Chris Worley wrote:
> > > > > Hal,
> > > > >
> > > > > You're correct... the results of the scans are in /var/log/osm.log.  I
> > > > > was expecting the "-console" mode to show more.
> > > > >
> > > > > In looking at the /var/log/osm.log I'm seeing a lot of:
> > > > >
> > > > > Reporting Generic Notice type:4 num:144
> > > > >
> > > > > For different GUIDs.
> > > >
> > > > What's a lot ? One for each GUID ? What's the capability mask indicated
> > > > ?
> > > >
> > > > >   Is there a place to look these up?
> > > >
> > > > Yes, the IBA spec (volume 1). Trap 144 indicates that the capability
> > > > mask at the indicated LID has changed.
> > > >
> > > > > I still don't have IPoIB running, and ibv_devinfo says I'm not setup
> > > > > right either (couldn't open a device).
> > > >
> > > > I'm not sure why not.
> > > >
> > > > -- Hal
> > > >
> > > > > Thanks,
> > > > >
> > > > > Chris
> > > > > On 14 Apr 2006 10:22:22 -0400, Hal Rosenstock <[EMAIL PROTECTED]> 
> > > > > wrote:
> > > > > > Hi Chris,
> > > > > >
> > > > > > On Fri, 2006-04-14 at 10:19, Chris Worley wrote:
> > > > > > > I installed the SuSE 10 OpenIB RC2 RPMS.
> > > > > > >
> > > > > > > The installation went well, but I'm stuck at the startup.
> > > > > > >
> > > > > > > As an IBGD user, I'm used to an init file in /etc/init.d... but 
> > > > > > > there was none.
> > > > > > >
> > > > > > > > From the wiki, I was able to glean:
> > > > > > >
> > > > > > > Make the udev file:
> > > > > > >
> > > > > > > # cat > /etc/udev/rules.d/40-infiniband.rules
> > > > > > > KERNEL="umad*", NAME="infiniband/%k"
> > > > > > > KERNEL="issm*", NAME="infiniband/%k"
> > > > > > >
> > > > > > >  Install some modules:
> > > > > > >
> > > > > > > modprobe ib_ucm
> > > > > > > modprobe ib_cm
> > > > > > > modprobe ib_uverbs
> > > > > > > modprobe ib_umad
> > > > > > >
> > > > > > > And make sure udev is running, and start the opensm.
> > > > > > >
> > > > > > > I've done this on all nodes, and ibstat shows I have a link up and
> > > > > > > running on every node.  Opensm doesn't show any scanning.  It's 
> > > > > > > been
> > > > > > > hung all night at:
> > > > > > >
> > > > > > > # opensm --console
> > > > > > > -
> > > > > > > OpenSM Rev:openib-1.2.0
> > > > > > > Based on OpenIB svn Exported revision
> > > > > > > Command Line Arguments:
> > > > > > >  Enabling OpenSM interactive console
> > > > > > >  Log File: /var/log/osm.log
> > > > > > > -
> > > > > > > OpenSM Rev:openib-1.2.0 OpenIB svn Exported revision
> > > > > > >
> > > > > > > Using default guid 0x2c9020020c3ce
> > > > > > >
> > > > > > > OpenSM Console
> > > > > > >
> > > > > > > $ Entering MASTER state
> > > > > > >
> > > > > > > SUBNET UP
> > > > > >
> > > > > > Looks like everything is fine from the OpenSM standpoint.
> > > > > >
> > > > > > I see no indication that OpenSM is hung. You are in the console.
> > > > > >
> > > > > > Also, why do you say OpenSM isn't "scanning" ?
> > > > > >
> > > > > > What is in /var/log/osm.log ? Any errors ?
> > > > > >
> > > > > > If you want more verbose messages start OpenSM with -V.
> > > > > >
> > > > > > -- Hal
> > > > > >
> > > > > > > IPoIB isn't up.  ibv_rc_pingpong doesn't work.  Neither does 
> > > > > > > ibv_devinfo.
> > > > > > >
> > > > > > > Is there a definitive guide on the initialization of the drivers 
> > > > > > > and fabric?
> > > > > > >
> > > > > > > Also, is there an MVAPICH2 for SuSE 10 RPM?

Re: [openib-general] Re: Compile problems with core code and pathscale for svn6462 and linux-2.6.17-rc1

2006-04-14 Thread Bryan O'Sullivan
On Fri, 2006-04-14 at 09:19 -0700, Sean Hefty wrote:
> Matt Leininger wrote:
> >   Ok.  So the current state is that the mainline devel branch will be
> > broken for a while?
> 
> The trunk is always supposed to work, let alone compile.  This needs to be
> fixed quickly, or the offending code moved to a branch.

There is nothing that needs to be fixed.  Matt was just not using the
right combination of bits when he was trying to compile the world.



Re: [openib-general] Re: Compile problems with core code and pathscale for svn6462 and linux-2.6.17-rc1

2006-04-14 Thread Sean Hefty

Matt Leininger wrote:

  Ok.  So the current state is that the mainline devel branch will be
broken for a while?


The trunk is always supposed to work, let alone compile.  This needs to be fixed 
quickly, or the offending code moved to a branch.


- Sean


Re: [openib-general] IB initialization

2006-04-14 Thread Chris Worley
Hal,

Note that I got an /etc/init.d/openibd script that's getting
everything running (I still don't have IPoIB or MVAPICH2... but I can
live without both).

Now, I'm running Opensm with -V, and it looks as I expected.

This cluster is simple: 9 nodes in one switch.

Thanks,

Chris
On 14 Apr 2006 11:38:21 -0400, Hal Rosenstock <[EMAIL PROTECTED]> wrote:
> Hi again Chris,
>
> On Fri, 2006-04-14 at 11:29, Chris Worley wrote:
> > Hal,
> >
> > It looks like 1 per GUID.  I don't see a capability mask.  An example is:
> >
> > Apr 14 07:28:18 879428 [40602960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x04 num:144 Producer:1 from LID:0x0007 TID:0x0001
> > Apr 14 07:28:18 879513 [40602960] -> osm_report_notice: Reporting Generic Notice type:4 num:144 from LID:0x0007 GID:0xfe800000,0x0002c9020020c3b6
>
> Are you running with verbose (-V) ? You only see that extra info then.
>
> Just out of curiosity, how big is your subnet and what is the topology?
>
> -- Hal
>
> > Thanks,
> >
> > Chris
> > On 14 Apr 2006 10:55:00 -0400, Hal Rosenstock <[EMAIL PROTECTED]> wrote:
> > > Hi again Chris,
> > >
> > > On Fri, 2006-04-14 at 10:39, Chris Worley wrote:
> > > > Hal,
> > > >
> > > > You're correct... the results of the scans are in /var/log/osm.log.  I
> > > > was expecting the "-console" mode to show more.
> > > >
> > > > In looking at the /var/log/osm.log I'm seeing a lot of:
> > > >
> > > > Reporting Generic Notice type:4 num:144
> > > >
> > > > For different GUIDs.
> > >
> > > What's a lot ? One for each GUID ? What's the capability mask indicated
> > > ?
> > >
> > > >   Is there a place to look these up?
> > >
> > > Yes, the IBA spec (volume 1). Trap 144 indicates that the capability
> > > mask at the indicated LID has changed.
> > >
> > > > I still don't have IPoIB running, and ibv_devinfo says I'm not setup
> > > > right either (couldn't open a device).
> > >
> > > I'm not sure why not.
> > >
> > > -- Hal
> > >
> > > > Thanks,
> > > >
> > > > Chris
> > > > On 14 Apr 2006 10:22:22 -0400, Hal Rosenstock <[EMAIL PROTECTED]> wrote:
> > > > > Hi Chris,
> > > > >
> > > > > On Fri, 2006-04-14 at 10:19, Chris Worley wrote:
> > > > > > I installed the SuSE 10 OpenIB RC2 RPMS.
> > > > > >
> > > > > > The installation went well, but I'm stuck at the startup.
> > > > > >
> > > > > > As an IBGD user, I'm used to an init file in /etc/init.d... but 
> > > > > > there was none.
> > > > > >
> > > > > > > From the wiki, I was able to glean:
> > > > > >
> > > > > > Make the udev file:
> > > > > >
> > > > > > # cat > /etc/udev/rules.d/40-infiniband.rules
> > > > > > KERNEL="umad*", NAME="infiniband/%k"
> > > > > > KERNEL="issm*", NAME="infiniband/%k"
> > > > > >
> > > > > >  Install some modules:
> > > > > >
> > > > > > modprobe ib_ucm
> > > > > > modprobe ib_cm
> > > > > > modprobe ib_uverbs
> > > > > > modprobe ib_umad
> > > > > >
> > > > > > And make sure udev is running, and start the opensm.
> > > > > >
> > > > > > I've done this on all nodes, and ibstat shows I have a link up and
> > > > > > running on every node.  Opensm doesn't show any scanning.  It's been
> > > > > > hung all night at:
> > > > > >
> > > > > > # opensm --console
> > > > > > -
> > > > > > OpenSM Rev:openib-1.2.0
> > > > > > Based on OpenIB svn Exported revision
> > > > > > Command Line Arguments:
> > > > > >  Enabling OpenSM interactive console
> > > > > >  Log File: /var/log/osm.log
> > > > > > -
> > > > > > OpenSM Rev:openib-1.2.0 OpenIB svn Exported revision
> > > > > >
> > > > > > Using default guid 0x2c9020020c3ce
> > > > > >
> > > > > > OpenSM Console
> > > > > >
> > > > > > $ Entering MASTER state
> > > > > >
> > > > > > SUBNET UP
> > > > >
> > > > > Looks like everything is fine from the OpenSM standpoint.
> > > > >
> > > > > I see no indication that OpenSM is hung. You are in the console.
> > > > >
> > > > > Also, why do you say OpenSM isn't "scanning" ?
> > > > >
> > > > > What is in /var/log/osm.log ? Any errors ?
> > > > >
> > > > > If you want more verbose messages start OpenSM with -V.
> > > > >
> > > > > -- Hal
> > > > >
> > > > > > IPoIB isn't up.  ibv_rc_pingpong doesn't work.  Neither does 
> > > > > > ibv_devinfo.
> > > > > >
> > > > > > Is there a definitive guide on the initialization of the drivers 
> > > > > > and fabric?
> > > > > >
> > > > > > Also, is there an MVAPICH2 for SuSE 10 RPM?
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Chris

Re: [openib-general][patch review] srp: fmr implementation,

2006-04-14 Thread Roland Dreier
Hmm, I don't understand what could be going on.  srp_send_tsk_mgmt()
currently has:

    if (req->cmd_done) {
        srp_remove_req(target, req, req_index);
        scmnd->scsi_done(scmnd);
    } else if (!req->tsk_status) {
        srp_remove_req(target, req, req_index);
        scmnd->result = DID_ABORT << 16;
        ret = SUCCESS;
    }

and otherwise it returns FAILED.  So in both cases where it finishes
the command, it removes it from the list of pending requests.

Are you absolutely sure you saw the crash with a patched driver that
has that code in srp_send_tsk_mgmt()?

 - R.


Re: [openib-general] IB initialization

2006-04-14 Thread Hal Rosenstock
Hi again Chris,

On Fri, 2006-04-14 at 11:29, Chris Worley wrote:
> Hal,
> 
> It looks like 1 per GUID.  I don't see a capability mask.  An example is:
> 
> Apr 14 07:28:18 879428 [40602960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x04 num:144 Producer:1 from LID:0x0007 TID:0x0001
> Apr 14 07:28:18 879513 [40602960] -> osm_report_notice: Reporting Generic Notice type:4 num:144 from LID:0x0007 GID:0xfe800000,0x0002c9020020c3b6

Are you running with verbose (-V) ? You only see that extra info then.

Just out of curiosity, how big is your subnet and what is the topology?

-- Hal

> Thanks,
> 
> Chris
> On 14 Apr 2006 10:55:00 -0400, Hal Rosenstock <[EMAIL PROTECTED]> wrote:
> > Hi again Chris,
> >
> > On Fri, 2006-04-14 at 10:39, Chris Worley wrote:
> > > Hal,
> > >
> > > You're correct... the results of the scans are in /var/log/osm.log.  I
> > > was expecting the "-console" mode to show more.
> > >
> > > In looking at the /var/log/osm.log I'm seeing a lot of:
> > >
> > > Reporting Generic Notice type:4 num:144
> > >
> > > For different GUIDs.
> >
> > What's a lot ? One for each GUID ? What's the capability mask indicated
> > ?
> >
> > >   Is there a place to look these up?
> >
> > Yes, the IBA spec (volume 1). Trap 144 indicates that the capability
> > mask at the indicated LID has changed.
> >
> > > I still don't have IPoIB running, and ibv_devinfo says I'm not setup
> > > right either (couldn't open a device).
> >
> > I'm not sure why not.
> >
> > -- Hal
> >
> > > Thanks,
> > >
> > > Chris
> > > On 14 Apr 2006 10:22:22 -0400, Hal Rosenstock <[EMAIL PROTECTED]> wrote:
> > > > Hi Chris,
> > > >
> > > > On Fri, 2006-04-14 at 10:19, Chris Worley wrote:
> > > > > I installed the SuSE 10 OpenIB RC2 RPMS.
> > > > >
> > > > > The installation went well, but I'm stuck at the startup.
> > > > >
> > > > > As an IBGD user, I'm used to an init file in /etc/init.d... but there 
> > > > > was none.
> > > > >
> > > > > > From the wiki, I was able to glean:
> > > > >
> > > > > Make the udev file:
> > > > >
> > > > > # cat > /etc/udev/rules.d/40-infiniband.rules
> > > > > KERNEL="umad*", NAME="infiniband/%k"
> > > > > KERNEL="issm*", NAME="infiniband/%k"
> > > > >
> > > > >  Install some modules:
> > > > >
> > > > > modprobe ib_ucm
> > > > > modprobe ib_cm
> > > > > modprobe ib_uverbs
> > > > > modprobe ib_umad
> > > > >
> > > > > And make sure udev is running, and start the opensm.
> > > > >
> > > > > I've done this on all nodes, and ibstat shows I have a link up and
> > > > > running on every node.  Opensm doesn't show any scanning.  It's been
> > > > > hung all night at:
> > > > >
> > > > > # opensm --console
> > > > > -
> > > > > OpenSM Rev:openib-1.2.0
> > > > > Based on OpenIB svn Exported revision
> > > > > Command Line Arguments:
> > > > >  Enabling OpenSM interactive console
> > > > >  Log File: /var/log/osm.log
> > > > > -
> > > > > OpenSM Rev:openib-1.2.0 OpenIB svn Exported revision
> > > > >
> > > > > Using default guid 0x2c9020020c3ce
> > > > >
> > > > > OpenSM Console
> > > > >
> > > > > $ Entering MASTER state
> > > > >
> > > > > SUBNET UP
> > > >
> > > > Looks like everything is fine from the OpenSM standpoint.
> > > >
> > > > I see no indication that OpenSM is hung. You are in the console.
> > > >
> > > > Also, why do you say OpenSM isn't "scanning" ?
> > > >
> > > > What is in /var/log/osm.log ? Any errors ?
> > > >
> > > > If you want more verbose messages start OpenSM with -V.
> > > >
> > > > -- Hal
> > > >
> > > > > IPoIB isn't up.  ibv_rc_pingpong doesn't work.  Neither does 
> > > > > ibv_devinfo.
> > > > >
> > > > > Is there a definitive guide on the initialization of the drivers and 
> > > > > fabric?
> > > > >
> > > > > Also, is there an MVAPICH2 for SuSE 10 RPM?
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Chris

Re: [openib-general][patch review] srp: fmr implementation,

2006-04-14 Thread Vu Pham

Roland Dreier wrote:

Hmm, it's clearly a use-after-free bug.  Based on

ip is at srp_reconnect_target+0x2b1/0x5c0 [ib_srp]

can you guess where it is in the SRP driver or what it's accessing?

Also this is happening because the connection is being reconnected,
because SCSI commands are timing out.  Do you have any idea why this
is happening?  What does the target see when this happens?


It crashed in "cleared request queue", i.e.:

    list_for_each_entry(req, &target->req_queue, list) {
        req->scmnd->result = DID_RESET << 16;
        req->scmnd->scsi_done(req->scmnd);
    }

Probably the SCSI command was already freed through abort; however,
it's still in the request queue.


Vu


Re: [openib-general] IB initialization

2006-04-14 Thread Chris Worley
Hal,

It looks like 1 per GUID.  I don't see a capability mask.  An example is:

Apr 14 07:28:18 879428 [40602960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x04 num:144 Producer:1 from LID:0x0007 TID:0x0001
Apr 14 07:28:18 879513 [40602960] -> osm_report_notice: Reporting Generic Notice type:4 num:144 from LID:0x0007 GID:0xfe800000,0x0002c9020020c3b6

Thanks,

Chris
On 14 Apr 2006 10:55:00 -0400, Hal Rosenstock <[EMAIL PROTECTED]> wrote:
> Hi again Chris,
>
> On Fri, 2006-04-14 at 10:39, Chris Worley wrote:
> > Hal,
> >
> > You're correct... the results of the scans are in /var/log/osm.log.  I
> > was expecting the "-console" mode to show more.
> >
> > In looking at the /var/log/osm.log I'm seeing a lot of:
> >
> > Reporting Generic Notice type:4 num:144
> >
> > For different GUIDs.
>
> What's a lot ? One for each GUID ? What's the capability mask indicated
> ?
>
> >   Is there a place to look these up?
>
> Yes, the IBA spec (volume 1). Trap 144 indicates that the capability
> mask at the indicated LID has changed.
>
> > I still don't have IPoIB running, and ibv_devinfo says I'm not setup
> > right either (couldn't open a device).
>
> I'm not sure why not.
>
> -- Hal
>
> > Thanks,
> >
> > Chris
> > On 14 Apr 2006 10:22:22 -0400, Hal Rosenstock <[EMAIL PROTECTED]> wrote:
> > > Hi Chris,
> > >
> > > On Fri, 2006-04-14 at 10:19, Chris Worley wrote:
> > > > I installed the SuSE 10 OpenIB RC2 RPMS.
> > > >
> > > > The installation went well, but I'm stuck at the startup.
> > > >
> > > > As an IBGD user, I'm used to an init file in /etc/init.d... but there 
> > > > was none.
> > > >
> > > > > From the wiki, I was able to glean:
> > > >
> > > > Make the udev file:
> > > >
> > > > # cat > /etc/udev/rules.d/40-infiniband.rules
> > > > KERNEL="umad*", NAME="infiniband/%k"
> > > > KERNEL="issm*", NAME="infiniband/%k"
> > > >
> > > >  Install some modules:
> > > >
> > > > modprobe ib_ucm
> > > > modprobe ib_cm
> > > > modprobe ib_uverbs
> > > > modprobe ib_umad
> > > >
> > > > And make sure udev is running, and start the opensm.
> > > >
> > > > I've done this on all nodes, and ibstat shows I have a link up and
> > > > running on every node.  Opensm doesn't show any scanning.  It's been
> > > > hung all night at:
> > > >
> > > > # opensm --console
> > > > -
> > > > OpenSM Rev:openib-1.2.0
> > > > Based on OpenIB svn Exported revision
> > > > Command Line Arguments:
> > > >  Enabling OpenSM interactive console
> > > >  Log File: /var/log/osm.log
> > > > -
> > > > OpenSM Rev:openib-1.2.0 OpenIB svn Exported revision
> > > >
> > > > Using default guid 0x2c9020020c3ce
> > > >
> > > > OpenSM Console
> > > >
> > > > $ Entering MASTER state
> > > >
> > > > SUBNET UP
> > >
> > > Looks like everything is fine from the OpenSM standpoint.
> > >
> > > I see no indication that OpenSM is hung. You are in the console.
> > >
> > > Also, why do you say OpenSM isn't "scanning" ?
> > >
> > > What is in /var/log/osm.log ? Any errors ?
> > >
> > > If you want more verbose messages start OpenSM with -V.
> > >
> > > -- Hal
> > >
> > > > IPoIB isn't up.  ibv_rc_pingpong doesn't work.  Neither does 
> > > > ibv_devinfo.
> > > >
> > > > Is there a definitive guide on the initialization of the drivers and 
> > > > fabric?
> > > >
> > > > Also, is there an MVAPICH2 for SuSE 10 RPM?
> > > >
> > > > Thanks,
> > > >
> > > > Chris


Re: [openib-general] IB initialization

2006-04-14 Thread Hal Rosenstock
Hi again Chris,

On Fri, 2006-04-14 at 10:39, Chris Worley wrote:
> Hal,
> 
> You're correct... the results of the scans are in /var/log/osm.log.  I
> was expecting the "-console" mode to show more.
> 
> In looking at the /var/log/osm.log I'm seeing a lot of:
>
> Reporting Generic Notice type:4 num:144
> 
> For different GUIDs.

What's a lot ? One for each GUID ? What's the capability mask indicated
?

>   Is there a place to look these up?

Yes, the IBA spec (volume 1). Trap 144 indicates that the capability
mask at the indicated LID has changed.
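
To answer the "one for each GUID?" question from the log itself, the trap 144 notices can be tallied per reporting LID. A rough sketch, assuming each notice sits on one line in /var/log/osm.log in the "osm_report_notice: Reporting Generic Notice ... num:144 from LID:0x...." shape seen in this thread; count_trap144 is a made-up helper name.

```shell
#!/bin/sh
# Sketch: count trap 144 (capability mask change) notices per source LID.
# The log path argument is optional and defaults to /var/log/osm.log.

count_trap144() {
  log=${1:-/var/log/osm.log}
  grep 'Reporting Generic Notice' "$log" \
    | grep 'num:144' \
    | sed -n 's/.*from LID:\(0x[0-9A-Fa-f]*\).*/\1/p' \
    | sort | uniq -c
}
```

Each output line is a count followed by the LID, so a fabric where every port reported once shows a column of 1s.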

> I still don't have IPoIB running, and ibv_devinfo says I'm not set up
> right either (couldn't open a device).

I'm not sure why not.

-- Hal

> Thanks,
> 
> Chris
> On 14 Apr 2006 10:22:22 -0400, Hal Rosenstock <[EMAIL PROTECTED]> wrote:
> > Hi Chris,
> >
> > On Fri, 2006-04-14 at 10:19, Chris Worley wrote:
> > > I installed the SuSE 10 OpenIB RC2 RPMS.
> > >
> > > The installation went well, but I'm stuck at the startup.
> > >
> > > As an IBGD user, I'm used to an init file in /etc/init.d... but there was 
> > > none.
> > >
> > > From the wiki, I was able to glean:
> > >
> > > Make the udev file:
> > >
> > > # cat > /etc/udev/rules.d/40-infiniband.rules
> > > KERNEL="umad*", NAME="infiniband/%k"
> > > KERNEL="issm*", NAME="infiniband/%k"
> > >
> > >  Install some modules:
> > >
> > > modprobe ib_ucm
> > > modprobe ib_cm
> > > modprobe ib_uverbs
> > > modprobe ib_umad
> > >
> > > And make sure udev is running, and start the opensm.
> > >
> > > I've done this on all nodes, and ibstat shows I have a link up and
> > > running on every node.  Opensm doesn't show any scanning.  It's been
> > > hung all night at:
> > >
> > > # opensm --console
> > > -
> > > OpenSM Rev:openib-1.2.0
> > > Based on OpenIB svn Exported revision
> > > Command Line Arguments:
> > >  Enabling OpenSM interactive console
> > >  Log File: /var/log/osm.log
> > > -
> > > OpenSM Rev:openib-1.2.0 OpenIB svn Exported revision
> > >
> > > Using default guid 0x2c9020020c3ce
> > >
> > > OpenSM Console
> > >
> > > $ Entering MASTER state
> > >
> > > SUBNET UP
> >
> > Looks like everything is fine from the OpenSM standpoint.
> >
> > I see no indication that OpenSM is hung. You are in the console.
> >
> > Also, why do you say OpenSM isn't "scanning"?
> >
> > What is in /var/log/osm.log? Any errors?
> >
> > If you want more verbose messages start OpenSM with -V.
> >
> > -- Hal
> >
> > > IPoIB isn't up.  ibv_rc_pingpong doesn't work.  Neither does ibv_devinfo.
> > >
> > > Is there a definitive guide on the initialization of the drivers and 
> > > fabric?
> > >
> > > Also, is there an MVAPICH2 for SuSE 10 RPM?
> > >
> > > Thanks,
> > >
> > > Chris


[openib-general] [PATCH] OpenSM/osm_sa_mcmember_record.c::__osm_mcmr_rcv_respond: Fix MTU, rate, and PLL selectors

2006-04-14 Thread Hal Rosenstock
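OpenSM/osm_sa_mcmember_record.c::__osm_mcmr_rcv_respond: Fix MTU, rate,
and PLL selectors to be "exactly": clear the old selector bits (the top
two bits of each field) before setting them, so bits left over from the
copied record cannot leak into the response.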

Signed-off-by: Hal Rosenstock <[EMAIL PROTECTED]>

---
Note this patch has been applied to both trunk and 1.0 branch.

Index: opensm/osm_sa_mcmember_record.c
===
--- opensm/osm_sa_mcmember_record.c (revision 6466)
+++ opensm/osm_sa_mcmember_record.c (working copy)
@@ -548,8 +548,11 @@ __osm_mcmr_rcv_respond(
   *p_resp_mcmember_rec = *p_mcmember_rec;
 
   /* Fill in the mtu, rate, and packet lifetime selectors */
+  p_resp_mcmember_rec->mtu &= 0x3f;
   p_resp_mcmember_rec->mtu |= 2<<6; /* exactly */
+  p_resp_mcmember_rec->rate &= 0x3f;
   p_resp_mcmember_rec->rate |=  2<<6; /* exactly */
+  p_resp_mcmember_rec->pkt_life &= 0x3f;
   p_resp_mcmember_rec->pkt_life |= 2<<6; /* exactly */
 
   status = osm_vendor_send(
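
To see why the extra masking matters, here is a small standalone C sketch.
It is not OpenSM code: set_selector() and or_only() are hypothetical
helpers, and the selector meanings in the enum are my reading of the IBA
spec. The top two bits of each field hold the selector and the low six
bits hold the value, so OR-ing in 2<<6 without clearing first can leave a
stale bit behind.

```c
#include <assert.h>
#include <stdint.h>

/* Selector encodings for the top two bits (assumption: as I read the
 * IBA spec, volume 1; the patch itself only uses 2, "exactly"). */
enum {
    SEL_GREATER_THAN = 0,
    SEL_LESS_THAN    = 1,
    SEL_EXACTLY      = 2,
    SEL_LARGEST      = 3
};

/* Post-patch behaviour: drop the old selector, keep the 6-bit value,
 * then set the new selector. */
uint8_t set_selector(uint8_t field, unsigned selector)
{
    assert(selector <= 3);
    field &= 0x3f;
    field |= (uint8_t)(selector << 6);
    return field;
}

/* Pre-patch behaviour: OR the new selector onto whatever was there. */
uint8_t or_only(uint8_t field, unsigned selector)
{
    assert(selector <= 3);
    return (uint8_t)(field | (selector << 6));
}
```

With a copied field of 0x44 (selector 1, value 4), or_only() gives 0xC4,
which reads back as selector 3 rather than 2, while set_selector() gives
0x84 as intended.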





Re: [openib-general] IB initialization

2006-04-14 Thread Chris Worley
Hal,

You're correct... the results of the scans are in /var/log/osm.log.  I
was expecting the "--console" mode to show more.

In looking at the /var/log/osm.log I'm seeing a lot of:

Reporting Generic Notice type:4 num:144

For different GUIDs.  Is there a place to look these up?

I still don't have IPoIB running, and ibv_devinfo says I'm not set up
right either (couldn't open a device).

Thanks,

Chris
On 14 Apr 2006 10:22:22 -0400, Hal Rosenstock <[EMAIL PROTECTED]> wrote:
> Hi Chris,
>
> On Fri, 2006-04-14 at 10:19, Chris Worley wrote:
> > I installed the SuSE 10 OpenIB RC2 RPMS.
> >
> > The installation went well, but I'm stuck at the startup.
> >
> > As an IBGD user, I'm used to an init file in /etc/init.d... but there was 
> > none.
> >
> > From the wiki, I was able to glean:
> >
> > Make the udev file:
> >
> > # cat > /etc/udev/rules.d/40-infiniband.rules
> > KERNEL="umad*", NAME="infiniband/%k"
> > KERNEL="issm*", NAME="infiniband/%k"
> >
> >  Install some modules:
> >
> > modprobe ib_ucm
> > modprobe ib_cm
> > modprobe ib_uverbs
> > modprobe ib_umad
> >
> > And make sure udev is running, and start the opensm.
> >
> > I've done this on all nodes, and ibstat shows I have a link up and
> > running on every node.  Opensm doesn't show any scanning.  It's been
> > hung all night at:
> >
> > # opensm --console
> > -
> > OpenSM Rev:openib-1.2.0
> > Based on OpenIB svn Exported revision
> > Command Line Arguments:
> >  Enabling OpenSM interactive console
> >  Log File: /var/log/osm.log
> > -
> > OpenSM Rev:openib-1.2.0 OpenIB svn Exported revision
> >
> > Using default guid 0x2c9020020c3ce
> >
> > OpenSM Console
> >
> > $ Entering MASTER state
> >
> > SUBNET UP
>
> Looks like everything is fine from the OpenSM standpoint.
>
> I see no indication that OpenSM is hung. You are in the console.
>
> Also, why do you say OpenSM isn't "scanning"?
>
> What is in /var/log/osm.log? Any errors?
>
> If you want more verbose messages start OpenSM with -V.
>
> -- Hal
>
> > IPoIB isn't up.  ibv_rc_pingpong doesn't work.  Neither does ibv_devinfo.
> >
> > Is there a definitive guide on the initialization of the drivers and fabric?
> >
> > Also, is there an MVAPICH2 for SuSE 10 RPM?
> >
> > Thanks,
> >
> > Chris


Re: [openib-general] IB initialization

2006-04-14 Thread Hal Rosenstock
Hi Chris,

On Fri, 2006-04-14 at 10:19, Chris Worley wrote:
> I installed the SuSE 10 OpenIB RC2 RPMS.
> 
> The installation went well, but I'm stuck at the startup.
> 
> As an IBGD user, I'm used to an init file in /etc/init.d... but there was 
> none.
> 
> From the wiki, I was able to glean:
> 
> Make the udev file:
> 
> # cat > /etc/udev/rules.d/40-infiniband.rules
> KERNEL="umad*", NAME="infiniband/%k"
> KERNEL="issm*", NAME="infiniband/%k"
> 
>  Install some modules:
> 
> modprobe ib_ucm
> modprobe ib_cm
> modprobe ib_uverbs
> modprobe ib_umad
> 
> And make sure udev is running, and start the opensm.
> 
> I've done this on all nodes, and ibstat shows I have a link up and
> running on every node.  Opensm doesn't show any scanning.  It's been
> hung all night at:
> 
> # opensm --console
> -
> OpenSM Rev:openib-1.2.0
> Based on OpenIB svn Exported revision
> Command Line Arguments:
>  Enabling OpenSM interactive console
>  Log File: /var/log/osm.log
> -
> OpenSM Rev:openib-1.2.0 OpenIB svn Exported revision
> 
> Using default guid 0x2c9020020c3ce
> 
> OpenSM Console
> 
> $ Entering MASTER state
> 
> SUBNET UP

Looks like everything is fine from the OpenSM standpoint.

I see no indication that OpenSM is hung. You are in the console.

Also, why do you say OpenSM isn't "scanning"?

What is in /var/log/osm.log? Any errors?

If you want more verbose messages start OpenSM with -V.

-- Hal

> IPoIB isn't up.  ibv_rc_pingpong doesn't work.  Neither does ibv_devinfo.
> 
> Is there a definitive guide on the initialization of the drivers and fabric?
> 
> Also, is there an MVAPICH2 for SuSE 10 RPM?
> 
> Thanks,
> 
> Chris


[openib-general] IB initialization

2006-04-14 Thread Chris Worley
I installed the SuSE 10 OpenIB RC2 RPMS.

The installation went well, but I'm stuck at the startup.

As an IBGD user, I'm used to an init file in /etc/init.d... but there was none.

From the wiki, I was able to glean:

Make the udev file:

# cat > /etc/udev/rules.d/40-infiniband.rules
KERNEL="umad*", NAME="infiniband/%k"
KERNEL="issm*", NAME="infiniband/%k"

Load some modules:

modprobe ib_ucm
modprobe ib_cm
modprobe ib_uverbs
modprobe ib_umad

And make sure udev is running, and start the opensm.

I've done this on all nodes, and ibstat shows I have a link up and
running on every node.  Opensm doesn't show any scanning.  It's been
hung all night at:

# opensm --console
-
OpenSM Rev:openib-1.2.0
Based on OpenIB svn Exported revision
Command Line Arguments:
 Enabling OpenSM interactive console
 Log File: /var/log/osm.log
-
OpenSM Rev:openib-1.2.0 OpenIB svn Exported revision

Using default guid 0x2c9020020c3ce

OpenSM Console

$ Entering MASTER state

SUBNET UP

IPoIB isn't up.  ibv_rc_pingpong doesn't work.  Neither does ibv_devinfo.

Is there a definitive guide on the initialization of the drivers and fabric?

Also, is there an MVAPICH2 for SuSE 10 RPM?

Thanks,

Chris