Re: [openib-general] Trying to compile mvapich RHEL4U3 for ib.
Sayantan Sur wrote: Hi Roger, Mvapich compiles but appears to not have made the mpirun version for Infiniband, and yells about that when attempting to start HPL; I have not yet looked at that in detail to see what the nature of the failure is. Thanks for reporting this. In fact, just today we have fixed this in the MVAPICH trunk. This problem was reported by another user on mvapich-discuss. http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/2006-April/98.html If this was the error you got, we'll be glad if you could just `svn up' your tree and give it a shot. Please let us know if this worked for you. Thanks, Sayantan. Yep, that is what I saw. I will try the newer version Monday. Roger ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
RE: [openib-general] IB initialization
Chris wrote,
>As an IBGD user, I'm used to an init file in /etc/init.d... but there was none.
>From the wiki, I was able to glean:
Make the udev file:
># cat > /etc/udev/rules.d/40-infiniband.rules
>KERNEL="umad*", NAME="infiniband/%k"
>KERNEL="issm*", NAME="infiniband/%k"
Install some modules:
>modprobe ib_ucm
>modprobe ib_cm
>modprobe ib_uverbs
>modprobe ib_umad
>Is there a definitive guide on the initialization of the drivers and fabric?

FYI to anyone else trying to get things loaded and running: here is an init.d startup script that I use to load and start the IB drivers. You can use it and/or edit it to load the drivers that you want. My script makes the dev nodes manually, but if you have udev, you can use that instead.

#!/bin/sh
#
# ib : A script to control openib.org kernel module start
#

# Set variables
module1=ib_mthca
module2=ib_mad
module3=ib_sa
module4=ib_ipoib
module5=ib_uverbs
module6=ib_umad
module7=ib_cm
module8=ib_ucm
module9=ib_sdp
module10=ib_srp
module11=rdma_cm
module12=rdma_ucm
# module13=kdapl deprecated
module14=iscsi_tcp
module15=ib_iser
device=infiniband
mode=666

# Set default module parameters
det_max_pages_percent=0
det_retry_time=0
det_window_size=0

usage()
{
    echo "Usage: $0 {start|stop|restart|reload} [module_parameters]"
}

verify_root_privilege()
{
    # $UID is only set by bash; id -u also works under a plain POSIX sh
    if [ "$(id -u)" != 0 ]; then
        echo "You must be root to modify IB module state"
        exit 1
    fi
}

start()
{
    verify_root_privilege
    kernel_ver=$(uname -r)
    module1_path=/lib/modules/$kernel_ver/kernel/drivers/infiniband/hw/mthca/$module1.ko
    module2_path=/lib/modules/$kernel_ver/kernel/drivers/infiniband/core/$module2.ko
    module3_path=/lib/modules/$kernel_ver/kernel/drivers/infiniband/core/$module3.ko
    module4_path=/lib/modules/$kernel_ver/kernel/drivers/infiniband/ulp/ipoib/$module4.ko
    module5_path=/lib/modules/$kernel_ver/kernel/drivers/infiniband/core/$module5.ko
    module6_path=/lib/modules/$kernel_ver/kernel/drivers/infiniband/core/$module6.ko
    module7_path=/lib/modules/$kernel_ver/kernel/drivers/infiniband/core/$module7.ko
    module8_path=/lib/modules/$kernel_ver/kernel/drivers/infiniband/core/$module8.ko
    module9_path=/lib/modules/$kernel_ver/kernel/drivers/infiniband/ulp/sdp/$module9.ko
    module10_path=/lib/modules/$kernel_ver/kernel/drivers/infiniband/ulp/srp/$module10.ko
    module11_path=/lib/modules/$kernel_ver/kernel/drivers/infiniband/core/$module11.ko
    module12_path=/lib/modules/$kernel_ver/kernel/drivers/infiniband/core/$module12.ko
    # module13_path=/lib/modules/$kernel_ver/kernel/drivers/infiniband/ulp/kdapl/$module13.ko
    module14_path=/lib/modules/$kernel_ver/kernel/drivers/scsi/$module14.ko
    module15_path=/lib/modules/$kernel_ver/kernel/drivers/infiniband/ulp/iser/$module15.ko

    # Load each module only if its .ko file exists for the running kernel
    if test -e $module1_path; then
        echo "Loading $module1"
        sudo /sbin/modprobe $module1 "$@"
    else
        echo "Module $module1 not found ($module1_path does not exist)!"
    fi
    if test -e $module2_path; then
        echo "Loading $module2"
        sudo /sbin/modprobe $module2 "$@"
    else
        echo "Module $module2 not found ($module2_path does not exist)!"
    fi
    if test -e $module3_path; then
        echo "Loading $module3"
        sudo /sbin/modprobe $module3 "$@"
    else
        echo "Module $module3 not found ($module3_path does not exist)!"
    fi
    if test -e $module4_path; then
        echo "Loading $module4"
        sudo /sbin/modprobe $module4 "$@"
    else
        echo "Module $module4 not found ($module4_path does not exist)!"
    fi
    if test -e $module5_path; then
        echo "Loading $module5"
        sudo /sbin/modprobe $module5 "$@"
    else
        echo "Module $module5 not found ($module5_path does not exist)!"
    fi
    if test -e $module6_path; then
        echo "Loading $module6"
        sudo /sbin/modprobe $module6 "$@"
    else
        echo "Module $module6 not found ($module6_path does not exist)!"
    fi
    if test -e $module7_path; then
        echo "Loading $module7"
        sudo /sbin/modprobe $module7 "$@"
    else
        echo "Module $module7 not found ($module7_path does not exist)!"
    fi
    if test -e $module8_path; then
        echo "Loading $module8"
        sudo /sbin/modprobe $module8 "$@"
    else
        echo "Module $module8 not found ($module8_path does not exist)!"
    fi
    if test -e $module9_path; then
        echo "Loading $module9"
        sudo /sbin/modprobe $module9 "$@"
    else
        echo "Module $module9 not found ($module9_path does not exist)!"
    fi
    if test -e $module10_path; then
        echo "Loading $module10"
        sudo /sbin/modprobe $module10 "$@"
    else
        echo "Module $module10 not found ($module10_path does not exist)!"
    fi
    if test -e $module11_path; then
        echo "Loading $module11"
        sudo /sbin/modprobe $module11 "$@"
    else
        echo "Module $module11 not found ($module11_path does not exist)!"
    fi
    if test -e $module12_path; then
        echo "Loading $module12"
        sudo /sbin/modprobe $module12 "$@"
    else
        echo "Module $module12 not found ($module12_path does not exist)!"
    fi
    # if test -e $module13_path; then
    #echo
Re: [openib-general] Trying to compile mvapich RHEL4U3 for ib.
Hi Roger, > Mvapich compiles but appears to not have made the mpirun version for > Infiniband, and yells about that when attempting to start HPL, I have > not yet looked at that in detail to see what the nature of the failure > is. Thanks for reporting this. In fact, just today we have fixed this in the MVAPICH trunk. This problem was reported by another user on mvapich-discuss. http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/2006-April/98.html If this was the error you got, we'll be glad if you could just `svn up' your tree and give it a shot. Please let us know if this worked for you. Thanks, Sayantan. > > Roger -- http://www.cse.ohio-state.edu/~surs
Re: [openib-general] Trying to compile mvapich RHEL4U3 for ib.
Sayantan Sur wrote: Hello Roger, I'm just CC-ing this to openib-general for the community. Thanks for giving us access. I have verified that the `ibv_get_device_list' verb is indeed *missing* from the OpenIB install. I'm afraid that given this Red Hat RPM, it is difficult to get mvapich to work (without patching it). As Roland and others have indicated, perhaps the best way is for you to upgrade to at least the 1.0 branch. That should be the most stable OpenIB release yet. https://openib.org/svn/gen2/branches/1.0/src/userspace/ You should be able to keep the kernel stuff intact and just upgrade the user-level support (management, libibverbs, libmthca). You may skip upgrading management; however, it'll be best to upgrade it too, lest you face any OpenSM issues. Thanks, Sayantan. I now have the machines running RHEL4U3 + kernel.org 2.6.16.5 + the OpenIB 1.0 userspace. The RPM spec files worked for the OpenIB tools, which made things pretty simple, and I have a reasonable set of RPMs and tar files to execute the kernel+userspace update. I have succeeded in getting OpenMPI to compile and execute HPL under raw IB, and so far I am getting reasonable results and no corruption. Mvapich compiles but appears to not have made the mpirun version for Infiniband, and yells about that when attempting to start HPL; I have not yet looked at that in detail to see what the nature of the failure is. Roger
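Sayantan's diagnosis comes down to whether the installed libibverbs exports `ibv_get_device_list`. The same check can be sketched in C with `dlopen`/`dlsym` from `<dlfcn.h>`. The function name `symbol_present` is hypothetical. The library name you pass (for example `libibverbs.so.1`) is an assumption about how your install names the library.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Reports whether `symbol` is exported by the shared library `libname`.
 * Returns 1 if found, 0 if the library loads but lacks the symbol, and -1
 * if the library itself cannot be loaded. Passing NULL as `libname` searches
 * the running program and the libraries it has already loaded. */
int symbol_present(const char *libname, const char *symbol)
{
    void *handle = dlopen(libname, RTLD_NOW);
    if (handle == NULL)
        return -1;

    dlerror();                          /* clear any stale error state */
    void *addr = dlsym(handle, symbol);
    int found = (dlerror() == NULL && addr != NULL);

    dlclose(handle);
    return found;
}
```

For example, `symbol_present("libibverbs.so.1", "ibv_get_device_list")` returning 0 would match the missing-verb situation described above. A return of -1 means the library name is wrong for your system; `ldconfig -p | grep ibverbs` lists what is actually installed. On glibc older than 2.34, link with `-ldl`.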
Re: [openib-general][patch review] srp: fmr implementation,
Roland Dreier wrote: Hmm, I don't understand what could be going on. srp_send_tsk_mgmt() currently has:

    if (req->cmd_done) {
        srp_remove_req(target, req, req_index);
        scmnd->scsi_done(scmnd);
    } else if (!req->tsk_status) {
        srp_remove_req(target, req, req_index);
        scmnd->result = DID_ABORT << 16;
        ret = SUCCESS;
    }

and otherwise it returns FAILED. So in both cases where it finishes the command, it removes it from the list of pending requests. Are you absolutely sure you saw the crash with a patched driver that has that code in srp_send_tsk_mgmt()?

I'm sure that I patched SRP driver revision 6036. It has the above code in srp_send_tsk_mgmt(). I don't have time to work on this today; I'll get back with more debug details on Monday.

Thanks,
Vu
Re: [openib-general] IB initialization
Chris, On Fri, 2006-04-14 at 12:05, Chris Worley wrote: > Hal, > > Note that I got an /etc/init.d/openibd script that's getting > everything running (I still don't have IPoIB or MVAPICH2... but I can > live without both). > > Now, I'm running Opensm with -V, and it looks as I expected. So what's the cap mask change being indicated ? Are you sure there's no embedded SM running on the switch ? -- Hal > > This cluster is simple: 9 nodes in one switch. > > Thanks, > > Chris
Re: [openib-general] Re: Compile problems with core code and pathscale for svn6462 and linux-2.6.17-rc1
On Fri, 2006-04-14 at 09:19 -0700, Sean Hefty wrote: > Matt Leininger wrote: > > Ok. So the current state is that the mainline devel branch will be > > broken for a while? > > The trunk is always supposed to work, not just compile. This needs to be fixed > quickly, or the offending code moved to a branch. There is nothing that needs to be fixed. Matt was just not using the right combination of bits when he was trying to compile the world.
Re: [openib-general] Re: Compile problems with core code and pathscale for svn6462 and linux-2.6.17-rc1
Matt Leininger wrote: Ok. So the current state is that the mainline devel branch will be broken for a while? The trunk is always supposed to work, not just compile. This needs to be fixed quickly, or the offending code moved to a branch. - Sean
Re: [openib-general] IB initialization
Hal, Note that I got an /etc/init.d/openibd script that's getting everything running (I still don't have IPoIB or MVAPICH2... but I can live without both). Now, I'm running Opensm with -V, and it looks as I expected. This cluster is simple: 9 nodes in one switch. Thanks, Chris
Re: [openib-general][patch review] srp: fmr implementation,
Hmm, I don't understand what could be going on. srp_send_tsk_mgmt() currently has:

    if (req->cmd_done) {
        srp_remove_req(target, req, req_index);
        scmnd->scsi_done(scmnd);
    } else if (!req->tsk_status) {
        srp_remove_req(target, req, req_index);
        scmnd->result = DID_ABORT << 16;
        ret = SUCCESS;
    }

and otherwise it returns FAILED. So in both cases where it finishes the command, it removes it from the list of pending requests. Are you absolutely sure you saw the crash with a patched driver that has that code in srp_send_tsk_mgmt()?

- R.
Re: [openib-general] IB initialization
Hi again Chris, On Fri, 2006-04-14 at 11:29, Chris Worley wrote: > Hal, > > It looks like 1 per GUID. I don't see a capability mask. An example is: > > Apr 14 07:28:18 879428 [40602960] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x04 num:144 Producer:1 from LID:0x0007 TID:0x0001 > Apr 14 07:28:18 879513 [40602960] -> osm_report_notice: Reporting > Generic Notice type:4 num:144 from LID:0x0007 GID:0xfe800000,0x0002c9020020c3b6 Are you running with verbose (-V) ? You only see that extra info then. Just out of curiosity, how big is your subnet and what is the topology ? -- Hal
Re: [openib-general][patch review] srp: fmr implementation,
Roland Dreier wrote: Hmm, it's clearly a use-after-free bug. Based on

    ip is at srp_reconnect_target+0x2b1/0x5c0 [ib_srp]

can you guess where it is in the SRP driver or what it's accessing? Also this is happening because the connection is being reconnected, because SCSI commands are timing out. Do you have any idea why this is happening? What does the target see when this happens?

It crashed in "cleared request queue", i.e.

    list_for_each_entry(req, &target->req_queue, list) {
        req->scmnd->result = DID_RESET << 16;
        req->scmnd->scsi_done(req->scmnd);
    }

Probably the SCSI command was already freed through an abort; however, it is still in the request queue.

Vu
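To make the failure mode concrete, here is a simplified user-space model in C. It is not the ib_srp code. It shows that an aborted request freed without first being unlinked would leave a dangling entry for the reconnect sweep to walk onto. All structure and function names (`fake_cmd`, `pending_req`, `abort_request`, `reset_sweep`) are hypothetical stand-ins.

```c
#include <stdlib.h>

/* Hypothetical stand-ins: these are NOT the real ib_srp or SCSI structures. */
struct fake_cmd {
    int result;
};

struct pending_req {
    struct fake_cmd *cmd;
    struct pending_req *prev, *next;
};

/* Circular doubly linked list with a sentinel node, in the style of the
 * kernel's list_head. */
struct req_queue {
    struct pending_req head;
};

void queue_init(struct req_queue *q)
{
    q->head.cmd = NULL;
    q->head.prev = q->head.next = &q->head;
}

/* Allocates a request and its command, then appends it to the queue. */
struct pending_req *queue_submit(struct req_queue *q)
{
    struct pending_req *r = calloc(1, sizeof(*r));
    if (r == NULL)
        return NULL;
    r->cmd = calloc(1, sizeof(*r->cmd));
    if (r->cmd == NULL) {
        free(r);
        return NULL;
    }
    r->prev = q->head.prev;
    r->next = &q->head;
    q->head.prev->next = r;
    q->head.prev = r;
    return r;
}

static void queue_del(struct pending_req *r)
{
    r->prev->next = r->next;
    r->next->prev = r->prev;
}

int queue_len(const struct req_queue *q)
{
    int n = 0;
    for (const struct pending_req *r = q->head.next; r != &q->head; r = r->next)
        n++;
    return n;
}

/* Abort path: unlink the request BEFORE freeing its command. Without the
 * unlink, reset_sweep() would later dereference freed memory. */
void abort_request(struct pending_req *r)
{
    queue_del(r);
    free(r->cmd);
    free(r);
}

/* Reconnect path: fail every request still queued, as the quoted
 * list_for_each_entry() loop does, and return how many were completed. */
int reset_sweep(struct req_queue *q, int reset_result)
{
    int completed = 0;
    while (q->head.next != &q->head) {
        struct pending_req *r = q->head.next;
        queue_del(r);
        r->cmd->result = reset_result;  /* where the driver calls scsi_done() */
        free(r->cmd);
        free(r);
        completed++;
    }
    return completed;
}
```

The only point of the model is the ordering in `abort_request()`: unlink first, then free. In the driver, this corresponds to calling `srp_remove_req()` on every path that finishes or aborts a command. Roland's quoted `srp_send_tsk_mgmt()` code is meant to guarantee exactly that.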
Re: [openib-general] IB initialization
Hal,

It looks like 1 per GUID. I don't see a capability mask. An example is:

Apr 14 07:28:18 879428 [40602960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x04 num:144 Producer:1 from LID:0x0007 TID:0x0001
Apr 14 07:28:18 879513 [40602960] -> osm_report_notice: Reporting Generic Notice type:4 num:144 from LID:0x0007 GID:0xfe800000,0x0002c9020020c3b6

Thanks,
Chris
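Counting how many trap 144 notices arrive, and from which LIDs, can be done by tallying osm.log. Below is a C sketch that parses the `Received Generic Notice` line format quoted above using `sscanf`. `parse_notice` and `count_trap_from` are hypothetical helpers. The matched wording is copied from this log excerpt, so other OpenSM versions may need a different pattern.

```c
#include <stdio.h>
#include <string.h>

/* Pulls the notice type, trap number, and source LID out of an osm.log line
 * containing "Received Generic Notice ...". Returns 1 on success, 0 if the
 * line is not such a record. */
int parse_notice(const char *line, unsigned *type, unsigned *trap_num,
                 unsigned *lid)
{
    const char *p = strstr(line, "Received Generic Notice");
    unsigned producer;

    if (p == NULL)
        return 0;
    /* %x accepts an optional 0x prefix, so "0x04" and "0x0007" parse. */
    return sscanf(p, "Received Generic Notice type:%x num:%u Producer:%u from LID:%x",
                  type, trap_num, &producer, lid) == 4;
}

/* Counts how many of the `n` lines report trap `trap` from LID `lid`. */
int count_trap_from(const char *const *lines, int n, unsigned trap, unsigned lid)
{
    int count = 0;
    for (int i = 0; i < n; i++) {
        unsigned t, num, l;
        if (parse_notice(lines[i], &t, &num, &l) && num == trap && l == lid)
            count++;
    }
    return count;
}
```

Running `count_trap_from` once per LID seen in the log would show whether each port reported exactly one notice, which would confirm the "1 per GUID" observation.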
Re: [openib-general] IB initialization
Hi again Chris, On Fri, 2006-04-14 at 10:39, Chris Worley wrote: > Hal, > > You're correct... the results of the scans are in /var/log/osm.log. I > was expecting the "-console" mode to show more. > > In looking at the /var/log/osm.log I'm seeing a lot of: > > Reporting Generic Notice type:4 num:144 > > For different GUIDs. What's a lot ? One for each GUID ? What's the capability mask indicated ? > Is there a place to look these up? Yes, the IBA spec (volume 1). Trap 144 indicates that the capability mask at the indicated LID has changed. > I still don't have IPoIB running, and ibv_devinfo says I'm not set up > right either (couldn't open a device). I'm not sure why not. -- Hal > Thanks, > > Chris > On 14 Apr 2006 10:22:22 -0400, Hal Rosenstock <[EMAIL PROTECTED]> wrote: > > Hi Chris, > > > > On Fri, 2006-04-14 at 10:19, Chris Worley wrote: > > > I installed the SuSE 10 OpenIB RC2 RPMS. > > > > > > The installation went well, but I'm stuck at the startup. > > > > > > As an IBGD user, I'm used to an init file in /etc/init.d... but there was > > > none. > > > > > > From the wiki, I was able to glean: > > > > > > Make the udev file: > > > > > > # cat > /etc/udev/rules.d/40-infiniband.rules > > > KERNEL="umad*", NAME="infiniband/%k" > > > KERNEL="issm*", NAME="infiniband/%k" > > > > > > Install some modules: > > > > > > modprobe ib_ucm > > > modprobe ib_cm > > > modprobe ib_uverbs > > > modprobe ib_umad > > > > > > And make sure udev is running, and start the opensm. > > > > > > I've done this on all nodes, and ibstat shows I have a link up and > > > running on every node. Opensm doesn't show any scanning. It's been > > > hung all night at: > > > > > > # opensm --console > > > - > > > OpenSM Rev:openib-1.2.0 > > > Based on OpenIB svn Exported revision > > > Command Line Arguments: > > > Enabling OpenSM interactive console > > > Log File: /var/log/osm.log > > > - > > > OpenSM Rev:openib-1.2.0 OpenIB svn Exported revision > > > > > > Using default guid 0x2c9020020c3ce > > > > > > OpenSM Console > > > > > > $ Entering MASTER state > > > > > > SUBNET UP > > > > Looks like everything is fine from the OpenSM standpoint. > > > > I see no indication that OpenSM is hung. You are in the console. > > > > Also, why do you say OpenSM isn't "scanning" ? > > > > What is in /var/log/osm.log ? Any errors ? > > > > If you want more verbose messages start OpenSM with -V. > > > > -- Hal > > > > > IPoIB isn't up. ibv_rc_pingpong doesn't work. Neither does ibv_devinfo. > > > > > > Is there a definitive guide on the initialization of the drivers and > > > fabric? > > > > > > Also, is there an MVAPICH2 for SuSE 10 RPM? > > > > > > Thanks, > > > > > > Chris
[openib-general] [PATCH] OpenSM/osm_sa_mcmember_record.c::__osm_mcmr_rcv_respond: Fix MTU, rate, and PLL selectors
OpenSM/osm_sa_mcmember_record.c::__osm_mcmr_rcv_respond: Fix MTU, rate, and PLL selectors to be exactly

Signed-off-by: Hal Rosenstock <[EMAIL PROTECTED]>

---

Note this patch has been applied to both trunk and 1.0 branch.

Index: opensm/osm_sa_mcmember_record.c
===================================================================
--- opensm/osm_sa_mcmember_record.c	(revision 6466)
+++ opensm/osm_sa_mcmember_record.c	(working copy)
@@ -548,8 +548,11 @@ __osm_mcmr_rcv_respond(
   *p_resp_mcmember_rec = *p_mcmember_rec;
 
   /* Fill in the mtu, rate, and packet lifetime selectors */
+  p_resp_mcmember_rec->mtu &= 0x3f;
   p_resp_mcmember_rec->mtu |= 2<<6; /* exactly */
+  p_resp_mcmember_rec->rate &= 0x3f;
   p_resp_mcmember_rec->rate |= 2<<6; /* exactly */
+  p_resp_mcmember_rec->pkt_life &= 0x3f;
   p_resp_mcmember_rec->pkt_life |= 2<<6; /* exactly */
 
   status = osm_vendor_send(
Re: [openib-general] IB initialization
Hal,

You're correct... the results of the scans are in /var/log/osm.log. I was expecting the "-console" mode to show more.

In looking at /var/log/osm.log, I'm seeing a lot of:

Reporting Generic Notice type:4 num:144

for different GUIDs. Is there a place to look these up?

I still don't have IPoIB running, and ibv_devinfo says I'm not set up right either (couldn't open a device).

Thanks,
Chris

On 14 Apr 2006 10:22:22 -0400, Hal Rosenstock <[EMAIL PROTECTED]> wrote:
> Looks like everything is fine from the OpenSM standpoint.
>
> I see no indication that OpenSM is hung. You are in the console.
>
> Also, why do you say OpenSM isn't "scanning" ?
>
> What is in /var/log/osm.log ? Any errors ?
>
> If you want more verbose messages start OpenSM with -V.
>
> -- Hal
Re: [openib-general] IB initialization
Hi Chris,

On Fri, 2006-04-14 at 10:19, Chris Worley wrote:
> I've done this on all nodes, and ibstat shows I have a link up and
> running on every node. Opensm doesn't show any scanning. It's been
> hung all night at:
>
> # opensm --console
> -
> OpenSM Rev:openib-1.2.0
> Based on OpenIB svn Exported revision
> Command Line Arguments:
> Enabling OpenSM interactive console
> Log File: /var/log/osm.log
> -
> OpenSM Rev:openib-1.2.0 OpenIB svn Exported revision
>
> Using default guid 0x2c9020020c3ce
>
> OpenSM Console
>
> $ Entering MASTER state
>
> SUBNET UP

Looks like everything is fine from the OpenSM standpoint.

I see no indication that OpenSM is hung. You are in the console.

Also, why do you say OpenSM isn't "scanning"?

What is in /var/log/osm.log? Any errors?

If you want more verbose messages, start OpenSM with -V.

-- Hal
[openib-general] IB initialization
I installed the SuSE 10 OpenIB RC2 RPMs. The installation went well, but I'm stuck at startup.

As an IBGD user, I'm used to an init file in /etc/init.d... but there was none.

From the wiki, I was able to glean:

Make the udev file:

# cat > /etc/udev/rules.d/40-infiniband.rules
KERNEL="umad*", NAME="infiniband/%k"
KERNEL="issm*", NAME="infiniband/%k"

Install some modules:

modprobe ib_ucm
modprobe ib_cm
modprobe ib_uverbs
modprobe ib_umad

Then make sure udev is running, and start OpenSM.

I've done this on all nodes, and ibstat shows I have a link up and running on every node. OpenSM doesn't show any scanning. It's been hung all night at:

# opensm --console
-
OpenSM Rev:openib-1.2.0
Based on OpenIB svn Exported revision
Command Line Arguments:
Enabling OpenSM interactive console
Log File: /var/log/osm.log
-
OpenSM Rev:openib-1.2.0 OpenIB svn Exported revision

Using default guid 0x2c9020020c3ce

OpenSM Console

$ Entering MASTER state

SUBNET UP

IPoIB isn't up. ibv_rc_pingpong doesn't work. Neither does ibv_devinfo.

Is there a definitive guide on the initialization of the drivers and fabric?

Also, is there an MVAPICH2 RPM for SuSE 10?

Thanks,
Chris