Re: [openib-general] uDAPL problem
Stephen Smaldone wrote:
> Arlin Davis wrote:
>> Steve Smaldone wrote:
>>> Hi,
>>>
>>> Sorry for replying to myself, but I loaded rdma_ucm and the rdma_cm
>>> device appears. However, it now fails with the following:
>>>
>>> $ ./dapltest -T S -D IB1
>>> ...
>>> DAT Registry: dat_ia_openv (IB1,1:2,0) called
>>> DAT Registry: IA IB1, trying to load library /usr/local/lib/libdapl.so
>>> DAT Registry: dat_registry_add_provider (IB1,1:2,0)
>>> libibverbs: Warning: no userspace device-specific driver found for uverbs0
>>>     driver search path: /usr/local/lib/infiniband
>>> libibverbs: Warning: no userspace device-specific driver found for uverbs0
>>>     driver search path: /usr/local/lib/infiniband
>>> DT_cs_Server: Could not open IB1 (DAT_INVALID_ADDRESS )
>>> DT_cs_Server (IB1): Exiting.
>>> DAT Registry: Stopped (dat_fini)
>>>
>>> The configuration remains the same otherwise.
>>>
>>> My dat.conf:
>>> IB1 u1.2 nonthreadsafe default /usr/local/lib/libdapl.so mv_dapl.1.2 "hora-1-ib0 0" ""
>>
>> Do you have an entry in your /etc/hosts for hora-1-ib0 and 10.2.2.135?
>>
>> there seems to be problems resolving "hora-1-ib0"
>>
>> -arlin
>
> Yes. There is an entry as follows:
> 10.2.2.135 hora-1-ib0

Could you change the "hora-1-ib0 0" to just "ib0 0" in your dat.conf and retry? There may be an issue parsing a hostname instead of a netdev name.

> Thanks,
>
> Steve

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
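For readers following the dat.conf discussion, here is a rough Python sketch of how that one-line record splits into fields. It assumes the usual DAT static registry layout of eight whitespace-separated fields with quoted strings kept together. The field labels and the `parse_dat_conf_line` helper are illustrative names of my own, not anything from uDAPL, and treating the second token of the IA parameters as a port number is also an assumption.

```python
import shlex
from typing import NamedTuple


class DatConfEntry(NamedTuple):
    # Field names are illustrative labels, not identifiers from uDAPL.
    ia_name: str
    api_version: str
    thread_safety: str
    default: str
    library_path: str
    provider_version: str
    ia_params: str
    platform_params: str


def parse_dat_conf_line(line: str) -> DatConfEntry:
    """Split one dat.conf record; quoted strings stay together as one field."""
    fields = shlex.split(line, comments=True)
    if len(fields) != 8:
        raise ValueError(f"expected 8 fields, got {len(fields)}: {fields!r}")
    return DatConfEntry(*fields)


entry = parse_dat_conf_line(
    'IB1 u1.2 nonthreadsafe default /usr/local/lib/libdapl.so '
    'mv_dapl.1.2 "hora-1-ib0 0" ""'
)
# The IA parameters hold the name under discussion and (assumed) a port number.
device, port = entry.ia_params.split()
print(device, port)
```

Swapping "hora-1-ib0 0" for "ib0 0", as suggested above, only changes the `ia_params` field; everything else in the record stays the same.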
Re: [openib-general] uDAPL problem
Steve Wise wrote:
> On Mon, 2006-10-16 at 18:01 -0400, Steve Smaldone wrote:
>> Hi,
>>
>> Sorry for replying to myself, but I loaded rdma_ucm and the rdma_cm
>> device appears. However, it now fails with the following:
>>
>> $ ./dapltest -T S -D IB1
>> ...
>> DAT Registry: dat_ia_openv (IB1,1:2,0) called
>> DAT Registry: IA IB1, trying to load library /usr/local/lib/libdapl.so
>> DAT Registry: dat_registry_add_provider (IB1,1:2,0)
>> libibverbs: Warning: no userspace device-specific driver found for uverbs0
>>     driver search path: /usr/local/lib/infiniband
>> libibverbs: Warning: no userspace device-specific driver found for uverbs0
>>     driver search path: /usr/local/lib/infiniband
>
> Seems like it cannot find the provider library.
>
> Is there a libmthca.* in /usr/local/lib/infiniband?

That was the problem. There was no mthca.so. Now it works.

Thanks for the help!

Steve Smaldone
Re: [openib-general] uDAPL problem
On Mon, 2006-10-16 at 18:01 -0400, Steve Smaldone wrote:
> Hi,
>
> Sorry for replying to myself, but I loaded rdma_ucm and the rdma_cm
> device appears. However, it now fails with the following:
>
> $ ./dapltest -T S -D IB1
> ...
> DAT Registry: dat_ia_openv (IB1,1:2,0) called
> DAT Registry: IA IB1, trying to load library /usr/local/lib/libdapl.so
> DAT Registry: dat_registry_add_provider (IB1,1:2,0)
> libibverbs: Warning: no userspace device-specific driver found for uverbs0
>     driver search path: /usr/local/lib/infiniband
> libibverbs: Warning: no userspace device-specific driver found for uverbs0
>     driver search path: /usr/local/lib/infiniband

Seems like it cannot find the provider library.

Is there a libmthca.* in /usr/local/lib/infiniband?
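The check asked about here can be done with a plain `ls`, but a small Python sketch makes the idea explicit. It only looks for files named like `libmthca.*` or `mthca.*` in the search path printed by the warning. The helper name `find_driver_libraries` is made up for illustration, and the exact file name libibverbs expects depends on the libibverbs version, so treat the patterns as assumptions.

```python
from pathlib import Path
from typing import List


def find_driver_libraries(search_path: str, driver: str = "mthca") -> List[str]:
    """Return sorted file names in search_path that look like the driver plug-in."""
    directory = Path(search_path)
    if not directory.is_dir():
        return []
    names = set()
    # Both spellings appear in this thread: "libmthca.*" and "mthca.so".
    for pattern in (f"lib{driver}.*", f"{driver}.*"):
        names.update(path.name for path in directory.glob(pattern))
    return sorted(names)


if __name__ == "__main__":
    # Path taken from the "driver search path" line in the warning.
    found = find_driver_libraries("/usr/local/lib/infiniband")
    print(found if found else "no mthca driver library found")
```

An empty result would match the "no userspace device-specific driver found" warning in the quoted output.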
Re: [openib-general] uDAPL problem
Arlin Davis wrote:
> Steve Smaldone wrote:
>> Hi,
>>
>> Sorry for replying to myself, but I loaded rdma_ucm and the rdma_cm
>> device appears. However, it now fails with the following:
>>
>> $ ./dapltest -T S -D IB1
>> ...
>> DAT Registry: dat_ia_openv (IB1,1:2,0) called
>> DAT Registry: IA IB1, trying to load library /usr/local/lib/libdapl.so
>> DAT Registry: dat_registry_add_provider (IB1,1:2,0)
>> libibverbs: Warning: no userspace device-specific driver found for uverbs0
>>     driver search path: /usr/local/lib/infiniband
>> libibverbs: Warning: no userspace device-specific driver found for uverbs0
>>     driver search path: /usr/local/lib/infiniband
>> DT_cs_Server: Could not open IB1 (DAT_INVALID_ADDRESS )
>> DT_cs_Server (IB1): Exiting.
>> DAT Registry: Stopped (dat_fini)
>>
>> The configuration remains the same otherwise.
>>
>>> My dat.conf:
>>> IB1 u1.2 nonthreadsafe default /usr/local/lib/libdapl.so mv_dapl.1.2 "hora-1-ib0 0" ""
>
> Do you have an entry in your /etc/hosts for hora-1-ib0 and 10.2.2.135?
>
> there seems to be problems resolving "hora-1-ib0"
>
> -arlin

Yes. There is an entry as follows:

10.2.2.135 hora-1-ib0

Thanks,

Steve
Re: [openib-general] uDAPL problem
Steve Smaldone wrote:
> Hi,
>
> Sorry for replying to myself, but I loaded rdma_ucm and the rdma_cm
> device appears. However, it now fails with the following:
>
> $ ./dapltest -T S -D IB1
> ...
> DAT Registry: dat_ia_openv (IB1,1:2,0) called
> DAT Registry: IA IB1, trying to load library /usr/local/lib/libdapl.so
> DAT Registry: dat_registry_add_provider (IB1,1:2,0)
> libibverbs: Warning: no userspace device-specific driver found for uverbs0
>     driver search path: /usr/local/lib/infiniband
> libibverbs: Warning: no userspace device-specific driver found for uverbs0
>     driver search path: /usr/local/lib/infiniband
> DT_cs_Server: Could not open IB1 (DAT_INVALID_ADDRESS )
> DT_cs_Server (IB1): Exiting.
> DAT Registry: Stopped (dat_fini)
>
> The configuration remains the same otherwise.
>
>> My dat.conf:
>> IB1 u1.2 nonthreadsafe default /usr/local/lib/libdapl.so mv_dapl.1.2 "hora-1-ib0 0" ""

Do you have an entry in your /etc/hosts for hora-1-ib0 and 10.2.2.135?

There seem to be problems resolving "hora-1-ib0".

-arlin
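To make the /etc/hosts question concrete, here is a minimal Python sketch of the lookup being asked about: does some line of a hosts file map the name used in dat.conf to an address? The `lookup_in_hosts` helper and the `localhost` line in the sample are illustrative assumptions. Only the `10.2.2.135 hora-1-ib0` pairing comes from this thread, and the sketch says nothing about how uDAPL itself resolves the name.

```python
from typing import Optional


def lookup_in_hosts(hosts_text: str, name: str) -> Optional[str]:
    """Return the first address whose hosts-file entry lists name, else None."""
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        address, *names = line.split()
        if name in names:
            return address
    return None


# Sample hosts content; the localhost line is only there for illustration.
sample = "127.0.0.1 localhost\n10.2.2.135 hora-1-ib0\n"
print(lookup_in_hosts(sample, "hora-1-ib0"))
```

On a real machine the same function could be fed the contents of /etc/hosts instead of the sample string.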
Re: [openib-general] uDAPL problem
Hi,

Sorry for replying to myself, but I loaded rdma_ucm and the rdma_cm device appears. However, it now fails with the following:

$ ./dapltest -T S -D IB1
...
DAT Registry: dat_ia_openv (IB1,1:2,0) called
DAT Registry: IA IB1, trying to load library /usr/local/lib/libdapl.so
DAT Registry: dat_registry_add_provider (IB1,1:2,0)
libibverbs: Warning: no userspace device-specific driver found for uverbs0
    driver search path: /usr/local/lib/infiniband
libibverbs: Warning: no userspace device-specific driver found for uverbs0
    driver search path: /usr/local/lib/infiniband
DT_cs_Server: Could not open IB1 (DAT_INVALID_ADDRESS )
DT_cs_Server (IB1): Exiting.
DAT Registry: Stopped (dat_fini)

The configuration remains the same otherwise.

Thanks,
Steve Smaldone

Steve Smaldone wrote:
> Hi,
>
> I have been trying to run dapltest with trunk rev 9717 with linux
> kernel 2.6.18 and I get an error. The error and configuration is
> shown below.
> Basically, the rdma_cm device is not created under /dev/infiniband. I
> am wondering if this is a known problem and how to solve it.
>
> Thanks,
> Steve Smaldone
>
> $ ./dapltest -T S -D IB1
> ...
> DAT Registry: dat_ia_openv (IB1,1:2,0) called
> DAT Registry: IA IB1, trying to load library /usr/local/lib/libdapl.so
> DAT Registry: dat_registry_add_provider (IB1,1:2,0)
> librdmacm: couldn't read ABI version.
> librdmacm: assuming: 2
> libibverbs: Warning: no userspace device-specific driver found for uverbs0
>     driver search path: /usr/local/lib/infiniband
> CMA: unable to open /dev/infiniband/rdma_cm
> DT_cs_Server: Could not open IB1 (DAT_INTERNAL_ERROR )
> DT_cs_Server (IB1): Exiting.
> DAT Registry: Stopped (dat_fini)
>
> My dat.conf:
> IB1 u1.2 nonthreadsafe default /usr/local/lib/libdapl.so mv_dapl.1.2 "hora-1-ib0 0" ""
Re: [openib-general] uDAPL problem
Steve Smaldone wrote:
> Hi,
>
> I have been trying to run dapltest with trunk rev 9717 with linux kernel
> 2.6.18 and I get an error. The error and configuration is shown below.
> Basically, the rdma_cm device is not created under /dev/infiniband. I
> am wondering if this is a known problem and how to solve it.

You need to load the rdma_ucm module, too.

Regards,
Robert.
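The missing module can be spotted mechanically from `lsmod` output. The Python sketch below works on captured text rather than calling any tool. The helper names and the default list of required modules (`rdma_cm`, `rdma_ucm`) are assumptions chosen to match this thread, and the sample is a shortened version of the posted module list.

```python
def loaded_modules(lsmod_text: str) -> set:
    """Collect module names from `lsmod` output (first column, header skipped)."""
    names = set()
    for line in lsmod_text.splitlines():
        fields = line.split()
        if fields and fields[0] != "Module":
            names.add(fields[0])
    return names


def missing_modules(lsmod_text: str, required=("rdma_cm", "rdma_ucm")) -> list:
    """Return the required module names that do not appear in the lsmod output."""
    present = loaded_modules(lsmod_text)
    return [name for name in required if name not in present]


# Shortened sample based on the module list posted in this thread.
sample = """Module                  Size  Used by
ib_ucm                 20612  0
ib_uverbs              40232  1 ib_ucm
rdma_cm                33572  0
"""
print(missing_modules(sample))
```

With the posted module list, rdma_cm is present but rdma_ucm is not, which fits the "unable to open /dev/infiniband/rdma_cm" message.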
[openib-general] uDAPL problem
Hi,

I have been trying to run dapltest with trunk rev 9717 on Linux kernel 2.6.18 and I get an error. The error and configuration are shown below. Basically, the rdma_cm device is not created under /dev/infiniband. I am wondering if this is a known problem and how to solve it.

Thanks,
Steve Smaldone

$ ./dapltest -T S -D IB1
...
DAT Registry: dat_ia_openv (IB1,1:2,0) called
DAT Registry: IA IB1, trying to load library /usr/local/lib/libdapl.so
DAT Registry: dat_registry_add_provider (IB1,1:2,0)
librdmacm: couldn't read ABI version.
librdmacm: assuming: 2
libibverbs: Warning: no userspace device-specific driver found for uverbs0
    driver search path: /usr/local/lib/infiniband
CMA: unable to open /dev/infiniband/rdma_cm
DT_cs_Server: Could not open IB1 (DAT_INTERNAL_ERROR )
DT_cs_Server (IB1): Exiting.
DAT Registry: Stopped (dat_fini)

My dat.conf:
IB1 u1.2 nonthreadsafe default /usr/local/lib/libdapl.so mv_dapl.1.2 "hora-1-ib0 0" ""

ifconfig:
ib0       Link encap:UNSPEC  HWaddr 00-00-00-14-FE-80-00-00-00-00-00-00-00-00-00-00
          inet addr:10.2.2.135  Bcast:10.255.255.255  Mask:255.0.0.0
          inet6 addr: fe80::202:c901:81e:90e1/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:2044  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1 errors:0 dropped:5 overruns:0 carrier:0
          collisions:0 txqueuelen:128
          RX bytes:0 (0.0 b)  TX bytes:68 (68.0 b)

lsmod:
Module                  Size  Used by
ib_ucm                 20612  0
ib_uverbs              40232  1 ib_ucm
rdma_cm                33572  0
ib_cm                  39824  2 ib_ucm,rdma_cm
ib_addr                11524  1 rdma_cm
ib_local_sa            15752  1 rdma_cm
findex                  8576  1 ib_local_sa
ib_ipoib               51144  0
ib_multicast           15940  2 rdma_cm,ib_ipoib
ib_sa                  20004  5 rdma_cm,ib_cm,ib_local_sa,ib_ipoib,ib_multicast
ib_umad                20016  0
ib_mthca              131236  0
ib_mad                 42272  5 ib_cm,ib_local_sa,ib_sa,ib_umad,ib_mthca
ib_core                52992  11 ib_ucm,ib_uverbs,rdma_cm,ib_cm,ib_local_sa,ib_ipoib,ib_multicast,ib_sa,ib_umad,ib_mthca,ib_mad

udev rules:
KERNEL="umad*", NAME="infiniband/%k"
KERNEL="issm*", NAME="infiniband/%k"
KERNEL="uverbs*", NAME="infiniband/%k", MODE="0666"
KERNEL="ucm*", NAME="infiniband/%k", MODE="0666"
KERNEL="rdma_cm", NAME="infiniband/%k", MODE="0666"

/dev/infiniband:
issm0 issm1 ucm0 umad0 umad1 uverbs0
Re: [openib-general] uDAPL problem
On Tue, 27 Sep 2005, Todd Bowman wrote:
> Never mind, I found the post from Roland discussing this issue:
>
> Roland wrote:
> I didn't try to fix uDAPL, because some thought probably needs to go
> into how to use completion channels most efficiently.

I apologize. I forgot that the current tree has an ABI change. Arlin is working on a fix for this.

The kernel and userspace code at revision 3547 is what you need. Please use

svn co -r 3547 https://openib.org/svn/gen2/trunk/src/

to obtain these sources.
Re: [openib-general] uDAPL problem
On 9/27/05, Todd Bowman wrote:
> On 9/27/05, James Lentini wrote:
>> On Tue, 27 Sep 2005, Todd Bowman wrote:
>>> On 9/27/05, James Lentini wrote:
>>>> What is the output of cat /sys/class/net/*/ifindex?
>>>
>>> cat /sys/class/net/*/ifindex
>>> 1 #eth0
>>> 10 #ib0
>>> 11 #ib1
>>> 2 #lo
>>> 3 #tunl0
>>
>> This looks like the problem I reported last week:
>>
>> http://openib.org/pipermail/openib-general/2005-September/011668.html
>>
>> If so, this is fixed in the current subversion sources. Could you
>> update your sources, specifically infiniband/core/at.c, and try again?
>>
>> james
>
> That worked. Thanks. It is printing the addresses backwards. But I
> assume this is also a bug in printing.
>
> ips_by_gid: RET 0 at_rec 0x19e5f40 -> id 1
> dapli_at_event_cb()
> ip_comp_handler: rec 0x19e5f40 ->id 1 id 1 num 1 b0f0add
> open_hca: mthca0, port 1, AF_INET 221.10.15.11 INLINE_MAX=128
> query_hca: mthca0 AF_INET 221.10.15.11
> query_hca: (0.30002) ep 64512 ep_q 65535 evd 65408 evd_q 65535
> query_hca: msg 2147483648 rdma 2147483648 iov 28 lmr 131056 rmr 0
> Bus error
>
> The bus error is probably due to changes in ibv_context. I didn't
> recompile libdapl. The changes in ibv_context that broke udapl are:
>
> num_comp is now num_comp_vectors
> I don't know what is used in place of cq_fd[1]. Any pointers?
>
> Todd

Never mind, I found the post from Roland discussing this issue:

Roland wrote:
> I didn't try to fix uDAPL, because some thought probably needs to go
> into how to use completion channels most efficiently.
Re: [openib-general] uDAPL problem
On 9/27/05, James Lentini wrote:
> On Tue, 27 Sep 2005, Todd Bowman wrote:
>> On 9/27/05, James Lentini wrote:
>>> What is the output of cat /sys/class/net/*/ifindex?
>>
>> cat /sys/class/net/*/ifindex
>> 1 #eth0
>> 10 #ib0
>> 11 #ib1
>> 2 #lo
>> 3 #tunl0
>
> This looks like the problem I reported last week:
>
> http://openib.org/pipermail/openib-general/2005-September/011668.html
>
> If so, this is fixed in the current subversion sources. Could you
> update your sources, specifically infiniband/core/at.c, and try again?
>
> james

That worked. Thanks. It is printing the addresses backwards. But I assume this is also a bug in printing.

ips_by_gid: RET 0 at_rec 0x19e5f40 -> id 1
dapli_at_event_cb()
ip_comp_handler: rec 0x19e5f40 ->id 1 id 1 num 1 b0f0add
open_hca: mthca0, port 1, AF_INET 221.10.15.11 INLINE_MAX=128
query_hca: mthca0 AF_INET 221.10.15.11
query_hca: (0.30002) ep 64512 ep_q 65535 evd 65408 evd_q 65535
query_hca: msg 2147483648 rdma 2147483648 iov 28 lmr 131056 rmr 0
Bus error

The bus error is probably due to changes in ibv_context. I didn't recompile libdapl. The changes in ibv_context that broke udapl are:

num_comp is now num_comp_vectors
I don't know what is used in place of cq_fd[1]. Any pointers?

Todd
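The "backwards" address is consistent with a byte-order mix-up when the 32-bit value is formatted. A small Python sketch, assuming the `b0f0add` at the end of the ip_comp_handler line is the IPv4 address as a 32-bit hex value (the log does not say so explicitly), shows both readings. The `dotted_quad` helper is an illustrative name.

```python
import socket
import struct


def dotted_quad(value: int, byteorder: str) -> str:
    """Format a 32-bit value as an IPv4 string using the given byte order."""
    fmt = ">I" if byteorder == "big" else "<I"
    return socket.inet_ntoa(struct.pack(fmt, value))


raw = 0x0B0F0ADD  # assumed to be the "b0f0add" value from the log above
print(dotted_quad(raw, "big"))     # 11.15.10.221
print(dotted_quad(raw, "little"))  # 221.10.15.11
```

The second form matches what open_hca printed, so the address itself is probably fine and only the display is reversed, which agrees with the guess that this is a printing bug.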
Re: [openib-general] uDAPL problem
On Tue, 27 Sep 2005, Todd Bowman wrote:
> I am having a different problem in ips_by_gid:
>
> open_hca: Found dev mthca0 f42202c90200
> open_hca: GID subnet 80fe id f52202c90200
> ips_by_gid: ERR ips_by_gid -1 No such device
> open_hca: ERR ib_at_ips_by_gid for mthca0
> dapls_ib_open_hca failed 4
> dapl_ia_open () returns 0x4
> DT_cs_Server: Could not open OpenIB-ib0 (DAT_INTERNAL_ERROR )

What is the output of ifconfig? Is the IPoIB interface configured?

What is the output of cat /sys/class/net/*/ifindex?
Re: [openib-general] uDAPL problem
On 27 Sep 2005 09:55:02 -0400, Hal Rosenstock wrote:
> On Tue, 2005-09-27 at 09:51, James Lentini wrote:
>> On Mon, 26 Sep 2005, Hal Rosenstock wrote:
>>> On Mon, 2005-09-26 at 18:05, Todd Bowman wrote:
>>>> I am having a problem with uDAPL accessing
>>>> /dev/infiniband/{uat,ucm0}. I am running 3549, 2.6.12 kernel with
>>>> backport. Here is a snippet of the uDAPL debug messages running
>>>> dtest. The dat.conf file seems to be correct, the correctly named
>>>> providers are being loaded.
>>>>
>>>> 26248 Running as server
>>>> DAT Registry: dat_ia_openv (OpenIB-ib0,1:2,0) called
>>>> DAT Registry: IA OpenIB-ib0, trying to load library /usr/local/lib/libdapl.so
>>>> libuat: Error <-1:6> couldn't open IB at device
>>>> libibcm: error <-1:6> opening device
>>
>> This means that the /dev entries are not set up correctly.
>
> Correct. He set this up manually. Todd wrote:
> "I am not running udev but manually create uat and ucm."

The correct major/minor #s fixed that problem.

>>>> DAPL: NOT Setting Loopback
>>>> dapl_ib_init:
>>>> DAT Registry: dat_registry_add_provider (OpenIB-ib0,1:2,0)
>>>> dapl_ia_open (OpenIB-ib0, 8, 0x10019d40, 0x10019cc0)
>>>> open_hca: mthca0 - 0x1001fdb0
>>>> open_hca: Found dev mthca0 f42202c90200
>>>> open_hca: GID subnet 80fe id f52202c90200
>>>
>>> These look like they need to be endianized to me.
>>
>> This looks like a bug in the way we print these values out, but I
>> don't think it is the real problem.
>
> Right, it's just a cosmetic with the display.
>
> -- Hal
>
>> What architecture are you using?

Apple G5.

I am having a different problem in ips_by_gid:

open_hca: Found dev mthca0 f42202c90200
open_hca: GID subnet 80fe id f52202c90200
ips_by_gid: ERR ips_by_gid -1 No such device
open_hca: ERR ib_at_ips_by_gid for mthca0
dapls_ib_open_hca failed 4
dapl_ia_open () returns 0x4
DT_cs_Server: Could not open OpenIB-ib0 (DAT_INTERNAL_ERROR )

Thanks,
Todd
Re: [openib-general] uDAPL problem
On Tue, 2005-09-27 at 09:51, James Lentini wrote:
> On Mon, 26 Sep 2005, Hal Rosenstock wrote:
>> On Mon, 2005-09-26 at 18:05, Todd Bowman wrote:
>>> I am having a problem with uDAPL accessing
>>> /dev/infiniband/{uat,ucm0}. I am running 3549, 2.6.12 kernel with
>>> backport. Here is a snippet of the uDAPL debug messages running
>>> dtest. The dat.conf file seems to be correct, the correctly named
>>> providers are being loaded.
>>>
>>> 26248 Running as server
>>> DAT Registry: dat_ia_openv (OpenIB-ib0,1:2,0) called
>>> DAT Registry: IA OpenIB-ib0, trying to load library /usr/local/lib/libdapl.so
>>> libuat: Error <-1:6> couldn't open IB at device
>>> libibcm: error <-1:6> opening device
>
> This means that the /dev entries are not set up correctly.

Correct. He set this up manually. Todd wrote:
"I am not running udev but manually create uat and ucm."

>>> DAPL: NOT Setting Loopback
>>> dapl_ib_init:
>>> DAT Registry: dat_registry_add_provider (OpenIB-ib0,1:2,0)
>>> dapl_ia_open (OpenIB-ib0, 8, 0x10019d40, 0x10019cc0)
>>> open_hca: mthca0 - 0x1001fdb0
>>> open_hca: Found dev mthca0 f42202c90200
>>> open_hca: GID subnet 80fe id f52202c90200
>>
>> These look like they need to be endianized to me.
>
> This looks like a bug in the way we print these values out, but I
> don't think it is the real problem.

Right, it's just a cosmetic issue with the display.

-- Hal

> What architecture are you using?
>
>>> ips_by_gid: ERR ips_by_gid -1 Bad file descriptor
>>> open_hca: ERR ib_at_ips_by_gid for mthca0
>>> dapls_ib_open_hca failed 4
>>> dapl_ia_open () returns 0x4
>>> 26248: Error Adaptor open: DAT_INTERNAL_ERROR
Re: [openib-general] uDAPL problem
On Mon, 26 Sep 2005, Hal Rosenstock wrote:
> On Mon, 2005-09-26 at 18:05, Todd Bowman wrote:
>> I am having a problem with uDAPL accessing
>> /dev/infiniband/{uat,ucm0}. I am running 3549, 2.6.12 kernel with
>> backport. Here is a snippet of the uDAPL debug messages running
>> dtest. The dat.conf file seems to be correct, the correctly named
>> providers are being loaded.
>>
>> 26248 Running as server
>> DAT Registry: dat_ia_openv (OpenIB-ib0,1:2,0) called
>> DAT Registry: IA OpenIB-ib0, trying to load library /usr/local/lib/libdapl.so
>> libuat: Error <-1:6> couldn't open IB at device
>> libibcm: error <-1:6> opening device

This means that the /dev entries are not set up correctly.

>> DAPL: NOT Setting Loopback
>> dapl_ib_init:
>> DAT Registry: dat_registry_add_provider (OpenIB-ib0,1:2,0)
>> dapl_ia_open (OpenIB-ib0, 8, 0x10019d40, 0x10019cc0)
>> open_hca: mthca0 - 0x1001fdb0
>> open_hca: Found dev mthca0 f42202c90200
>> open_hca: GID subnet 80fe id f52202c90200
>
> These look like they need to be endianized to me.

This looks like a bug in the way we print these values out, but I don't think it is the real problem.

What architecture are you using?

>> ips_by_gid: ERR ips_by_gid -1 Bad file descriptor
>> open_hca: ERR ib_at_ips_by_gid for mthca0
>> dapls_ib_open_hca failed 4
>> dapl_ia_open () returns 0x4
>> 26248: Error Adaptor open: DAT_INTERNAL_ERROR
>> DAT Registry: Stopped (dat_fini)
>> DAPL: Stopped (dapl_fini)
>> dapl_ib_release:
>>
>> I am not running udev but manually create uat and ucm. Here is the
>> list of /dev/infiniband:
>>
>> ls -l /dev/infiniband/
>> total 0
>> crw-rw-rw- 1 root root 231, 64 Sep 22 15:18 issm0
>> crw-rw-rw- 1 root root 231, 65 Sep 22 15:18 issm1
>> crw-rw-rw- 1 root root 231, 254 Sep 22 22:47 uat
>
> uat is at 231/191.
>
>> crw-rw-rw- 1 root root 231, 255 Sep 20 22:31 ucm
>
> I don't think you need this.
>
>> crw-rw-rw- 1 root root 231, 255 Sep 26 20:01 ucm0
>
> ucm devices start at 231/224.

If these changes do not fix your problem, please let us know.
Re: [openib-general] uDAPL problem
On Mon, 2005-09-26 at 18:05, Todd Bowman wrote:
> I am having a problem with uDAPL accessing
> /dev/infiniband/{uat,ucm0}. I am running 3549, 2.6.12 kernel with
> backport. Here is a snippet of the uDAPL debug messages running
> dtest. The dat.conf file seems to be correct, the correctly named
> providers are being loaded.
>
> 26248 Running as server
> DAT Registry: dat_ia_openv (OpenIB-ib0,1:2,0) called
> DAT Registry: IA OpenIB-ib0, trying to load library /usr/local/lib/libdapl.so
> libuat: Error <-1:6> couldn't open IB at device
> libibcm: error <-1:6> opening device
> DAPL: NOT Setting Loopback
> dapl_ib_init:
> DAT Registry: dat_registry_add_provider (OpenIB-ib0,1:2,0)
> dapl_ia_open (OpenIB-ib0, 8, 0x10019d40, 0x10019cc0)
> open_hca: mthca0 - 0x1001fdb0
> open_hca: Found dev mthca0 f42202c90200
> open_hca: GID subnet 80fe id f52202c90200

These look like they need to be endianized to me.

> ips_by_gid: ERR ips_by_gid -1 Bad file descriptor
> open_hca: ERR ib_at_ips_by_gid for mthca0
> dapls_ib_open_hca failed 4
> dapl_ia_open () returns 0x4
> 26248: Error Adaptor open: DAT_INTERNAL_ERROR
> DAT Registry: Stopped (dat_fini)
> DAPL: Stopped (dapl_fini)
> dapl_ib_release:
>
> I am not running udev but manually create uat and ucm. Here is the
> list of /dev/infiniband:
>
> ls -l /dev/infiniband/
> total 0
> crw-rw-rw- 1 root root 231, 64 Sep 22 15:18 issm0
> crw-rw-rw- 1 root root 231, 65 Sep 22 15:18 issm1
> crw-rw-rw- 1 root root 231, 254 Sep 22 22:47 uat

uat is at 231/191.

> crw-rw-rw- 1 root root 231, 255 Sep 20 22:31 ucm

I don't think you need this.

> crw-rw-rw- 1 root root 231, 255 Sep 26 20:01 ucm0

ucm devices start at 231/224.

-- Hal

> crw-rw-rw- 1 root root 231, 0 Sep 22 15:18 umad0
> crw-rw-rw- 1 root root 231, 1 Sep 22 15:18 umad1
> crw-rw-rw- 1 root root 231, 192 Sep 20 22:30 uverbs0
> crw-rw-rw- 1 root root 231, 193 Sep 20 22:30 uverbs1
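The numbering in this message can be written down as a small lookup, which makes it easy to see which manually created nodes are off. The Python sketch below is only an illustration. The base minors are the ones stated or visible in this thread (umad0 at 0, issm0 at 64, uat at 191, uverbs0 at 192, ucm starting at 224, all under major 231), the assumption that device N sits at base + N is inferred from the listing, and the helper names are made up.

```python
import re

# Base minor numbers under major 231, as read from this thread:
# umad0=0, issm0=64, uat=191, uverbs0=192, ucm devices start at 224.
BASE_MINOR = {"umad": 0, "issm": 64, "uat": 191, "uverbs": 192, "ucm": 224}


def expected_minor(device: str) -> int:
    """Expected minor for a node name such as 'ucm0' or 'uat'."""
    match = re.fullmatch(r"([a-z]+)(\d*)", device)
    if not match or match.group(1) not in BASE_MINOR:
        raise ValueError(f"unknown device name: {device!r}")
    index = int(match.group(2) or 0)  # assumes device N is at base + N
    return BASE_MINOR[match.group(1)] + index


def wrong_nodes(listing: dict) -> dict:
    """Map each mismatched node to its (actual, expected) minor numbers."""
    return {
        name: (minor, expected_minor(name))
        for name, minor in listing.items()
        if minor != expected_minor(name)
    }


# Minor numbers from the `ls -l /dev/infiniband/` output quoted above
# (the extra "ucm" node is left out, since it is not needed).
listing = {"issm0": 64, "issm1": 65, "uat": 254, "ucm0": 255,
           "umad0": 0, "umad1": 1, "uverbs0": 192, "uverbs1": 193}
print(wrong_nodes(listing))
```

Only `uat` and `ucm0` come out as mismatched, which lines up with the two corrections given in the reply.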
[openib-general] uDAPL problem
I am having a problem with uDAPL accessing /dev/infiniband/{uat,ucm0}. I am running 3549, 2.6.12 kernel with backport. Here is a snippet of the uDAPL debug messages from running dtest. The dat.conf file seems to be correct; the correctly named providers are being loaded.

26248 Running as server
DAT Registry: dat_ia_openv (OpenIB-ib0,1:2,0) called
DAT Registry: IA OpenIB-ib0, trying to load library /usr/local/lib/libdapl.so
libuat: Error <-1:6> couldn't open IB at device
libibcm: error <-1:6> opening device
DAPL: NOT Setting Loopback
dapl_ib_init:
DAT Registry: dat_registry_add_provider (OpenIB-ib0,1:2,0)
dapl_ia_open (OpenIB-ib0, 8, 0x10019d40, 0x10019cc0)
open_hca: mthca0 - 0x1001fdb0
open_hca: Found dev mthca0 f42202c90200
open_hca: GID subnet 80fe id f52202c90200
ips_by_gid: ERR ips_by_gid -1 Bad file descriptor
open_hca: ERR ib_at_ips_by_gid for mthca0
dapls_ib_open_hca failed 4
dapl_ia_open () returns 0x4
26248: Error Adaptor open: DAT_INTERNAL_ERROR
DAT Registry: Stopped (dat_fini)
DAPL: Stopped (dapl_fini)
dapl_ib_release:

I am not running udev but manually created uat and ucm. Here is the list of /dev/infiniband:

ls -l /dev/infiniband/
total 0
crw-rw-rw- 1 root root 231, 64 Sep 22 15:18 issm0
crw-rw-rw- 1 root root 231, 65 Sep 22 15:18 issm1
crw-rw-rw- 1 root root 231, 254 Sep 22 22:47 uat
crw-rw-rw- 1 root root 231, 255 Sep 20 22:31 ucm
crw-rw-rw- 1 root root 231, 255 Sep 26 20:01 ucm0
crw-rw-rw- 1 root root 231, 0 Sep 22 15:18 umad0
crw-rw-rw- 1 root root 231, 1 Sep 22 15:18 umad1
crw-rw-rw- 1 root root 231, 192 Sep 20 22:30 uverbs0
crw-rw-rw- 1 root root 231, 193 Sep 20 22:30 uverbs1

And the loaded modules:

kdapl_ib               82000  0
kdapl                  14888  1 kdapl_ib
ib_uverbs              52064  0
ib_ipoib               65480  0
ib_ucm                 32624  0
ib_cm                  51944  2 kdapl_ib,ib_ucm
ib_uat                 22168  0
ib_at                  34840  2 kdapl_ib,ib_uat
ib_sa                  25328  2 ib_ipoib,ib_at
ib_mthca              160376  0
ib_mad                 61108  3 ib_cm,ib_sa,ib_mthca
ib_core                73888  8 kdapl_ib,ib_uverbs,ib_ipoib,ib_ucm,ib_cm,ib_sa,ib_mthca,ib_mad

I am sure that I am missing something simple. Can someone point me in the right direction?

Thanks,
Todd