The problem remains the same: when I run lctl ping with tcpdump 4.0.0 running, I don't see any activity on ib0!
Here is another exhaustive Lustre debug log that I took during lctl ping. Do you see any problem in it?

Jan 23 17:23:39 p186 kernel: Lustre: 14294:0:(module.c:160:libcfs_psdev_open()) Process entered
Jan 23 17:23:39 p186 kernel: Lustre: 14294:0:(module.c:164:libcfs_psdev_open()) kmalloced 'ldu': 8 at f5bc6620 (tot 7258558).
Jan 23 17:23:39 p186 kernel: Lustre: 14294:0:(module.c:171:libcfs_psdev_open()) Process leaving (rc=0 : 0 : 0)
Jan 23 17:23:39 p186 kernel: Lustre: 14294:0:(module.c:228:libcfs_ioctl()) Process entered
Jan 23 17:23:39 p186 kernel: Lustre: 14294:0:(linux-module.c:49:libcfs_ioctl_getdata()) Process entered
Jan 23 17:23:39 p186 kernel: Lustre: 14294:0:(linux-module.c:90:libcfs_ioctl_getdata()) Process leaving (rc=0 : 0 : 0)
Jan 23 17:23:39 p186 kernel: Lustre: 14294:0:(api-ni.c:1223:LNetNIInit()) refs 1
Jan 23 17:23:39 p186 kernel: Lustre: 14294:0:(api-ni.c:1614:lnet_ping()) kmalloced 'info': 144 at f0b95880 (tot 7258702).
Jan 23 17:23:39 p186 kernel: Lustre: 14294:0:(lib-lnet.h:251:lnet_eq_alloc()) kmalloced 'eq': 48 at efda1a00 (tot 7258750).
Jan 23 17:23:39 p186 kernel: Lustre: 14294:0:(lib-eq.c:72:LNetEQAlloc()) kmalloced 'eq->eq_events': 240 at f0b95c80 (tot 7258990).
Jan 23 17:23:39 p186 kernel: Lustre: 14294:0:(lib-lnet.h:279:lnet_md_alloc()) kmalloced 'md': 84 at ed16acc0 (tot 7259074).
Jan 23 17:23:39 p186 kernel: Lustre: 14294:0:(lib-lnet.h:327:lnet_msg_alloc()) kmalloced 'msg': 268 at f205a400 (tot 7259342).
Jan 23 17:23:39 p186 kernel: Lustre: 14294:0:(lib-move.c:2395:LNetGet()) LNetGet -> 12345-172.24.198....@o2ib
Jan 23 17:23:39 p186 kernel: Lustre: 14294:0:(o2iblnd_cb.c:1531:kiblnd_send()) sending 0 bytes in 0 frags to 12345-172.24.198....@o2ib
Jan 23 17:23:39 p186 kernel: Lustre: 14294:0:(o2iblnd.c:312:kiblnd_create_peer()) kmalloced 'peer': 56 at efda18c0 (tot 7259398).
Jan 23 17:23:39 p186 kernel: Lustre: 14294:0:(o2iblnd_cb.c:1501:kiblnd_launch_tx()) peer[efda18c0] -> 172.24.198....@o2ib (1)++
Jan 23 17:23:39 p186 kernel: Lustre: 14294:0:(o2iblnd_cb.c:1380:kiblnd_connect_peer()) peer[efda18c0] -> 172.24.198....@o2ib (2)++
Jan 23 17:23:39 p186 kernel: Lustre: 14294:0:(o2iblnd_cb.c:1507:kiblnd_launch_tx()) peer[efda18c0] -> 172.24.198....@o2ib (3)--
Jan 23 17:23:39 p186 kernel: Lustre: 14294:0:(lib-eq.c:209:LNetEQPoll()) Process entered
Jan 23 17:23:39 p186 kernel: Lustre: 14294:0:(lib-eq.c:146:lib_get_event()) Process entered
Jan 23 17:23:39 p186 kernel: Lustre: 14294:0:(lib-eq.c:149:lib_get_event()) event: f0b95cf8, sequence: 1, eq->size: 2
Jan 23 17:23:39 p186 kernel: Lustre: 14294:0:(lib-eq.c:152:lib_get_event()) Process leaving (rc=0 : 0 : 0)
Jan 23 17:23:39 p186 kernel: Lustre: 2782:0:(o2iblnd_cb.c:2682:kiblnd_cm_callback()) 172.24.198....@o2ib Addr resolved: 0
Jan 23 17:23:40 p186 kernel: Lustre: 14294:0:(lib-eq.c:146:lib_get_event()) Process entered
Jan 23 17:23:40 p186 kernel: Lustre: 14294:0:(lib-eq.c:149:lib_get_event()) event: f0b95cf8, sequence: 1, eq->size: 2
Jan 23 17:23:40 p186 kernel: Lustre: 14294:0:(lib-eq.c:152:lib_get_event()) Process leaving (rc=0 : 0 : 0)
Jan 23 17:23:40 p186 kernel: Lustre: 14294:0:(lib-eq.c:239:LNetEQPoll()) Process leaving (rc=0 : 0 : 0)
Jan 23 17:23:40 p186 kernel: Lustre: 14294:0:(api-ni.c:1665:lnet_ping()) poll 0(-1 -1)
Jan 23 17:23:40 p186 kernel: Lustre: 14294:0:(lib-md.c:69:lnet_md_unlink()) Queueing unlink of md ed16acc0
Jan 23 17:23:40 p186 kernel: Lustre: 14294:0:(lib-eq.c:209:LNetEQPoll()) Process entered
Jan 23 17:23:40 p186 kernel: Lustre: 14294:0:(lib-eq.c:146:lib_get_event()) Process entered
Jan 23 17:23:40 p186 kernel: Lustre: 14294:0:(lib-eq.c:149:lib_get_event()) event: f0b95cf8, sequence: 1, eq->size: 2
Jan 23 17:23:40 p186 kernel: Lustre: 14294:0:(lib-eq.c:152:lib_get_event()) Process leaving (rc=0 : 0 : 0)
Jan 23 17:23:56 p186 kernel: Lustre: 8276:0:(lvfs_lib.c:173:lprocfs_read_helper()) Process leaving (rc=4294962944 : -4352 : ffffef00)
Jan 23 17:23:56 p186 kernel: Lustre: 8276:0:(lvfs_lib.c:173:lprocfs_read_helper()) Process leaving (rc=4294966784 : -512 : fffffe00)
Jan 23 17:23:56 p186 kernel: Lustre: 8276:0:(lvfs_lib.c:173:lprocfs_read_helper()) Process leaving (rc=2817 : 2817 : b01)
Jan 23 17:23:56 p186 kernel: Lustre: 8276:0:(lvfs_lib.c:173:lprocfs_read_helper()) Process leaving (rc=2047 : 2047 : 7ff)
Jan 23 17:23:56 p186 kernel: Lustre: 8276:0:(lvfs_lib.c:173:lprocfs_read_helper()) Process leaving (rc=4294740832 : -226464 : fffc8b60)
Jan 23 17:23:56 p186 kernel: Lustre: 8276:0:(lvfs_lib.c:173:lprocfs_read_helper()) Process leaving (rc=4286216485 : -8750811 : ff7a7925)
Jan 23 17:23:56 p186 kernel: Lustre: 8276:0:(lvfs_lib.c:173:lprocfs_read_helper()) Process leaving (rc=5821091 : 5821091 : 58d2a3)
Jan 23 17:23:56 p186 kernel: Lustre: 8276:0:(lvfs_lib.c:173:lprocfs_read_helper()) Process leaving (rc=3356952 : 3356952 : 333918)
Jan 23 17:23:56 p186 kernel: Lustre: 8276:0:(pinger.c:193:ptlrpc_pinger_main()) next ping in 25000 (8510847)
Jan 23 17:24:21 p186 kernel: Lustre: 8276:0:(lvfs_lib.c:173:lprocfs_read_helper()) Process leaving (rc=4294962944 : -4352 : ffffef00)
Jan 23 17:24:21 p186 kernel: Lustre: 8276:0:(lvfs_lib.c:173:lprocfs_read_helper()) Process leaving (rc=4294966784 : -512 : fffffe00)
Jan 23 17:24:21 p186 kernel: Lustre: 8276:0:(lvfs_lib.c:173:lprocfs_read_helper()) Process leaving (rc=2817 : 2817 : b01)
Jan 23 17:24:21 p186 kernel: Lustre: 8276:0:(lvfs_lib.c:173:lprocfs_read_helper()) Process leaving (rc=2047 : 2047 : 7ff)
Jan 23 17:24:21 p186 kernel: Lustre: 8276:0:(lvfs_lib.c:173:lprocfs_read_helper()) Process leaving (rc=4294740832 : -226464 : fffc8b60)
Jan 23 17:24:21 p186 kernel: Lustre: 8276:0:(lvfs_lib.c:173:lprocfs_read_helper()) Process leaving (rc=4286216485 : -8750811 : ff7a7925)
Jan 23 17:24:21 p186 kernel: Lustre: 8276:0:(lvfs_lib.c:173:lprocfs_read_helper()) Process leaving (rc=5821091 : 5821091 : 58d2a3)
Jan 23 17:24:21 p186 kernel: Lustre: 8276:0:(lvfs_lib.c:173:lprocfs_read_helper()) Process leaving (rc=3356952 : 3356952 : 333918)
Jan 23 17:24:21 p186 kernel: Lustre: 8276:0:(pinger.c:193:ptlrpc_pinger_main()) next ping in 25000 (8535847)
Jan 23 17:24:29 p186 kernel: Lustre: 2794:0:(o2iblnd_cb.c:2704:kiblnd_cm_callback()) 172.24.198....@o2ib: ROUTE ERROR -110
Jan 23 17:24:29 p186 kernel: Lustre: 2794:0:(o2iblnd.c:422:kiblnd_unlink_peer_locked()) peer[efda18c0] -> 172.24.198....@o2ib (2)--
Jan 23 17:24:29 p186 kernel: Lustre: 2794:0:(router.c:151:lnet_notify()) 172.24.198....@o2ib notifying 172.24.198....@o2ib: down
Jan 23 17:24:29 p186 kernel: Lustre: 2794:0:(router.c:82:lnet_notify_locked()) Old news
Jan 23 17:24:29 p186 kernel: Lustre: 2794:0:(o2iblnd_cb.c:2118:kiblnd_peer_connect_failed()) Deleting messages for 172.24.198....@o2ib: connection failed
Jan 23 17:24:29 p186 kernel: Lustre: 2794:0:(lib-md.c:73:lnet_md_unlink()) Unlinking md ed16acc0
Jan 23 17:24:29 p186 kernel: Lustre: 2794:0:(lib-lnet.h:301:lnet_md_free()) kfreed 'md': 84 at ed16acc0 (tot 7259314).
Jan 23 17:24:29 p186 kernel: Lustre: 2794:0:(lib-lnet.h:344:lnet_msg_free()) kfreed 'msg': 268 at f205a400 (tot 7259046).
Jan 23 17:24:29 p186 kernel: Lustre: 2794:0:(o2iblnd_cb.c:2706:kiblnd_cm_callback()) peer[efda18c0] -> 172.24.198....@o2ib (1)--
Jan 23 17:24:29 p186 kernel: Lustre: 2794:0:(o2iblnd.c:357:kiblnd_destroy_peer()) kfreed 'peer': 56 at efda18c0 (tot 7258990).
Jan 23 17:24:29 p186 kernel: Lustre: 14294:0:(lib-eq.c:146:lib_get_event()) Process entered
Jan 23 17:24:29 p186 kernel: Lustre: 14294:0:(lib-eq.c:149:lib_get_event()) event: f0b95cf8, sequence: 1, eq->size: 2
Jan 23 17:24:29 p186 kernel: Lustre: 14294:0:(lib-eq.c:170:lib_get_event()) Process leaving (rc=1 : 1 : 1)
Jan 23 17:24:29 p186 kernel: Lustre: 14294:0:(lib-eq.c:232:LNetEQPoll()) Process leaving (rc=1 : 1 : 1)
Jan 23 17:24:29 p186 kernel: Lustre: 14294:0:(api-ni.c:1665:lnet_ping()) poll 1(4 -113) unlinked
Jan 23 17:24:29 p186 kernel: Lustre: 14294:0:(lib-lnet.h:259:lnet_eq_free()) kfreed 'eq': 48 at efda1a00 (tot 7258942).
Jan 23 17:24:29 p186 kernel: Lustre: 14294:0:(lib-eq.c:135:LNetEQFree()) kfreed 'events': 240 at f0b95c80 (tot 7258702).
Jan 23 17:24:29 p186 kernel: Lustre: 14294:0:(api-ni.c:1772:lnet_ping()) kfreed 'info': 144 at f0b95880 (tot 7258558).
Jan 23 17:24:29 p186 kernel: Lustre: 14294:0:(module.c:336:libcfs_ioctl()) Process leaving (rc=4294967291 : -5 : fffffffb)
Jan 23 17:24:29 p186 kernel: Lustre: 14294:0:(module.c:178:libcfs_psdev_release()) Process entered
Jan 23 17:24:29 p186 kernel: Lustre: 14294:0:(module.c:183:libcfs_psdev_release()) kfreed 'ldu': 8 at f5bc6620 (tot 7258550).
Jan 23 17:24:29 p186 kernel: Lustre: 14294:0:(module.c:187:libcfs_psdev_release()) Process leaving (rc=0 : 0 : 0)

~subbu

On Fri, Jan 16, 2009 at 3:38 PM, subbu kl <subb...@gmail.com> wrote:
> Liang,
>
> Right; you reproduced the exact problem. But as you can see in my previous
> mail, I think I have solved that problem by manually assigning an IP to ib0
> (check this line # ifconfig ib0 172.24.198.111 and *"Added LNI" lines *)
>
> We are back to square one now, I guess! LNET is up with manually assigned
> IPs. Normal ping succeeds between machines but not lctl ping.
>
> so my current problem is this :
> # lctl ping 172.24.198....@o2ib
> failed to ping 172.24.198....@o2ib: Input/output error
>
> /var/log/messages:
>
> Jan 16 10:24:14 p128 kernel: Lustre: 2750:0:(o2iblnd_cb.c:2687:
> kiblnd_cm_callback()) 172.24.198....@o2ib: ROUTE ERROR -22
> Jan 16 10:24:14 p128 kernel: Lustre:
> 2750:0:(o2iblnd_cb.c:2101:kiblnd_peer_connect_failed()) Deleting messages
> for 172.24.198....@o2ib: connection failed
>
> how can I get rid of this connection problem?
>
> ~subbu
>
> On Fri, Jan 16, 2009 at 2:11 PM, Liang Zhen <zhen.li...@sun.com> wrote:
>
>> Subbu,
>>
>> We don't have any tips for setting up IPoIB. It looks like Linux can't find
>> the ifaddr of ib0 on the MDS (-99 is EADDRNOTAVAIL), so I think it's because
>> you didn't assign any address to ib0 (or failed to assign an address to ib0)
>> before loading o2iblnd in the first try.
>> I can reproduce exactly the same error by:
>> 1. modprobe ib_ipoib
>> 2. ifconfig ib0 up // without assigning any address
>> 3. modprobe ko2iblnd
>> 4. lctl network up
>>
>> Regards
>> Liang
>>
>> subbu kl:
>>
>>> Liang,
>>> after executing the following echo :
>>> echo +neterror > /proc/sys/lnet/printk
>>>
>>> now lctl ping shows the following error
>>>
>>> # lctl ping 172.24.198....@o2ib
>>> failed to ping 172.24.198....@o2ib: Input/output error
>>>
>>> Jan 16 10:24:14 p128 kernel: Lustre:
>>> 2750:0:(o2iblnd_cb.c:2687:kiblnd_cm_callback()) 172.24.198....@o2ib:
>>> ROUTE ERROR -22
>>> Jan 16 10:24:14 p128 kernel: Lustre:
>>> 2750:0:(o2iblnd_cb.c:2101:kiblnd_peer_connect_failed()) Deleting messages
>>> for 172.24.198....@o2ib: connection failed
>>>
>>> Looks like some problem with "IB connection manager" !
>>>
>>> 1. Do we have any help docs for setting up IPoIB and Lustre? The Lustre
>>> operations manual has very minimal info about this. I think I am missing
>>> some IPoIB setup part here.
>>> 2. Or is the manual assignment of IP addresses to "ib0" creating some
>>> problem?
>>>
>>> Some more supporting info:
>>> A subnet manager of the following version is also running : OpenSM 3.1.8
>>>
>>> Initially I got this error for MDS mount
>>>
>>> Jan 16 09:45:20 p128 kernel: LustreError:
>>> 4991:0:(linux-tcpip.c:124:libcfs_ipif_query()) Can't get IP address for
>>> interface ib0
>>> Jan 16 09:45:20 p128 kernel: LustreError:
>>> 4991:0:(o2iblnd.c:1563:kiblnd_startup()) Can't query IPoIB interface ib0:
>>> -99
>>> Jan 16 09:45:21 p128 kernel: LustreError: 105-4: Error -100 starting up
>>> LNI o2ib
>>> Jan 16 09:45:21 p128 kernel: LustreError:
>>> 4991:0:(events.c:707:ptlrpc_init_portals()) network initialisation failed
>>> Jan 16 09:45:21 p128 modprobe: WARNING: Error inserting ptlrpc
>>> (/lib/modules/2.6.18-53.1.14.el5_lustre.1.6.5.1smp/kernel/fs/lustre/ptlrpc.ko):
>>> Input/output error
>>> Jan 16 09:45:21 p128 modprobe: WARNING: Error inserting osc
>>> (/lib/modules/2.6.18-53.1.14.el5_lustre.1.6.5.1smp/kernel/fs/lustre/osc.ko):
>>> Unknown symbol in module, or unknown parameter (see dmesg)
>>> Jan 16 09:45:21 p128 kernel: osc: Unknown symbol ldlm_prep_enqueue_req
>>> Jan 16 09:45:21 p128 kernel: osc: Unknown symbol ldlm_resource_get
>>> Jan 16 09:45:21 p128 kernel: osc: Unknown symbol
>>> ptlrpc_lprocfs_register_obd
>>> .
>>> .
>>> .
>>>
>>> then I manually set the IP address for ib0 as follows :
>>> # ifconfig ib0 172.24.198.111
>>>
>>> [r...@p186 ~]# ifconfig ib0
>>> ib0 Link encap:InfiniBand HWaddr
>>> 80:00:04:04:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
>>> inet addr:172.24.198.112 Bcast:172.24.255.255 Mask:255.255.0.0
>>> UP BROADCAST MULTICAST MTU:65520 Metric:1
>>> RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>>> TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
>>> collisions:0 txqueuelen:256
>>> RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)
>>>
>>> then it mounted successfully
>>>
>>> * Jan 16 09:47:09 p128 kernel: Lustre: Added LNI 172.24.198....@o2ib[8/64]
>>> Jan 16 09:47:09 p128 kernel: Lustre: MGS MGS started*
>>> Jan 16 09:47:09 p128 kernel: Lustre: Setting parameter
>>> lustre-MDT0000.mdt.group_upcall in log lustre-MDT0000
>>> Jan 16 09:47:09 p128 kernel: Lustre: Enabling user_xattr
>>> Jan 16 09:47:09 p128 kernel: Lustre: lustre-MDT0000: new disk,
>>> initializing
>>> Jan 16 09:47:09 p128 kernel: Lustre: MDT lustre-MDT0000 now serving dev
>>> (lustre-MDT0000/64db1fc7-03ba-9803-4d20-ab0d2aa66116) with recovery enabled
>>> Jan 16 09:47:09 p128 kernel: Lustre:
>>> 5274:0:(lproc_mds.c:262:lprocfs_wr_group_upcall()) lustre-MDT0000: group
>>> upcall set to /usr/sbin/l_getgroups
>>> Jan 16 09:47:09 p128 kernel: Lustre: lustre-MDT0000.mdt: set parameter
>>> group_upcall=/usr/sbin/l_getgroups
>>> Jan 16 09:47:09 p128 kernel: Lustre: Server lustre-MDT0000 on device
>>> /dev/loop0 has started
>>> .
>>> .
>>> .
>>>
>>> ~subbu
>>>
>>> On Thu, Jan 15, 2009 at 8:37 PM, Liang Zhen <zhen.li...@sun.com> wrote:
>>>
>>> Subbu,
>>>
>>> I'd suggest:
>>> 1) make sure ko2iblnd has been brought up (please check if there
>>> is any error message when starting up ko2iblnd)
>>> 2) echo +neterror > /proc/sys/lnet/printk, then try with lctl
>>> ping; if it still doesn't work, please post the error messages
>>>
>>> Regards
>>> Liang
>>>
>>> subbu kl:
>>>
>>> Problem is similar to
>>>
>>> http://lists.lustre.org/pipermail/lustre-discuss/2008-May/007498.html
>>> But by looking at the thread I could not really get the solution
>>> for the problem.
>>>
>>> I have two RHEL5 Linux servers installed with the following packages -
>>>
>>> kernel-lustre-smp-2.6.18-53.1.14.el5_lustre.1.6.5.1
>>> kernel-ib-1.3-2.6.18_53.1.14.el5_lustre.1.6.5.1smp
>>> lustre-ldiskfs-3.0.4-2.6.18_53.1.14.el5_lustre.1.6.5.1smp
>>> lustre-1.6.5.1-2.6.18_53.1.14.el5_lustre.1.6.5.1smp
>>> lustre-modules-1.6.5.1-2.6.18_53.1.14.el5_lustre.1.6.5.1smp
>>> e2fsprogs-1.40.7.sun3-0redhat
>>>
>>> machine 1: with ib0 IP address : 172.24.198.111
>>> machine 2: with ib0 IP address : 172.24.198.112
>>>
>>> /etc/modprobe.conf contains
>>> options lnet networks=o2ib
>>>
>>> TCP networking worked fine, and now I am trying the InfiniBand
>>> network and finding it difficult to communicate with the IB nodes.
>>> The mount attempt throws the following error
>>>
>>> [r...@p186 ~]# mount -t lustre -o loop /tmp/lustre-ost1 /mnt/ost1
>>> mount.lustre: mount /dev/loop0 at /mnt/ost1 failed:
>>> Input/output error
>>> Is the MGS running?
>>>
>>> /var/log/messages :
>>> Jan 15 16:55:25 p186 kernel: kjournald starting. Commit
>>> interval 5 seconds
>>> Jan 15 16:55:25 p186 kernel: LDISKFS FS on loop0, internal journal
>>> Jan 15 16:55:25 p186 kernel: LDISKFS-fs: mounted filesystem
>>> with ordered data mode.
>>> Jan 15 16:55:25 p186 kernel: kjournald starting. Commit
>>> interval 5 seconds
>>> Jan 15 16:55:25 p186 kernel: LDISKFS FS on loop0, internal journal
>>> Jan 15 16:55:25 p186 kernel: LDISKFS-fs: mounted filesystem
>>> with ordered data mode.
>>> Jan 15 16:55:25 p186 kernel: LDISKFS-fs: file extents enabled
>>> Jan 15 16:55:25 p186 kernel: LDISKFS-fs: mballoc enabled
>>> Jan 15 16:55:30 p186 kernel: Lustre: Request x7 sent from
>>> mgc172.24.198....@o2ib to NID 172.24.198....@o2ib 5s ago has
>>> timed out (limit 5s).
>>> Jan 15 16:55:30 p186 kernel: LustreError:
>>> 7193:0:(obd_mount.c:1062:server_start_targets()) Required
>>> registration failed for lustre-OSTffff: -5
>>> Jan 15 16:55:30 p186 kernel: LustreError: 15f-b: Communication
>>> error with the MGS. Is the MGS running?
>>> Jan 15 16:55:30 p186 kernel: LustreError:
>>> 7193:0:(obd_mount.c:1597:server_fill_super()) Unable to start
>>> targets: -5
>>> Jan 15 16:55:30 p186 kernel: LustreError:
>>> 7193:0:(obd_mount.c:1382:server_put_super()) no obd lustre-OSTffff
>>> Jan 15 16:55:30 p186 kernel: LustreError:
>>> 7193:0:(obd_mount.c:119:server_deregister_mount())
>>> lustre-OSTffff not registered
>>> Jan 15 16:55:30 p186 kernel: LDISKFS-fs: mballoc: 0 blocks 0
>>> reqs (0 success)
>>> Jan 15 16:55:30 p186 kernel: LDISKFS-fs: mballoc: 0 extents
>>> scanned, 0 goal hits, 0 2^N hits, 0 breaks, 0 lost
>>> Jan 15 16:55:30 p186 kernel: LDISKFS-fs: mballoc: 0 generated
>>> and it took 0
>>> Jan 15 16:55:30 p186 kernel: LDISKFS-fs: mballoc: 0
>>> preallocated, 0 discarded
>>> Jan 15 16:55:30 p186 kernel: Lustre: server umount
>>> lustre-OSTffff complete
>>> Jan 15 16:55:30 p186 kernel: LustreError:
>>> 7193:0:(obd_mount.c:1951:lustre_fill_super()) Unable to mount
>>> (-5)
>>>
>>> All attempts to ping the IB NIDs, local or remote, also failed.
>>> I can ping the IP addresses :
>>> [r...@p186 ~]# ping 172.24.198.112
>>> PING 172.24.198.112 (172.24.198.112) 56(84) bytes of data.
>>> 64 bytes from 172.24.198.112: icmp_seq=1 ttl=64 time=0.052 ms
>>> 64 bytes from 172.24.198.112: icmp_seq=2 ttl=64 time=0.024 ms
>>>
>>> --- 172.24.198.112 ping statistics ---
>>> 2 packets transmitted, 2 received, 0% packet loss, time 1000ms
>>> rtt min/avg/max/mdev = 0.024/0.038/0.052/0.014 ms
>>> [r...@p186 ~]# ping 172.24.198.111
>>> PING 172.24.198.111 (172.24.198.111) 56(84) bytes of data.
>>> 64 bytes from 172.24.198.111: icmp_seq=1 ttl=64 time=2.16 ms
>>> 64 bytes from 172.24.198.111: icmp_seq=2 ttl=64 time=0.296 ms
>>>
>>> --- 172.24.198.111 ping statistics ---
>>> 2 packets transmitted, 2 received, 0% packet loss, time 1000ms
>>> rtt min/avg/max/mdev = 0.296/1.231/2.166/0.935 ms
>>>
>>> but I can't ping the NIDs :
>>> [r...@p186 ~]# lctl ping 172.24.198....@o2ib
>>> failed to ping 172.24.198....@o2ib: Input/output error
>>> [r...@p186 ~]# lctl ping 172.24.198....@o2ib
>>> failed to ping 172.24.198....@o2ib: Input/output error
>>>
>>> Any idea why lnet can't ping the NIDs?
>>>
>>> some more configurations:
>>> [r...@p186 ~]# ibstat
>>> CA 'mthca0'
>>> CA type: MT23108
>>> Number of ports: 2
>>> Firmware version: 3.5.0
>>> Hardware version: a1
>>> Node GUID: 0x0002c9020021550c
>>>
>>> Machines are connected via IB switch.
>>>
>>> Looking forward to your help.
>>>
>>> ~subbu
>>>
>>> ------------------------------------------------------------------------
>>>
>>> _______________________________________________
>>> Lustre-discuss mailing list
>>> Lustre-discuss@lists.lustre.org
>>> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>>>
>>> --
>>> . . . s u b b u
>>> "You've got to be original, because if you're like someone else, what do
>>> they need you for?"

--
. . . s u b b u
"You've got to be original, because if you're like someone else, what do
they need you for?"
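
A note for reading the logs in this thread: the negative numbers (-5, -22, -99, -100, -110, and what appears to be a status of -113 in the "poll 1(4 -113)" line) are kernel-style return codes, i.e. negated Linux errno values, and the three-field "rc=4294967291 : -5 : fffffffb" entries print one value as unsigned, signed, and hex. Below is a minimal Python sketch for decoding them. The table is hard-coded from the usual Linux errno numbering and covers only the codes seen here; the names to_signed32 and describe are made up for illustration and are not part of Lustre or lctl.

```python
# Linux errno numbers for the codes that show up in this thread.
# Hard-coded (an assumption that the logs come from Linux) so the result
# does not depend on the platform running this script.
LINUX_ERRNO = {
    5: ("EIO", "Input/output error"),
    22: ("EINVAL", "Invalid argument"),
    99: ("EADDRNOTAVAIL", "Cannot assign requested address"),
    100: ("ENETDOWN", "Network is down"),
    110: ("ETIMEDOUT", "Connection timed out"),
    113: ("EHOSTUNREACH", "No route to host"),
}


def to_signed32(value: int) -> int:
    """Reinterpret an unsigned 32-bit value as a signed 32-bit integer."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


def describe(rc: int) -> str:
    """Describe a kernel-style return code (0 or a negated errno)."""
    if rc == 0:
        return "0 (success)"
    name, text = LINUX_ERRNO.get(-rc, ("unknown", "not in this table"))
    return f"{rc} ({name}: {text})"


if __name__ == "__main__":
    # First field of "rc=4294967291 : -5 : fffffffb" from libcfs_ioctl().
    print(describe(to_signed32(4294967291)))
    for rc in (-110, -113, -22, -99, -100):
        print(describe(rc))
```

Read this way, the -5 returned by libcfs_ioctl() is the same "Input/output error" that lctl ping reports, and the ROUTE ERROR -110 in the Jan 23 log is a timeout rather than the -22 (invalid argument) seen on Jan 16.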
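
The Jan 23 debug log in this message also shows how long the connection attempt waited between address resolution and the route error. Here is a rough Python sketch for pulling that out, assuming syslog-style lines laid out like the ones pasted in this thread. The regex and the function names are hypothetical helpers, not Lustre tooling, and the two sample lines are copied from the log with the NIDs elided as they are in the archive.

```python
import re

# Two entries copied from the Jan 23 debug log (NIDs are elided in the archive).
LOG = """\
Jan 23 17:23:39 p186 kernel: Lustre: 2782:0:(o2iblnd_cb.c:2682:kiblnd_cm_callback()) 172.24.198....@o2ib Addr resolved: 0
Jan 23 17:24:29 p186 kernel: Lustre: 2794:0:(o2iblnd_cb.c:2704:kiblnd_cm_callback()) 172.24.198....@o2ib: ROUTE ERROR -110
"""

# Assumed layout:
#   "Mon DD HH:MM:SS host kernel: Lustre: pid:0:(file:line:func()) message"
ENTRY = re.compile(
    r"^\w{3}\s+\d+\s+(?P<h>\d\d):(?P<m>\d\d):(?P<s>\d\d)\s+\S+\s+kernel:\s+"
    r"Lustre(?:Error)?:\s+\d+:\d+:"
    r"\((?P<src>[^:]+):(?P<line>\d+):(?P<func>\w+)\(\)\)\s+"
    r"(?P<msg>.*)$"
)


def parse_entries(text):
    """Return (seconds_since_midnight, func, msg) for each matching line."""
    entries = []
    for raw in text.splitlines():
        match = ENTRY.match(raw.strip())
        if match is None:
            continue  # skip wrapped or non-Lustre lines
        seconds = int(match["h"]) * 3600 + int(match["m"]) * 60 + int(match["s"])
        entries.append((seconds, match["func"], match["msg"]))
    return entries


def seconds_between(entries, first_marker, second_marker):
    """Seconds from the first entry containing first_marker to the first
    later entry containing second_marker (same day assumed); None if absent."""
    start = None
    for seconds, _func, msg in entries:
        if start is None and first_marker in msg:
            start = seconds
        elif start is not None and second_marker in msg:
            return seconds - start
    return None


if __name__ == "__main__":
    elapsed = seconds_between(parse_entries(LOG), "Addr resolved", "ROUTE ERROR")
    print(f"address resolved -> route error: {elapsed} s")
```

For the two sample lines the gap is 50 seconds (17:23:39 to 17:24:29), so the address resolved immediately and the route lookup then timed out.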