[Lustre-discuss] Lustre Client 2.4.1 on Ubuntu Precise 12.04 with Mellanox OFED gen2
Hi,

Has anybody ever successfully compiled the Lustre Client 2.4.1 on Ubuntu Precise 12.04 with Mellanox OFED 2.0.3? I am stuck on this error:

  mel-bc1e41-be14:/usr/src/lustre-2.4.1# ./configure --with-o2ib=/usr/src/mlnx-ofed-kernel-2.0 --disable-server
  checking build system type... x86_64-unknown-linux-gnu
  . . .
  checking whether to enable OpenIB gen2 support... no
  configure: error: can't compile with OpenIB gen2 headers under /usr/src/mlnx-ofed-kernel-2.0

I tried a couple of patches/hacks found on Google, but without success.

Thanks.

-- 
Patrice Hamelin
Specialiste sénior en systèmes d'exploitation | Senior OS specialist
Environnement Canada | Environment Canada
2121, route Transcanadienne | 2121 Transcanada Highway
Dorval, QC  H9P 1J3
Gouvernement du Canada | Government of Canada
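(For reference: that configure test needs both the OFED kernel headers and a Module.symvers built against the running kernel in the directory passed to --with-o2ib. A minimal sketch of what to check, assuming MLNX_OFED has installed its pre-built kernel tree under /usr/src/ofa_kernel/default -- that path is an assumption, adjust to wherever your bundle put it:)

  # The gen2 probe compiles a test module against this tree, so both of
  # these must exist before ./configure has any chance of passing:
  ls /usr/src/ofa_kernel/default/include/rdma/ib_verbs.h
  ls /usr/src/ofa_kernel/default/Module.symvers

  cd /usr/src/lustre-2.4.1
  ./configure --with-o2ib=/usr/src/ofa_kernel/default --disable-server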
Re: [Lustre-discuss] Need Help
Hi,

I am getting that occasionally too, and retrying the mount a second time works. I am also interested in finding out what is happening.

Thanks.

On 01/07/12 07:19, Ashok nulguda wrote:
> Dear All,
>
> We have Lustre 1.8.4 installed with 2 MDS servers and 2 OSS servers,
> with 17 OSTs and 1 MDT, and HA configured on both my MDS and OSS pairs.
>
> Problem: some of my OSTs are not mounting on my OSS servers. When I try
> to mount them manually, the command fails with "Transport endpoint is
> not connected":
>
>   mount -t lustre /dev/mapper/.. /OST1
>   failed: Transport endpoint is not connected
>
> However, when we log in to the MDS server and check the OST status,
>
>   cat /proc/fs/lustre/mds/lustre-MDT/recovery_status
>
> shows "completed", and
>
>   cat /proc/fs/lustre/devices
>
> shows all my MDTs and OSTs as UP. Can anyone help us debug this?
>
> Thanks and Regards
> Ashok
>
> --
> Ashok Nulguda
> TATA ELXSI LTD
> Mb : +91 9689945767
> Mb : +91 9637095767
> Land line : 2702044871
> Email : ash...@tataelxsi.co.in

-- 
Patrice Hamelin
Environnement Canada | Environment Canada
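(When chasing "Transport endpoint is not connected", one place to start is raw LNet reachability from the failing OSS before retrying the mount. A minimal sketch -- the NID is an example, substitute your own MGS/MDS NID:)

  lctl list_nids                 # which NIDs this OSS advertises
  lctl ping 10.0.0.1@o2ib        # raw LNet reachability toward the MGS/MDS
  lctl dl                        # configured obd devices and their state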
Re: [Lustre-discuss] Heartbeat problem
Thanks Frank, works just great! Greetings!

On 12/23/11 13:18, Frank Heckes wrote:
> Hi,
>
> we had the same problem. We 'fixed' it by increasing the start
> parameter in the Linux-HA script
> /usr/lib/ocf/resource.d/heartbeat/Filesystem
>
> ...
>
> If you use pacemaker or the RH cluster suite (although your config dir
> looks like linux-ha) there is probably a similar parameter.
>
> Cheers
>
> -Frank
>
> On Thu, 2011-12-22 at 16:38 +0100, Patrice Hamelin wrote:
>> Hi,
>>
>> I have a heartbeat problem while trying automatic failover. Manual
>> failover works great: unmounting a partition from an OSS and
>> remounting it on another one makes the clients recover. It all starts
>> with this error:
>>
>> Filesystem[7650]: 2011/12/22_14:36:05 ERROR: Couldn't mount
>> filesystem /dev/mpath/colosse4-lun60-sata on /mnt/data/clun60
>> Filesystem[7639]: 2011/12/22_14:36:05 ERROR: Generic error
>>
>> [full ha.cf and haresources config snipped -- see the original
>> "Heartbeat problem" post below]

-- 
Patrice Hamelin
Environnement Canada | Environment Canada
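(Frank's exact edit is elided above; for the archive, the knob in the stock Filesystem resource agent that matches his description is, as far as I can tell, the start action timeout advertised in the script's meta-data -- a sketch, assuming the stock agent:)

  # Inside /usr/lib/ocf/resource.d/heartbeat/Filesystem, meta_data()
  # advertises per-action timeouts; raising the start timeout gives a
  # slow Lustre mount (recovery included) time to finish, e.g. change
  #   <action name="start" timeout="60" />
  # to
  #   <action name="start" timeout="300" />
  grep 'action name="start"' /usr/lib/ocf/resource.d/heartbeat/Filesystem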
[Lustre-discuss] Errors on mounting clients
Hi,

Me again! :-)

I am getting errors before being able to mount clients. On the o2ib networks I tried 3 or 4 clients, and they all behave the same: the first mount fails and the second attempt succeeds.

  ib3-bc3e41-be02:~# mount -t lustre ib3-st01s@o2ib3:ib3-st02s@o2ib3:/sata /mnt/sata
  mount.lustre: mount ib3-st01s@o2ib3:ib3-st02s@o2ib3:/sata at /mnt/sata failed: Cannot send after transport endpoint shutdown
  ib3-bc3e41-be02:~# mount -t lustre ib3-st01s@o2ib3:ib3-st02s@o2ib3:/sata /mnt/sata

The failure generates a network error in the logs:

  Dec 22 15:53:21 ib3-bc3e41-be03 kernel: [263682.370607] Lustre: 2671:0:(client.c:1487:ptlrpc_expire_one_request()) @@@ Request x1388634652055195 sent from MGC10.10.135.115@o2ib3 to NID 10.10.135.115@o2ib3 0s ago has failed due to network error (5s prior to deadline).
  Dec 22 15:53:21 ib3-bc3e41-be03 kernel: [263682.370611] req@8804917c4c00 x1388634652055195/t0 o250->MGS@MGC10.10.135.115@o2ib3_0:26/25 lens 368/584 e 0 to 1 dl 1324569206 ref 1 fl Rpc:N/0/0 rc 0/0
  Dec 22 15:53:21 ib3-bc3e41-be03 kernel: [263682.370617] Lustre: 2671:0:(client.c:1487:ptlrpc_expire_one_request()) Skipped 108 previous similar messages

On my TCP clients it is a little bit different:

  ib4-bc1f82-be13:~# mount -t lustre ib3-st01e@tcp:ib3-st02e@tcp:/sata /mnt/sata
  mount.lustre: mount ib3-st01e@tcp:ib3-st02e@tcp:/sata at /mnt/sata failed: Cannot send after transport endpoint shutdown
  ib4-bc1f82-be13:~# mount -t lustre ib3-st01e@tcp:ib3-st02e@tcp:/sata /mnt/sata
  mount.lustre: mount ib3-st01e@tcp:ib3-st02e@tcp:/sata at /mnt/sata failed: File exists
  ib4-bc1f82-be13:~# mount -t lustre ib3-st01e@tcp:ib3-st02e@tcp:/sata /mnt/sata
  mount.lustre: mount ib3-st01e@tcp:ib3-st02e@tcp:/sata at /mnt/sata failed: File exists
  . . .

and it finally mounts after several tries. The log files show a network error once again:

  Dec 22 15:59:43 ib4-bc1f82-be13 kernel: [172481.077865] Lustre: MGC10.10.132.115@tcp: Reactivating import
  Dec 22 15:59:43 ib4-bc1f82-be13 kernel: [172481.087738] Lustre: 3057:0:(client.c:1487:ptlrpc_expire_one_request()) @@@ Request x1388731500687365 sent from sata-OST0004-osc-88096cd98000 to NID 10.10.132.111@tcp 0s ago has failed due to network error (5s prior to deadline).

I know it is related to the network, but my own network works just fine. What about LNet? How can I explain/eliminate that problem?

Thanks! Greetings!

-- 
Patrice Hamelin
Environnement Canada | Environment Canada
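(One way to exercise LNet itself, independent of the filesystem, is the bundled self-test. A minimal sketch -- the NIDs are examples drawn from the logs above, and the lnet_selftest module must be loaded on both ends:)

  modprobe lnet_selftest          # on the client and the server
  export LST_SESSION=$$           # lst keys its commands off this variable
  lst new_session mount_debug
  lst add_group servers 10.10.135.115@o2ib3
  lst add_group clients 10.10.135.74@o2ib3
  lst add_batch bulk
  lst add_test --batch bulk --from clients --to servers brw read size=1M
  lst run bulk
  lst stat clients                # watch for RPC errors/drops; Ctrl-C to stop
  lst end_session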
[Lustre-discuss] Heartbeat problem
Hi,

I have a heartbeat problem while trying automatic failover. Manual failover works great: unmounting a partition from an OSS and remounting it on another one makes the clients recover. It all starts with this error:

  Filesystem[7650]: 2011/12/22_14:36:05 ERROR: Couldn't mount filesystem /dev/mpath/colosse4-lun60-sata on /mnt/data/clun60
  Filesystem[7639]: 2011/12/22_14:36:05 ERROR: Generic error

As a result, the failover OSS is the wrong one and the clients stay in this state forever:

  sata-OST_UUID : Resource temporarily unavailable

Here is my heartbeat config:

  [root@ib3-st02 ~]# cat /etc/ha.d/ha.cf
  # log file settings
  # write debug output to /var/log/ha-debug
  debugfile /var/log/ha-debug
  # write log messages to /var/log/ha-log
  logfile /var/log/ha-log
  # use syslog to write to logfiles
  logfacility local0
  # set some time-outs. these values are only recommendations, which
  # depend e.g. on the OSS load
  # send keep-alive packages every 2 seconds
  keepalive 2
  # wait 90 seconds before declaring a node dead
  deadtime 90
  # write a warning to the logfile after 30 seconds without an answer
  # from the failover node
  warntime 30
  # wait for 120 seconds before declaring a node dead after heartbeat
  # is brought up
  initdead 120
  # define communication channels
  # use port 12345 to communicate with fail-over node
  udpport 12345
  # use network interfaces eth0 and ib0 to detect a failed node
  bcast eth0 bond0
  # Use manual failback
  auto_failback off
  # node names in this failover-pair. These names must match the
  # output of `hostname`
  node ib3-st01
  node ib3-st02
  node ib3-st03
  node ib3-st04

  [root@ib3-st02 ~]# cat /etc/ha.d/haresources
  ib3-st01 Filesystem::/dev/emcssd-1/mdt-sata::/mnt/mdt-colosse::lustre
  ib3-st01 Filesystem::/dev/mpath/colosse4-lun53-sata::/mnt/data/clun53::lustre
  ib3-st02 Filesystem::/dev/mpath/colosse4-lun54-sata::/mnt/data/clun54::lustre
  ib3-st03 Filesystem::/dev/mpath/colosse4-lun55-sata::/mnt/data/clun55::lustre
  ib3-st04 Filesystem::/dev/mpath/colosse4-lun56-sata::/mnt/data/clun56::lustre
  ib3-st01 Filesystem::/dev/mpath/colosse4-lun57-sata::/mnt/data/clun57::lustre
  ib3-st02 Filesystem::/dev/mpath/colosse4-lun58-sata::/mnt/data/clun58::lustre
  ib3-st03 Filesystem::/dev/mpath/colosse4-lun59-sata::/mnt/data/clun59::lustre
  ib3-st04 Filesystem::/dev/mpath/colosse4-lun60-sata::/mnt/data/clun60::lustre

It is all the same on all OSSs.

Has anybody ever encountered this problem? Thanks for the help.

-- 
Patrice Hamelin
Environnement Canada | Environment Canada
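(Before blaming heartbeat itself, it is worth confirming the device is actually mountable from the node that was failed over to. A quick sketch, device names taken from the config above:)

  # On the node heartbeat tried to fail over to:
  multipath -ll colosse4-lun60-sata       # is the multipath device healthy?
  mount -t lustre /dev/mpath/colosse4-lun60-sata /mnt/data/clun60
  # If the manual mount succeeds but heartbeat's attempt fails, the
  # resource agent is likely giving up before the slow Lustre mount
  # (recovery included) completes.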
[Lustre-discuss] failover on multihomed clusters
Hi,

If you refer to my previous message, you will see that I have two multihomed clusters, each having Lustre servers and clients. I have clients mounting Lustre partitions over both o2ib and tcp. Now I am implementing failover; I tried it this morning without success, so RTFM. I read:

  Note - If you have an MGS or MDT configured for failover, perform these steps:
  1. On the OST, list the NIDs of all MGS nodes at mkfs time.
     OST# mkfs.lustre --fsname sunfs --ost --mgsnode=10.0.0.1 --mgsnode=10.0.0.2 /dev/{device}
  2. On the client, mount the file system.
     client# mount -t lustre 10.0.0.1:10.0.0.2:/sunfs /cfs/client/

So I extended the logic from:

  mkfs.lustre --mgs --mdt --fsname=sata --failnode=ib3-st02s@o2ib3 --reformat /dev/mpath/emcssd-1
  mkfs.lustre --fsname sata --reformat --ost --mgsnode=ib3-st01s@o2ib3 --mgsnode=ib3-st01e@tcp --failnode=ib3-st02s@o2ib3 /dev/mpath/colosse4-lun54-sata

to:

  mkfs.lustre --mgs --mdt --fsname=sata --failnode=ib3-st02s@o2ib3,ib3-st02e@tcp --reformat /dev/mpath/emcssd-1
  mkfs.lustre --fsname sata --reformat --ost --mgsnode=ib3-st01s@o2ib3,ib3-st01e@tcp --mgsnode=ib3-st02s@o2ib3,ib3-st02e@tcp --failnode=ib3-st02s@o2ib3,ib3-st02e@tcp /dev/mpath/colosse4-lun53-sata

and so on for the other disks. The partitions mount great on the MDS/MGS/OSS server, but on the OSS-only nodes I get:

  [root@ib3-st03 ~]# mount -t lustre /dev/mpath/colosse4-lun55-sata /mnt/data/clun55
  mount.lustre: mount /dev/mpath/colosse4-lun55-sata at /mnt/data/clun55 failed: Interrupted system call

The messages file contains:

  Dec 21 15:18:52 ib3-st03 kernel: Lustre: 9464:0:(client.c:1487:ptlrpc_expire_one_request()) @@@ Request x1388814699331655 sent from MGC10.10.135.115@o2ib3 to NID 10.10.135.116@o2ib3 5s ago has timed out (5s prior to deadline).
  Dec 21 15:18:52 ib3-st03 kernel: req@810116fff800 x1388814699331655/t0 o250->MGS@MGC10.10.135.115@o2ib3_1:26/25 lens 368/584 e 0 to 1 dl 1324480732 ref 1 fl Rpc:N/0/0 rc 0/0
  Dec 21 15:18:52 ib3-st03 kernel: LustreError: 23519:0:(obd_mount.c:1112:server_start_targets()) Required registration failed for sata-OST: -4
  Dec 21 15:18:52 ib3-st03 kernel: LustreError: 23519:0:(obd_mount.c:1670:server_fill_super()) Unable to start targets: -4
  Dec 21 15:18:52 ib3-st03 kernel: LustreError: 23519:0:(obd_mount.c:1453:server_put_super()) no obd sata-OST
  Dec 21 15:18:52 ib3-st03 kernel: LustreError: 23519:0:(obd_mount.c:147:server_deregister_mount()) sata-OST not registered
  Dec 21 15:18:52 ib3-st03 kernel: Lustre: server umount sata-OST complete
  Dec 21 15:18:52 ib3-st03 kernel: LustreError: 23519:0:(obd_mount.c:2065:lustre_fill_super()) Unable to mount (-4)

So my question is: what would be the correct syntax to make sure I have failover for the o2ib clients as well as the tcp clients?

Thanks

-- 
Patrice Hamelin
Environnement Canada | Environment Canada
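(For anyone finding this in the archive: my reading of the manual's NID syntax -- a sketch, not a confirmed answer -- is that a comma joins NIDs of the *same* node on different networks, while repeating the option, or a colon on the client mount line, names a *different* failover node. On that reading the formats above are syntactically right, and the client side would look like:)

  # IB client: comma = same server on two networks; colon = failover partner
  mount -t lustre ib3-st01s@o2ib3,ib3-st01e@tcp:ib3-st02s@o2ib3,ib3-st02e@tcp:/sata /mnt/sata

  # A tcp-only client can simply name the tcp NIDs it can reach:
  mount -t lustre ib3-st01e@tcp:ib3-st02e@tcp:/sata /mnt/sata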
[Lustre-discuss] two multi-homed cluster
Hi,

I have two Infiniband clusters, each in a separate location, with solid ethernet connectivity between them. Say they are named cluster A and cluster B. All members of each cluster have both IB and eth networks available to them; the IB network is not routed between clusters A and B, but the ethernet is. On each cluster I have 4 OSSs serving FC disks. Clients on cluster A mount Lustre disks from their local cluster, and the same goes for cluster B, both on Infiniband NIDs.

What I would like to achieve is for clients on cluster A to mount disks from the OSSs on cluster B over the ethernet connection, and likewise for clients on cluster B to mount disks from the OSSs on cluster A. From my reading of the Lustre 1.8.7 manual, I got:

  7.1.1 Modprobe.conf
  Options under modprobe.conf are used to specify the networks available to a
  node. You have the choice of two different options -- the networks option,
  which explicitly lists the networks available, and the ip2nets option, which
  provides a list-matching lookup. Only one option can be used at any one time.
  The order of LNET lines in modprobe.conf is important when configuring
  multi-homed servers. *If a server node can be reached using more than one
  network, the first network specified in modprobe.conf will be used.*

Does the last sentence mean that I cannot do that?

Thanks.

-- 
Patrice Hamelin
Environnement Canada | Environment Canada
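(For context, a sketch of the ip2nets form the quoted manual section mentions -- the addresses are examples matching this site's numbering, not a tested config:)

  # modprobe.conf: pick the LNet network by matching the local IP,
  # instead of listing networks explicitly per node
  options lnet ip2nets="o2ib(bond0) 10.10.135.*; tcp(eth0) 10.10.132.*"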
Re: [Lustre-discuss] two multi-homed cluster
OK! Found the solution (came from a Lustre user). So simple!...

Quote:
---
I think the possible solution to your problem lies in differentiating the two different IB networks - by changing the lustre lnet device names. This means that each separate cluster would have a different, non-default "o2ib" naming convention in modprobe.conf. The IB3 lustre servers might call it:

  options lnet networks="o2ib3(bond0),tcp(eth0)"

and the IB4 lustre servers might call it:

  options lnet networks="o2ib4(bond0),tcp(eth0)"
---

That solution works perfectly. Thanks to the repliers! Season's Greetings all!

On 12/19/11 12:57, Patrice Hamelin wrote:
> Cliff,
>
> Maybe our configuration is a bit special. We are running two Infiniband
> partitions, one for storage and the other for TCP over IB. The clusters
> are named IB3 and IB4. I have 4 OSSs on cluster IB3 which are configured
> like:
>
>   bond0     Link encap:InfiniBand  HWaddr 80:00:00:4B:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
>             inet addr:10.10.135.115  Bcast:10.10.135.255  Mask:255.255.255.0
>             inet6 addr: fe80::202:c903:e:8bc6/64 Scope:Link
>             UP BROADCAST RUNNING MASTER MULTICAST  MTU:65520  Metric:1
>             RX packets:6 errors:0 dropped:0 overruns:0 frame:0
>             TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
>             collisions:0 txqueuelen:0
>             RX bytes:336 (336.0 b)  TX bytes:0 (0.0 b)
>
>   eth0      Link encap:Ethernet  HWaddr E4:1F:13:60:93:C0
>             inet addr:10.10.132.115  Bcast:10.10.132.255  Mask:255.255.255.0
>             inet6 addr: fe80::e61f:13ff:fe60:93c0/64 Scope:Link
>             UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>             RX packets:85 errors:0 dropped:0 overruns:0 frame:0
>             TX packets:91 errors:0 dropped:0 overruns:0 carrier:0
>             collisions:0 txqueuelen:1000
>             RX bytes:10707 (10.4 KiB)  TX bytes:10607 (10.3 KiB)
>             Interrupt:169 Memory:9200-92012800
>
>   ib0.8001  Link encap:InfiniBand  HWaddr 80:00:00:4A:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
>             UP BROADCAST RUNNING SLAVE MULTICAST  MTU:65520  Metric:1
>             RX packets:3 errors:0 dropped:0 overruns:0 frame:0
>             TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
>             collisions:0 txqueuelen:256
>             RX bytes:168 (168.0 b)  TX bytes:0 (0.0 b)
>
>   ib1.8001  Link encap:InfiniBand  HWaddr 80:00:00:4B:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
>             UP BROADCAST RUNNING SLAVE MULTICAST  MTU:65520  Metric:1
>             RX packets:3 errors:0 dropped:0 overruns:0 frame:0
>             TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
>             collisions:0 txqueuelen:256
>             RX bytes:168 (168.0 b)  TX bytes:0 (0.0 b)
>
>   lo        Link encap:Local Loopback
>             inet addr:127.0.0.1  Mask:255.0.0.0
>             inet6 addr: ::1/128 Scope:Host
>             UP LOOPBACK RUNNING  MTU:16436  Metric:1
>             RX packets:8 errors:0 dropped:0 overruns:0 frame:0
>             TX packets:8 errors:0 dropped:0 overruns:0 carrier:0
>             collisions:0 txqueuelen:0
>             RX bytes:560 (560.0 b)  TX bytes:560 (560.0 b)
>
>   [root@ib3-st01 ~]# cat /etc/modprobe.conf
>   alias eth0 bnx2
>   alias eth1 bnx2
>   alias scsi_hostadapter mptbase
>   alias scsi_hostadapter1 mptsas
>   alias scsi_hostadapter2 ata_piix
>   alias scsi_hostadapter3 qla2xxx
>   alias usb0 cdc_ether
>   alias bond0 bonding
>   options bond0 miimon=100 mode=1
>   options lnet networks="o2ib(bond0),tcp(eth0)"
>   options ost oss_num_threads=24
>
> I formatted the MGS/MDT like:
>
>   mkfs.lustre --mgs --mdt --fsname=sata --reformat /dev/mpath/emcssd-1
>
> and the 8 OSTs like:
>
>   mkfs.lustre --fsname sata --reformat --ost --mgsnode=10.10.135.115@o2ib --mgsnode=10.10.132.115@tcp /dev/mpath/colosse4-lun53-sata
>
>   [root@ib3-st01 ~]# cat /etc/ha.d/haresources
>   ib3-st01 Filesystem::/dev/mpath/emcssd-1::/mnt/mdt-colosse::lustre
>   ib3-st01 Filesystem::/dev/mpath/colosse4-lun53-sata::/mnt/data/clun53::lustre
>   ib3-st02 Filesystem::/dev/mpath/colosse4-lun54-sata::/mnt/data/clun54::lustre
>   ib3-st03 Filesystem::/dev/mpath/colosse4-lun55-sata::/mnt/data/clun55::lustre
>   ib3-st04 Filesystem::/dev/mpath/colosse4-lun56-sata::/mnt/data/clun56::lustre
>   ib3-st01 Filesystem::/dev/mpath/colosse4-lun57-sata::/mnt/data/clun57::lustre
>   ib3-st02 Filesystem::/dev/mpath/colosse4-lun58-sata::/mnt/data/clun58::lustre
>   ib3-st03 Filesystem::/dev/mpath/colosse4-lun59-sata::/mnt/data/clun59::lustre
>   ib3-st04 Filesystem::/dev/mpath/colosse4-lun60-sata::/mnt/data/clun60::lustre
>
>   [root@ib3-st01 ~]# lctl list_nids
>   10.10.135.115@o2ib
>   10.10.132.115@tcp
>
>   service heartbeat start
>
> Client on cluster IB3:
>
>   ib3-bc3e41-be01:~# ifconfig
>   ib0.8001  Link encap:UNSPEC  HWaddr 80-00-00-51-FE-80-00-00-00-00-00-00-00-00-00-00
>             inet addr:10.10.135.74  Bcast:10.10.135.255  Mask:255.255.255.0
>             inet6 addr: fe80::224:e890:97fe:fc91/64 Scope:Link
>             UP BROADCAST RUNNING MULTICAST  MTU:65520  Metric:1
>             RX packets:5580 errors:0 dropped:0 overruns
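(To spell out what the renaming buys: with the two fabrics named apart, each client simply mounts over whichever NID type it can actually reach. A sketch, host names taken from this thread:)

  # IB3-local client, over its own InfiniBand fabric:
  mount -t lustre ib3-st01s@o2ib3:/sata /mnt/sata

  # IB4 client, where o2ib3 is unreachable, over the routed ethernet:
  mount -t lustre ib3-st01e@tcp:/sata /mnt/sata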
Re: [Lustre-discuss] two multi-homed cluster
[...] would have:

  options lnet networks="tcp0(eth0),o2ib0(ib0)"

The IB clients will mount using a @o2ib0 NID, and the ethernet clients will mount using @tcp0 NIDs. Since you are explicitly specifying the network, the hop rule doesn't apply.

cliffw

On Fri, Dec 16, 2011 at 9:49 AM, Patrice Hamelin <patrice.hame...@ec.gc.ca> wrote:
> Hi,
>
> I have two Infiniband clusters, each in a separate location, with solid
> ethernet connectivity between them. [rest of the original question
> snipped -- see the "two multi-homed cluster" post above]

--
cliffw
Support Guy
WhamCloud, Inc.
www.whamcloud.com

-- 
Patrice Hamelin
Environnement Canada | Environment Canada