[Lustre-discuss] Lustre Client 2.4.1 on Ubuntu Precise 12.04 with Mellanox OFED gen2

2014-01-10 Thread Patrice Hamelin
Hi,

   Has anybody ever successfully compiled the Lustre 2.4.1 client on Ubuntu 
Precise 12.04 with Mellanox OFED 2.0.3?  I am stuck with this error:

mel-bc1e41-be14:/usr/src/lustre-2.4.1# ./configure 
--with-o2ib=/usr/src/mlnx-ofed-kernel-2.0 --disable-server
checking build system type... x86_64-unknown-linux-gnu
.
.
.
checking whether to enable OpenIB gen2 support... no
configure: error: can't compile with OpenIB gen2 headers under 
/usr/src/mlnx-ofed-kernel-2.0

   I tried a couple of patches/hacks found on Google but without success.
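
   In case it helps whoever picks this up: the first place I will look is 
config.log, which records the exact test program and compiler error behind 
the "can't compile with OpenIB gen2 headers" message, e.g.:

grep -n -B5 -A15 "OpenIB gen2" config.log

   I also intend to retry against the built OFED kernel tree rather than the 
source directory -- I believe Mellanox OFED installs it under 
/usr/src/ofa_kernel/default, but that path is an assumption on my part:

./configure --with-o2ib=/usr/src/ofa_kernel/default --disable-server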

Thanks.

-- 
Patrice Hamelin
Specialiste sénior en systèmes d'exploitation | Senior OS specialist
Environnement Canada | Environment Canada
2121, route Transcanadienne | 2121 Transcanada Highway
Dorval, QC H9P 1J3
Gouvernement du Canada | Government of Canada

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Need Help

2012-01-09 Thread Patrice Hamelin

Hi,

  I am getting that occasionally too, and retrying the mount a second time 
works.  I am interested in finding out what's happening as well.
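
   For what it is worth, a rough sketch of what I check when this happens 
(the NID is a placeholder -- substitute the one your MGS/MDS actually uses):

lctl list_nids                  # confirm LNET is up on the OSS
lctl ping <mgs-nid>             # e.g. lctl ping 10.0.0.1@tcp
lctl dl                         # device list and state before retrying

   If the ping fails, "Transport endpoint is not connected" points at plain 
LNET connectivity rather than at the OST itself.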


Thanks.

On 01/07/12 07:19, Ashok nulguda wrote:

Dear All,

We have Lustre 1.8.4 installed with 2 MDS servers and 2 OSS servers, 
17 OSTs and 1 MDT, with HA configured on both the MDS and OSS pairs.

Problem:
Some of my OSTs are not mounting on my OSS servers.
When I try to mount them manually, the mount throws the error "failed: 
Transport endpoint is not connected".

Command: mount -t lustre /dev/mapper/..   /OST1
"failed: Transport endpoint is not connected"

However, when we log in to the MDS server and check the Lustre OST status:
cat /proc/fs/lustre/mds/lustre-MDT/recovery_status
shows the recovery as completed, and
cat /proc/fs/lustre/devices
shows all of my MDT and OST devices as up.

Can anyone help us debug this?


Thanks and Regards
Ashok

--
Ashok Nulguda
TATA ELXSI LTD
Mb : +91 9689945767
Mb : +91 9637095767
Land line : 2702044871
Email: ash...@tataelxsi.co.in


___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


--
Patrice Hamelin
Specialiste sénior en systèmes d'exploitation | Senior OS specialist
Environnement Canada | Environment Canada
2121, route Transcanadienne | 2121 Transcanada Highway
Dorval, QC H9P 1J3
Téléphone | Telephone 514-421-5303
Télécopieur | Facsimile 514-421-7231
Gouvernement du Canada | Government of Canada

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Heartbeat problem

2011-12-23 Thread Patrice Hamelin
Thanks Frank,

   Works just great!
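
   For the archives: I believe the knob Frank means is the start action 
timeout in the meta-data section of 
/usr/lib/ocf/resource.d/heartbeat/Filesystem.  After the change, the 
relevant line would read something like the following (the 300s value is 
just an example, not a recommendation):

<action name="start" timeout="300s" />

   Large OSTs can take a while to mount when they go through recovery, which 
is presumably why the stock timeout is too short here.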

Greetings!

On 12/23/11 13:18, Frank Heckes wrote:
> Hi,
>
> we had had the same problem. We 'fixed' it by increasing the start
> parameter in Linux-HA
> script /usr/lib/ocf/resource.d/heartbeat/Filesystem
>
>  ...
>  
>  ...
>
> If you use pacemaker or RH cluster suite (although your config dir looks
> like linux-ha) there's probably a similar parameter.
>
> Cheers
>
> -Frank
>
> On Thu, 2011-12-22 at 16:38 +0100, Patrice Hamelin wrote:
>> Hi,
>>
>> I have a heartbeat problem while trying automatic failover.  Manual
>> failover works great: unmounting a partition from an OSS and
>> remounting it on another one makes the clients recover.  It all starts
>> with this error:
>>
>> Filesystem[7650]:   2011/12/22_14:36:05 ERROR: Couldn't mount
>> filesystem /dev/mpath/colosse4-lun60-sata on /mnt/data/clun60
>> Filesystem[7639]:   2011/12/22_14:36:05 ERROR:  Generic error
>>
>> As a result, the failover OSS is the wrong one and the clients stay
>> in this state forever:
>>
>> sata-OST_UUID   : Resource temporarily unavailable
>>
>> Here is my heartbeat config:
>>
>> [root@ib3-st02 ~]# cat /etc/ha.d/ha.cf
>> # log file settings
>> # write debug output to /var/log/ha-debug
>> debugfile /var/log/ha-debug
>> # write log messages to /var/log/ha-log
>> logfile /var/log/ha-log
>> # use syslog to write to logfiles
>> logfacility local0
>> # set some time-outs. these values are only recommendations, which
>> # depend e.g. on the OSS load
>> # send keep-alive packages every 2 seconds
>> keepalive 2
>> # wait 90 seconds before declaring a node dead
>> deadtime 90
>> # write a warning to the logfile after 30 seconds without an answer
>> # from the failover node
>> warntime 30
>> # wait for 120 seconds before declaring a node dead after heartbeat
>> # is brought up
>> initdead 120
>> # define communication channels
>> # use port 12345 to communicate with fail-over node
>> udpport 12345
>> # use network interfaces eth0 and ib0 to detect a failed node
>> bcast eth0 bond0
>> # Use manual failback
>> auto_failback off
>> # node names in this failover-pair. These names must match the
>> # output of `hostname`
>> node ib3-st01
>> node ib3-st02
>> node ib3-st03
>> node ib3-st04
>>
>> [root@ib3-st02 ~]# cat /etc/ha.d/haresources
>> ib3-st01 Filesystem::/dev/emcssd-1/mdt-sata::/mnt/mdt-colosse::lustre
>> ib3-st01
>> Filesystem::/dev/mpath/colosse4-lun53-sata::/mnt/data/clun53::lustre
>> ib3-st02
>> Filesystem::/dev/mpath/colosse4-lun54-sata::/mnt/data/clun54::lustre
>> ib3-st03
>> Filesystem::/dev/mpath/colosse4-lun55-sata::/mnt/data/clun55::lustre
>> ib3-st04
>> Filesystem::/dev/mpath/colosse4-lun56-sata::/mnt/data/clun56::lustre
>> ib3-st01
>> Filesystem::/dev/mpath/colosse4-lun57-sata::/mnt/data/clun57::lustre
>> ib3-st02
>> Filesystem::/dev/mpath/colosse4-lun58-sata::/mnt/data/clun58::lustre
>> ib3-st03
>> Filesystem::/dev/mpath/colosse4-lun59-sata::/mnt/data/clun59::lustre
>> ib3-st04
>> Filesystem::/dev/mpath/colosse4-lun60-sata::/mnt/data/clun60::lustre
>>
>>
>> It is all the same on all OSS's.
>>
>> Has anybody ever encountered this problem?
>> Thanks for help.
>>
>>
>>
>>
>
>
> 
> 
> Forschungszentrum Juelich GmbH
> 52425 Juelich
> Registered office: Juelich
> Registered in the Commercial Register of the Amtsgericht Dueren, No. HR B 3498
> Chairman of the Supervisory Board: MinDirig Dr. Karl Eugen Huthmacher
> Executive Board: Prof. Dr. Achim Bachem (Chairman),
> Karsten Beneke (Deputy Chairman), Prof. Dr.-Ing. Harald Bolt,
> Prof. Dr. Sebastian M. Schmidt
> 
> 
> ___
> Lustre-discuss mailing list
> Lustre-discuss@lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss

-- 
Patrice Hamelin
Specialiste sénior en systèmes d'exploitation | Senior OS specialist
Environnement Canada | Environment Canada
2121, route Transcanadienne | 2121 Transcanada Highway
Dorval, QC H9P 1J3
Téléphone | Telephone 514-421-5303
Télécopieur | Facsimile 514-421-7231
Gouvernement du Canada | Government of Canada

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] Errors on mounting clients

2011-12-22 Thread Patrice Hamelin
Hi,

   Me again!  :-)

I am getting errors before being able to mount clients.

On the o2ib network, I tried 3 or 4 clients and they all behave the same 
way: the first mount fails with an error and the second attempt succeeds.

ib3-bc3e41-be02:~# mount -t lustre ib3-st01s@o2ib3:ib3-st02s@o2ib3:/sata 
/mnt/sata
mount.lustre: mount ib3-st01s@o2ib3:ib3-st02s@o2ib3:/sata at /mnt/sata 
failed: Cannot send after transport endpoint shutdown
ib3-bc3e41-be02:~# mount -t lustre ib3-st01s@o2ib3:ib3-st02s@o2ib3:/sata 
/mnt/sata

Generating a network error in the logs:

Dec 22 15:53:21 ib3-bc3e41-be03 kernel: [263682.370607] Lustre: 
2671:0:(client.c:1487:ptlrpc_expire_one_request()) @@@ Request 
x1388634652055195 sent from MGC10.10.135.115@o2ib3 to NID 
10.10.135.115@o2ib3 0s ago has failed due to network error (5s prior to 
deadline).
Dec 22 15:53:21 ib3-bc3e41-be03 kernel: [263682.370611]   
req@8804917c4c00 x1388634652055195/t0 
o250->MGS@MGC10.10.135.115@o2ib3_0:26/25 lens 368/584 e 0 to 1 dl 
1324569206 ref 1 fl Rpc:N/0/0 rc 0/0
Dec 22 15:53:21 ib3-bc3e41-be03 kernel: [263682.370617] Lustre: 
2671:0:(client.c:1487:ptlrpc_expire_one_request()) Skipped 108 previous 
similar messages

   On my TCP clients, it is a little bit different:

ib4-bc1f82-be13:~# mount -t lustre ib3-st01e@tcp:ib3-st02e@tcp:/sata 
/mnt/sata
mount.lustre: mount ib3-st01e@tcp:ib3-st02e@tcp:/sata at /mnt/sata 
failed: Cannot send after transport endpoint shutdown
ib4-bc1f82-be13:~# mount -t lustre ib3-st01e@tcp:ib3-st02e@tcp:/sata 
/mnt/sata
mount.lustre: mount ib3-st01e@tcp:ib3-st02e@tcp:/sata at /mnt/sata 
failed: File exists
ib4-bc1f82-be13:~# mount -t lustre ib3-st01e@tcp:ib3-st02e@tcp:/sata 
/mnt/sata
mount.lustre: mount ib3-st01e@tcp:ib3-st02e@tcp:/sata at /mnt/sata 
failed: File exists
ib4-bc1f82-be13:~# mount -t lustre ib3-st01e@tcp:ib3-st02e@tcp:/sata 
/mnt/sata
mount.lustre: mount ib3-st01e@tcp:ib3-st02e@tcp:/sata at /mnt/sata 
failed: File exists
ib4-bc1f82-be13:~# mount -t lustre ib3-st01e@tcp:ib3-st02e@tcp:/sata 
/mnt/sata
mount.lustre: mount ib3-st01e@tcp:ib3-st02e@tcp:/sata at /mnt/sata 
failed: File exists
ib4-bc1f82-be13:~# mount -t lustre ib3-st01e@tcp:ib3-st02e@tcp:/sata 
/mnt/sata
mount.lustre: mount ib3-st01e@tcp:ib3-st02e@tcp:/sata at /mnt/sata 
failed: File exists
ib4-bc1f82-be13:~# mount -t lustre ib3-st01e@tcp:ib3-st02e@tcp:/sata 
/mnt/sata
mount.lustre: mount ib3-st01e@tcp:ib3-st02e@tcp:/sata at /mnt/sata 
failed: File exists
.
.
.
   Then it finally mounts after several tries.

The log files show a network error once again:

Dec 22 15:59:43 ib4-bc1f82-be13 kernel: [172481.077865] Lustre: 
MGC10.10.132.115@tcp: Reactivating import
Dec 22 15:59:43 ib4-bc1f82-be13 kernel: [172481.087738] Lustre: 
3057:0:(client.c:1487:ptlrpc_expire_one_request()) @@@ Request 
x1388731500687365 sent from sata-OST0004-osc-88096cd98000 to NID 
10.10.132.111@tcp 0s ago has failed due to network error (5s prior to 
deadline).

I know this is network-related, but the network itself works just fine.  
What about LNET?  How can I explain or eliminate this problem?
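
   One thing I intend to try is exercising LNET by hand before the very 
first mount, to see whether the connection setup is simply racing the mount.  
A sketch, with the MGS NID taken from the log above (repeat the ping for 
each OSS NID, which lctl list_nids on the servers will give you):

lctl ping 10.10.135.115@o2ib3
mount -t lustre ib3-st01s@o2ib3:ib3-st02s@o2ib3:/sata /mnt/sata

   If the ping succeeds and the first mount then works, the failure looks 
like initial connection establishment timing out rather than a real network 
fault.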


Thanks!
Greetings!



-- 
Patrice Hamelin
Specialiste sénior en systèmes d'exploitation | Senior OS specialist
Environnement Canada | Environment Canada
2121, route Transcanadienne | 2121 Transcanada Highway
Dorval, QC H9P 1J3
Téléphone | Telephone 514-421-5303
Télécopieur | Facsimile 514-421-7231
Gouvernement du Canada | Government of Canada

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] Heartbeat problem

2011-12-22 Thread Patrice Hamelin
Hi,

   I have a Heartbeat problem when trying automatic failover.  Manual 
failover works great: unmounting a partition from an OSS and 
remounting it on another one makes the clients recover.  It all starts 
with this error:

Filesystem[7650]:   2011/12/22_14:36:05 ERROR: Couldn't mount 
filesystem /dev/mpath/colosse4-lun60-sata on /mnt/data/clun60
Filesystem[7639]:   2011/12/22_14:36:05 ERROR:  Generic error

   As a result, the failover OSS is the wrong one and the clients stay 
in this state forever:

sata-OST_UUID   : Resource temporarily unavailable
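
   (For completeness: the LUN itself is visible and mounts fine when I do 
it by hand on the surviving node -- a sketch of that check, node and LUN 
chosen as an example:

multipath -ll | grep colosse4-lun60
mount -t lustre /dev/mpath/colosse4-lun60-sata /mnt/data/clun60

so it really does seem to be only the automatic takeover that fails.)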

   Here is my heartbeat config:

[root@ib3-st02 ~]# cat /etc/ha.d/ha.cf
# log file settings
# write debug output to /var/log/ha-debug
debugfile /var/log/ha-debug
# write log messages to /var/log/ha-log
logfile /var/log/ha-log
# use syslog to write to logfiles
logfacility local0
# set some time-outs. these values are only recommendations, which
# depend e.g. on the OSS load
# send keep-alive packages every 2 seconds
keepalive 2
# wait 90 seconds before declaring a node dead
deadtime 90
# write a warning to the logfile after 30 seconds without an answer
# from the failover node
warntime 30
# wait for 120 seconds before declaring a node dead after heartbeat
# is brought up
initdead 120
# define communication channels
# use port 12345 to communicate with fail-over node
udpport 12345
# use network interfaces eth0 and ib0 to detect a failed node
bcast eth0 bond0
# Use manual failback
auto_failback off
# node names in this failover-pair. These names must match the
# output of `hostname`
node ib3-st01
node ib3-st02
node ib3-st03
node ib3-st04

[root@ib3-st02 ~]# cat /etc/ha.d/haresources
ib3-st01 Filesystem::/dev/emcssd-1/mdt-sata::/mnt/mdt-colosse::lustre
ib3-st01 
Filesystem::/dev/mpath/colosse4-lun53-sata::/mnt/data/clun53::lustre
ib3-st02 
Filesystem::/dev/mpath/colosse4-lun54-sata::/mnt/data/clun54::lustre
ib3-st03 
Filesystem::/dev/mpath/colosse4-lun55-sata::/mnt/data/clun55::lustre
ib3-st04 
Filesystem::/dev/mpath/colosse4-lun56-sata::/mnt/data/clun56::lustre
ib3-st01 
Filesystem::/dev/mpath/colosse4-lun57-sata::/mnt/data/clun57::lustre
ib3-st02 
Filesystem::/dev/mpath/colosse4-lun58-sata::/mnt/data/clun58::lustre
ib3-st03 
Filesystem::/dev/mpath/colosse4-lun59-sata::/mnt/data/clun59::lustre
ib3-st04 
Filesystem::/dev/mpath/colosse4-lun60-sata::/mnt/data/clun60::lustre


   It is the same on all of the OSSes.

Has anybody ever encountered this problem?
Thanks for your help.




-- 
Patrice Hamelin
Specialiste sénior en systèmes d'exploitation | Senior OS specialist
Environnement Canada | Environment Canada
2121, route Transcanadienne | 2121 Transcanada Highway
Dorval, QC H9P 1J3
Téléphone | Telephone 514-421-5303
Télécopieur | Facsimile 514-421-7231
Gouvernement du Canada | Government of Canada

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] failover on multihomed clusters

2011-12-21 Thread Patrice Hamelin

Hi,

  If you refer to my previous message, you will see that I have two 
multihomed clusters, each with Lustre servers and clients.  I have 
clients mounting Lustre filesystems over both o2ib and tcp.  Now I am 
implementing failover; I tried it this morning without success, so RTFM.  
I read:


Note -- If you have an MGS or MDT configured for failover, perform these 
steps:

1. On the OST, list the NIDs of all MGS nodes at mkfs time.
OST# mkfs.lustre --fsname sunfs --ost --mgsnode=10.0.0.1
--mgsnode=10.0.0.2 /dev/{device}
2. On the client, mount the file system.
client# mount -t lustre 10.0.0.1:10.0.0.2:/sunfs /cfs/client/

So I extended the logic from:

mkfs.lustre --mgs --mdt --fsname=sata --failnode=ib3-st02s@o2ib3 
--reformat /dev/mpath/emcssd-1
mkfs.lustre --fsname sata --reformat --ost --mgsnode=ib3-st01s@o2ib3 
--mgsnode=ib3-st01e@tcp --failnode=ib3-st02s@o2ib3 
/dev/mpath/colosse4-lun54-sata


to:

 mkfs.lustre --mgs --mdt --fsname=sata 
--failnode=ib3-st02s@o2ib3,ib3-st02e@tcp --reformat /dev/mpath/emcssd-1
mkfs.lustre --fsname sata --reformat --ost 
--mgsnode=ib3-st01s@o2ib3,ib3-st01e@tcp 
--mgsnode=ib3-st02s@o2ib3,ib3-st02e@tcp 
--failnode=ib3-st02s@o2ib3,ib3-st02e@tcp  /dev/mpath/colosse4-lun53-sata


And so on for the other disks.

The partitions mount fine on the combined MDS/MGS/OSS server, but on the 
OSS-only servers I get:


[root@ib3-st03 ~]# mount -t lustre /dev/mpath/colosse4-lun55-sata 
/mnt/data/clun55
mount.lustre: mount /dev/mpath/colosse4-lun55-sata at /mnt/data/clun55 
failed: Interrupted system call


messages file contains:

Dec 21 15:18:52 ib3-st03 kernel: Lustre: 
9464:0:(client.c:1487:ptlrpc_expire_one_request()) @@@ Request 
x1388814699331655 sent from MGC10.10.135.115@o2ib3 to NID 
10.10.135.116@o2ib3 5s ago has timed out (5s prior to deadline).
Dec 21 15:18:52 ib3-st03 kernel:   req@810116fff800 
x1388814699331655/t0 o250->MGS@MGC10.10.135.115@o2ib3_1:26/25 lens 
368/584 e 0 to 1 dl 1324480732 ref 1 fl Rpc:N/0/0 rc 0/0
Dec 21 15:18:52 ib3-st03 kernel: LustreError: 
23519:0:(obd_mount.c:1112:server_start_targets()) Required registration 
failed for sata-OST: -4
Dec 21 15:18:52 ib3-st03 kernel: LustreError: 
23519:0:(obd_mount.c:1670:server_fill_super()) Unable to start targets: -4
Dec 21 15:18:52 ib3-st03 kernel: LustreError: 
23519:0:(obd_mount.c:1453:server_put_super()) no obd sata-OST
Dec 21 15:18:52 ib3-st03 kernel: LustreError: 
23519:0:(obd_mount.c:147:server_deregister_mount()) sata-OST not 
registered

Dec 21 15:18:52 ib3-st03 kernel: Lustre: server umount sata-OST complete
Dec 21 15:18:52 ib3-st03 kernel: LustreError: 
23519:0:(obd_mount.c:2065:lustre_fill_super()) Unable to mount  (-4)
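
   The request that times out is aimed at 10.10.135.116@o2ib3 rather than 
the primary MGS NID, so a sanity check I still want to run from the OSS 
before mounting -- just a sketch:

lctl ping 10.10.135.115@o2ib3
lctl ping 10.10.135.116@o2ib3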



So my question is:

What would be the correct syntax to make sure I have failover for the 
o2ib clients as well as the tcp clients?
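
   My current reading of the manual is that the commas group the NIDs 
belonging to one server while the colon separates the failover pair, which 
on the client side would give something like this (untested -- please 
correct me if I have it backwards):

mount -t lustre ib3-st01s@o2ib3,ib3-st01e@tcp:ib3-st02s@o2ib3,ib3-st02e@tcp:/sata /mnt/sata

with the o2ib clients using the @o2ib3 NIDs and the tcp clients the @tcp 
ones.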


Thanks




--
Patrice Hamelin
Specialiste sénior en systèmes d'exploitation | Senior OS specialist
Environnement Canada | Environment Canada
2121, route Transcanadienne | 2121 Transcanada Highway
Dorval, QC H9P 1J3
Téléphone | Telephone 514-421-5303
Télécopieur | Facsimile 514-421-7231
Gouvernement du Canada | Government of Canada

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] two multi-homed cluster

2011-12-19 Thread Patrice Hamelin

Hi,

  I have two InfiniBand clusters, each in a separate location, with solid 
Ethernet connectivity between them.  Say they are named cluster A and 
cluster B.  All members of each cluster have both IB and Ethernet networks 
available to them; the IB network is not routed between clusters A and B, 
but the Ethernet is.  On each cluster I have 4 OSSes serving FC disks.  
Clients on cluster A mount Lustre disks from their local cluster, and the 
same goes for cluster B, both over InfiniBand NIDs.


  What I would like to achieve is for clients in cluster A to mount disks 
from the OSSes in cluster B over the Ethernet connection, and likewise for 
clients in cluster B to mount disks from the OSSes in cluster A.


  From my reading of the Lustre 1.8.7 manual, I got:

7.1.1 Modprobe.conf
Options under modprobe.conf are used to specify the networks available 
to a node.
You have the choice of two different options -- the networks option, 
which explicitly
lists the networks available and the ip2nets option, which provides a 
list-matching
lookup. Only one option can be used at any one time. The order of LNET 
lines in
modprobe.conf is important when configuring multi-homed servers. *If a 
server
node can be reached using more than one network, the first network 
specified in modprobe.conf will be used.*

Does the last sentence mean that I cannot do that?
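
   (For reference, the two forms the manual contrasts look roughly like 
this -- the interface names and address patterns are only illustrative:

options lnet networks="o2ib0(ib0),tcp0(eth0)"
options lnet ip2nets="o2ib0 10.10.135.*; tcp0 10.10.132.*"

and only one of the two may appear in modprobe.conf.)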

Thanks.

--
Patrice Hamelin
Specialiste sénior en systèmes d'exploitation | Senior OS specialist
Environnement Canada | Environment Canada
2121, route Transcanadienne | 2121 Transcanada Highway
Dorval, QC H9P 1J3
Gouvernement du Canada | Government of Canada
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] two multi-homed cluster

2011-12-19 Thread Patrice Hamelin

OK!  Found the solution (it came from a Lustre user).  So simple!...


Quote:
---
I think the possible solution to your problem lies in differentiating 
the two different IB networks - by changing the Lustre LNET network names.
This means that each separate cluster would have a different, non-default 
"o2ib" naming convention in modprobe.conf.


The IB3 lustre servers might call it:

   options lnet networks="o2ib3(bond0),tcp(eth0)"

and the IB4 lustre servers might call it:

   options lnet networks="o2ib4(bond0),tcp(eth0)"

---

That solution works perfectly.
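
   (To close the loop, the client side on IB3 is then just the renamed 
network in modprobe.conf plus the usual mount -- a sketch, with the 
interface name only illustrative:

options lnet networks="o2ib3(ib0)"
mount -t lustre ib3-st01s@o2ib3:ib3-st02s@o2ib3:/sata /mnt/sata

while clients at the other site reach the same filesystem over the servers' 
@tcp NIDs.)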

Thanks to repliers!

Season's Greetings all!

On 12/19/11 12:57, Patrice Hamelin wrote:

Cliff,

  Maybe our configuration is a bit special.  We are running two 
InfiniBand partitions, one for storage and the other for TCP over IB.  
The two clusters are named IB3 and IB4.


I have 4 OSSes on cluster IB3, which are configured like this:

bond0 Link encap:InfiniBand  HWaddr 
80:00:00:4B:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00

  inet addr:10.10.135.115  Bcast:10.10.135.255  Mask:255.255.255.0
  inet6 addr: fe80::202:c903:e:8bc6/64 Scope:Link
  UP BROADCAST RUNNING MASTER MULTICAST  MTU:65520  Metric:1
  RX packets:6 errors:0 dropped:0 overruns:0 frame:0
  TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:0
  RX bytes:336 (336.0 b)  TX bytes:0 (0.0 b)

eth0  Link encap:Ethernet  HWaddr E4:1F:13:60:93:C0
  inet addr:10.10.132.115  Bcast:10.10.132.255  Mask:255.255.255.0
  inet6 addr: fe80::e61f:13ff:fe60:93c0/64 Scope:Link
  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
  RX packets:85 errors:0 dropped:0 overruns:0 frame:0
  TX packets:91 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000
  RX bytes:10707 (10.4 KiB)  TX bytes:10607 (10.3 KiB)
  Interrupt:169 Memory:9200-92012800

ib0.8001  Link encap:InfiniBand  HWaddr 
80:00:00:4A:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00

  UP BROADCAST RUNNING SLAVE MULTICAST  MTU:65520  Metric:1
  RX packets:3 errors:0 dropped:0 overruns:0 frame:0
  TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:256
  RX bytes:168 (168.0 b)  TX bytes:0 (0.0 b)

ib1.8001  Link encap:InfiniBand  HWaddr 
80:00:00:4B:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00

  UP BROADCAST RUNNING SLAVE MULTICAST  MTU:65520  Metric:1
  RX packets:3 errors:0 dropped:0 overruns:0 frame:0
  TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:256
  RX bytes:168 (168.0 b)  TX bytes:0 (0.0 b)

loLink encap:Local Loopback
  inet addr:127.0.0.1  Mask:255.0.0.0
  inet6 addr: ::1/128 Scope:Host
  UP LOOPBACK RUNNING  MTU:16436  Metric:1
  RX packets:8 errors:0 dropped:0 overruns:0 frame:0
  TX packets:8 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:0
  RX bytes:560 (560.0 b)  TX bytes:560 (560.0 b)

[root@ib3-st01 ~]# cat /etc/modprobe.conf
alias eth0 bnx2
alias eth1 bnx2
alias scsi_hostadapter mptbase
alias scsi_hostadapter1 mptsas
alias scsi_hostadapter2 ata_piix
alias scsi_hostadapter3 qla2xxx
alias usb0 cdc_ether
alias bond0 bonding
options bond0 miimon=100 mode=1
options lnet networks="o2ib(bond0),tcp(eth0)"
options ost oss_num_threads=24

I formatted the MGS/MDT like:

mkfs.lustre --mgs --mdt --fsname=sata --reformat /dev/mpath/emcssd-1

And the 8 OSTs like:

mkfs.lustre --fsname sata --reformat --ost 
--mgsnode=10.10.135.115@o2ib --mgsnode=10.10.132.115@tcp 
/dev/mpath/colosse4-lun53-sata



[root@ib3-st01 ~]# cat /etc/ha.d/haresources
ib3-st01 Filesystem::/dev/mpath/emcssd-1::/mnt/mdt-colosse::lustre
ib3-st01 
Filesystem::/dev/mpath/colosse4-lun53-sata::/mnt/data/clun53::lustre
ib3-st02 
Filesystem::/dev/mpath/colosse4-lun54-sata::/mnt/data/clun54::lustre
ib3-st03 
Filesystem::/dev/mpath/colosse4-lun55-sata::/mnt/data/clun55::lustre
ib3-st04 
Filesystem::/dev/mpath/colosse4-lun56-sata::/mnt/data/clun56::lustre
ib3-st01 
Filesystem::/dev/mpath/colosse4-lun57-sata::/mnt/data/clun57::lustre
ib3-st02 
Filesystem::/dev/mpath/colosse4-lun58-sata::/mnt/data/clun58::lustre
ib3-st03 
Filesystem::/dev/mpath/colosse4-lun59-sata::/mnt/data/clun59::lustre
ib3-st04 
Filesystem::/dev/mpath/colosse4-lun60-sata::/mnt/data/clun60::lustre


[root@ib3-st01 ~]# lctl list_nids
10.10.135.115@o2ib
10.10.132.115@tcp

service heartbeat start


Client on cluster IB3
ib3-bc3e41-be01:~# ifconfig
ib0.8001  Link encap:UNSPEC  HWaddr 
80-00-00-51-FE-80-00-00-00-00-00-00-00-00-00-00

  inet addr:10.10.135.74  Bcast:10.10.135.255  Mask:255.255.255.0
  inet6 addr: fe80::224:e890:97fe:fc91/64 Scope:Link
  UP BROADCAST RUNNING MULTICAST  MTU:65520  Metric:1
  RX packets:5580 errors:0 dropped:0 overruns

Re: [Lustre-discuss] two multi-homed cluster

2011-12-19 Thread Patrice Hamelin
... would have
options lnet networks="tcp0(eth0),o2ib0(ib0)"

The IB clients will mount using a @o2ib0 NID, and the ethernet clients 
will mount using @tcp0 NIDs. Since you are explicitly specifying the 
network, the hop rule doesn't apply.
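
For example (a sketch using the NIDs posted earlier in the thread; the 
mount points are arbitrary):

mount -t lustre 10.10.135.115@o2ib0:/sata /mnt/sata    (IB client)
mount -t lustre 10.10.132.115@tcp0:/sata /mnt/sata     (ethernet client)

Each client names the network it wants explicitly, so LNET never has to 
choose between the two.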

cliffw


On Fri, Dec 16, 2011 at 9:49 AM, Patrice Hamelin 
<patrice.hame...@ec.gc.ca> wrote:


Hi,

  I have two InfiniBand clusters, each in a separate location, with
solid Ethernet connectivity between them.  Say they are named
cluster A and cluster B.  All members of each cluster have both IB
and Ethernet networks available to them; the IB network is not
routed between clusters A and B, but the Ethernet is.  On each
cluster I have 4 OSSes serving FC disks.  Clients on cluster A
mount Lustre disks from their local cluster, and the same goes
for cluster B, both over InfiniBand NIDs.

  What I would like to achieve is for clients in cluster A to mount
disks from the OSSes in cluster B over the Ethernet connection, and
likewise for clients in cluster B to mount disks from the OSSes in
cluster A.

  From my reading of the Lustre 1.8.7 manual, I got:

7.1.1 Modprobe.conf
Options under modprobe.conf are used to specify the networks
available to a node.
You have the choice of two different options – the networks
option, which explicitly
lists the networks available and the ip2nets option, which
provides a list-matching
lookup. Only one option can be used at any one time. The order of
LNET lines in
modprobe.conf is important when configuring multi-homed servers.
*If a server
node can be reached using more than one network, the first network
specified in
modprobe.conf will be used.*

Does the last sentence mean that I cannot do that?

Thanks.

-- 
Patrice Hamelin

Specialiste sénior en systèmes d'exploitation | Senior OS specialist
Environnement Canada | Environment Canada
2121, route Transcanadienne | 2121 Transcanada Highway
Dorval, QC H9P 1J3
Gouvernement du Canada | Government of Canada


___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss




--
cliffw
Support Guy
WhamCloud, Inc.
www.whamcloud.com




--
Patrice Hamelin
Specialiste sénior en systèmes d'exploitation | Senior OS specialist
Environnement Canada | Environment Canada
2121, route Transcanadienne | 2121 Transcanada Highway
Dorval, QC H9P 1J3
Téléphone | Telephone 514-421-5303
Télécopieur | Facsimile 514-421-7231
Gouvernement du Canada | Government of Canada

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] two multi-homed cluster

2011-12-16 Thread Patrice Hamelin

Hi,

  I have two InfiniBand clusters, each in a separate location, with solid 
Ethernet connectivity between them.  Say they are named cluster A and 
cluster B.  All members of each cluster have both IB and Ethernet networks 
available to them; the IB network is not routed between clusters A and B, 
but the Ethernet is.  On each cluster I have 4 OSSes serving FC disks.  
Clients on cluster A mount Lustre disks from their local cluster, and the 
same goes for cluster B, both over InfiniBand NIDs.


  What I would like to achieve is for clients in cluster A to mount disks 
from the OSSes in cluster B over the Ethernet connection, and likewise for 
clients in cluster B to mount disks from the OSSes in cluster A.


  From my reading of the Lustre 1.8.7 manual, I got:

7.1.1 Modprobe.conf
Options under modprobe.conf are used to specify the networks available 
to a node.
You have the choice of two different options -- the networks option, 
which explicitly
lists the networks available and the ip2nets option, which provides a 
list-matching
lookup. Only one option can be used at any one time. The order of LNET 
lines in
modprobe.conf is important when configuring multi-homed servers. *If a 
server
node can be reached using more than one network, the first network 
specified in modprobe.conf will be used.*

Does the last sentence mean that I cannot do that?

Thanks.

--
Patrice Hamelin
Specialiste sénior en systèmes d'exploitation | Senior OS specialist
Environnement Canada | Environment Canada
2121, route Transcanadienne | 2121 Transcanada Highway
Dorval, QC H9P 1J3
Gouvernement du Canada | Government of Canada

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss