Re: [Lustre-discuss] RE : Lustre-2.4 VMs (EL6.4)

2014-08-19 Thread Andreas Dilger
Often this problem is because the hostname in /etc/hosts is actually mapped to 
localhost on the node itself. 

Unfortunately, this is how some systems are set up by default. 
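
For illustration only (the node name "mds-node" below is made up; the address is
the NID that shows up in the logs further down), the problematic default and the
corrected mapping in /etc/hosts might look like:

  # problematic default: the node's own hostname resolves to loopback
  127.0.0.1        localhost localhost.localdomain mds-node

  # corrected: localhost stays on loopback, the hostname maps to the real NIC address
  127.0.0.1        localhost localhost.localdomain
  192.168.122.50   mds-node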

Cheers, Andreas

> On Aug 19, 2014, at 12:39, "Abhay Dandekar"  wrote:
> 
> I came across a similar situation.
> 
> Below is a log of the machine state. These steps worked on some setups, while
> on others they didn't.
> 
> Arman,
> 
> Were you able to get past the problem? Any workaround?
> 
> Thanks in advance for all your help.
> 
> 
> Warm Regards,
> Abhay Dandekar
> 
> 
> -- Forwarded message --
> From: Abhay Dandekar 
> Date: Wed, Aug 6, 2014 at 12:18 AM
> Subject: Lustre configuration failure : lwp-MDT: Communicating with 0@lo, 
> operation mds_connect failed with -11.
> To: lustre-discuss@lists.lustre.org
> 
> 
> 
> Hi All,
> 
> I have come across a Lustre installation failure where the MGS always tries
> to reach the "lo" interface instead of the configured Ethernet interface.
> 
> These same steps worked on a different machine; somehow they are failing here.
> 
> Here are the logs 
> 
> The Lustre installation succeeded, with all packages installed without any
> error.
> 
> 0. Lustre version 
> 
> Aug  5 23:07:37 lfs-server kernel: LNet: HW CPU cores: 1, npartitions: 1
> Aug  5 23:07:37 lfs-server modprobe: FATAL: Error inserting crc32c_intel 
> (/lib/modules/2.6.32-431.17.1.el6_lustre.x86_64/kernel/arch/x86/crypto/crc32c-intel.ko):
>  No such device
> Aug  5 23:07:37 lfs-server kernel: alg: No test for crc32 (crc32-table)
> Aug  5 23:07:37 lfs-server kernel: alg: No test for adler32 (adler32-zlib)
> Aug  5 23:07:41 lfs-server modprobe: FATAL: Error inserting padlock_sha 
> (/lib/modules/2.6.32-431.17.1.el6_lustre.x86_64/kernel/drivers/crypto/padlock-sha.ko):
>  No such device
> Aug  5 23:07:41 lfs-server kernel: padlock: VIA PadLock Hash Engine not 
> detected.
> Aug  5 23:07:45 lfs-server kernel: Lustre: Lustre: Build Version: 
> 2.5.2-RC2--PRISTINE-2.6.32-431.17.1.el6_lustre.x86_64
> Aug  5 23:07:45 lfs-server kernel: LNet: Added LNI 192.168.122.50@tcp 
> [8/256/0/180]
> Aug  5 23:07:45 lfs-server kernel: LNet: Accept secure, port 988
> 
> 
> 1. Mkfs
> 
> [root@lfs-server ~]# mkfs.lustre --fsname=lustre --mgs --mdt --index=0 
> /dev/sdb 
> 
>Permanent disk data:
> Target: lustre:MDT
> Index:  0
> Lustre FS:  lustre
> Mount type: ldiskfs
> Flags:  0x65
>   (MDT MGS first_time update )
> Persistent mount opts: user_xattr,errors=remount-ro
> Parameters:
> 
> checking for existing Lustre data: not found
> device size = 10240MB
> formatting backing filesystem ldiskfs on /dev/sdb
> target name  lustre:MDT
> 4k blocks 2621440
> options-J size=400 -I 512 -i 2048 -q -O 
> dirdata,uninit_bg,^extents,dir_nlink,quota,huge_file,flex_bg -E 
> lazy_journal_init -F
> mkfs_cmd = mke2fs -j -b 4096 -L lustre:MDT  -J size=400 -I 512 -i 2048 -q 
> -O dirdata,uninit_bg,^extents,dir_nlink,quota,huge_file,flex_bg -E 
> lazy_journal_init -F /dev/sdb 2621440
> Aug  5 17:16:47 lfs-server kernel: LDISKFS-fs (sdb): mounted filesystem with 
> ordered data mode. quota=on. Opts: 
> Writing CONFIGS/mountdata
> [root@lfs-server ~]#
> 
> 2. Mount
> 
> [root@lfs-server ~]# mount -t lustre /dev/sdb /mnt/mgs 
> Aug  5 17:18:01 lfs-server kernel: LDISKFS-fs (sdb): mounted filesystem with 
> ordered data mode. quota=on. Opts: 
> Aug  5 17:18:01 lfs-server kernel: LDISKFS-fs (sdb): mounted filesystem with 
> ordered data mode. quota=on. Opts: 
> Aug  5 17:18:02 lfs-server kernel: Lustre: ctl-lustre-MDT: No data found 
> on store. Initialize space
> Aug  5 17:18:02 lfs-server kernel: Lustre: lustre-MDT: new disk, 
> initializing
> Aug  5 17:18:02 lfs-server kernel: Lustre: MGS: non-config logname received: 
> params
> Aug  5 17:18:02 lfs-server kernel: LustreError: 11-0: 
> lustre-MDT-lwp-MDT: Communicating with 0@lo, operation mds_connect 
> failed with -11.
> [root@lfs-server ~]# 
> 
> 
> 3. Unmount
> [root@lfs-server ~]# umount /dev/sdb 
> Aug  5 17:19:46 lfs-server kernel: Lustre: Failing over lustre-MDT
> Aug  5 17:19:52 lfs-server kernel: Lustre: 
> 1338:0:(client.c:1908:ptlrpc_expire_one_request()) @@@ Request sent has timed 
> out for slow reply: [sent 1407239386/real 1407239386]  req@88003d795c00 
> x1475596948340888/t0(0) o251->MGC192.168.122.50@tcp@0@lo:26/25 lens 224/224 e 
> 0 to 1 dl 1407239392 ref 2 fl Rpc:XN/0/ rc 0/-1
> [root@lfs-server ~]# Aug  5 17:19:53 lfs-server kernel: Lustre: server umount 
> lustre-MDT complete
> 
> [root@lfs-server ~]# 
> 
> 
> 4. [root@mgs ~]# cat /etc/modprobe.d/lustre.conf 
> options lnet networks=tcp(eth0)
> [root@mgs ~]# 
> 
> 5. Even though the LNet configuration is in place, it does not pick up the
> required eth0.
> 
> [root@mgs ~]# lctl dl 
>   0 UP osd-ldiskfs lustre-MDT-osd lustre-MDT-osd_UUID 8
>   1 UP mgs MGS MGS 5
>   2 UP mgc MGC192.168.122.50@tcp c6ea84c0-b3b2-9d25-8126-32d85956ae4d 5
>   3 UP mds MDS

Re: [Lustre-discuss] RE : Lustre-2.4 VMs (EL6.4)

2014-08-20 Thread Abhay Dandekar
Hi Andreas,

Sorry for bothering you, but modifying /etc/hosts still does not solve
the problem.

Just to give some more info, I am trying to set up a virtual cluster of
Lustre nodes.

Here is my /etc/hosts

[root@mgs-new-test ~]# cat /etc/hosts
192.168.122.50   mgs-new-test
192.168.122.50   localhost localhost.localdomain localhost4
localhost4.localdomain4
::1 localhost localhost.localdomain localhost6
localhost6.localdomain6
::0   mgs-new-test
[root@mgs-new-test ~]#
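
For what it is worth, a quick way to double-check which addresses the node itself
resolves its hostname and localhost to (just standard EL6 tools, nothing
Lustre-specific) would be:

[root@mgs-new-test ~]# getent hosts mgs-new-test   # should return only the NIC address, e.g. 192.168.122.50
[root@mgs-new-test ~]# getent hosts localhost      # should return only 127.0.0.1 / ::1
[root@mgs-new-test ~]# hostname -i                 # prints the address the local hostname resolves to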

And here is the latest /var/log/messages

Aug 20 11:32:32 mgs-new-test kernel: EXT4-fs (vda1): mounted filesystem
with ordered data mode. Opts:
Aug 20 11:32:32 mgs-new-test kernel: Adding 417784k swap on
/dev/mapper/vg_mgsnewtest-lv_swap.  Priority:-1 extents:1 across:417784k
Aug 20 11:32:32 mgs-new-test kernel: NET: Registered protocol family 10
Aug 20 11:32:32 mgs-new-test kernel: lo: Disabled Privacy Extensions
Aug 20 11:33:25 mgs-new-test kernel: LNet: HW CPU cores: 1, npartitions: 1
Aug 20 11:33:25 mgs-new-test kernel: alg: No test for adler32 (adler32-zlib)
Aug 20 11:33:25 mgs-new-test kernel: alg: No test for crc32 (crc32-table)
Aug 20 11:33:29 mgs-new-test modprobe: FATAL: Error inserting padlock_sha
(/lib/modules/2.6.32-431.20.3.el6_lustre.x86_64/kernel/drivers/crypto/padlock-sha.ko):
No such device
Aug 20 11:33:29 mgs-new-test kernel: padlock: VIA PadLock Hash Engine not
detected.
Aug 20 11:33:33 mgs-new-test kernel: Lustre: Lustre: Build Version:
2.6.0-RC2--PRISTINE-2.6.32-431.20.3.el6_lustre.x86_64
Aug 20 11:33:33 mgs-new-test kernel: LNet: Added LNI 192.168.122.50@tcp
[8/256/0/180]
Aug 20 11:33:33 mgs-new-test kernel: LNet: Accept secure, port 988
Aug 20 11:34:41 mgs-new-test kernel: LDISKFS-fs (vdb): mounted filesystem
with ordered data mode. quota=on. Opts:
Aug 20 11:34:53 mgs-new-test kernel: LDISKFS-fs (vdb): mounted filesystem
with ordered data mode. quota=on. Opts:
Aug 20 11:34:54 mgs-new-test kernel: LDISKFS-fs (vdb): mounted filesystem
with ordered data mode. quota=on. Opts:
Aug 20 11:34:54 mgs-new-test kernel: Lustre: ctl-mylustre-MDT: No data
found on store. Initialize space
Aug 20 11:34:54 mgs-new-test kernel: Lustre: mylustre-MDT: new disk,
initializing
Aug 20 11:34:54 mgs-new-test kernel: LustreError: 11-0:
*mylustre-MDT-lwp-MDT:
Communicating with 0@lo, operation mds_connect failed with -11.*
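
As a generic sanity check of the LNet side (this is only a sketch; the NID and
the port number are taken from the log above), something along these lines could
be run on the server:

[root@mgs-new-test ~]# lctl list_nids                 # should list 192.168.122.50@tcp, not just 0@lo
[root@mgs-new-test ~]# lctl ping 192.168.122.50@tcp   # confirm the NID actually answers
[root@mgs-new-test ~]# service iptables status        # make sure TCP port 988 is not blocked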


Any pointers on where else I need to make changes?

Thanks in advance.


Warm Regards,
Abhay Dandekar



> On Wed, Aug 20, 2014 at 3:23 AM, Andreas Dilger  wrote:
>
>> Often this problem is because the hostname in /etc/hosts is actually
>> mapped to localhost on the node itself.
>>
>> Unfortunately, this is how some systems are set up by default.
>>
>> Cheers, Andreas
>>
>> On Aug 19, 2014, at 12:39, "Abhay Dandekar" 
>> wrote:
>>
>> I came across a similar situation.
>>
>> Below is a log of the machine state. These steps worked on some setups,
>> while on others they didn't.
>>
>> Arman,
>>
>> Were you able to get past the problem? Any workaround?
>>
>> Thanks in advance for all your help.
>>
>>
>> Warm Regards,
>> Abhay Dandekar
>>
>>
>> -- Forwarded message --
>> From: Abhay Dandekar 
>> Date: Wed, Aug 6, 2014 at 12:18 AM
>> Subject: Lustre configuration failure : lwp-MDT: Communicating with
>> 0@lo, operation mds_connect failed with -11.
>> To: lustre-discuss@lists.lustre.org
>>
>>
>>
>> Hi All,
>>
>> I have come across a Lustre installation failure where the MGS always tries
>> to reach the "lo" interface instead of the configured Ethernet interface.
>>
>> These same steps worked on a different machine; somehow they are failing
>> here.
>>
>> Here are the logs
>>
>> The Lustre installation succeeded, with all packages installed without
>> any error.
>>
>> 0. Lustre version
>>
>> Aug  5 23:07:37 lfs-server kernel: LNet: HW CPU cores: 1, npartitions: 1
>> Aug  5 23:07:37 lfs-server modprobe: FATAL: Error inserting crc32c_intel
>> (/lib/modules/2.6.32-431.17.1.el6_lustre.x86_64/kernel/arch/x86/crypto/crc32c-intel.ko):
>> No such device
>> Aug  5 23:07:37 lfs-server kernel: alg: No test for crc32 (crc32-table)
>> Aug  5 23:07:37 lfs-server kernel: alg: No test for adler32 (adler32-zlib)
>> Aug  5 23:07:41 lfs-server modprobe: FATAL: Error inserting padlock_sha
>> (/lib/modules/2.6.32-431.17.1.el6_lustre.x86_64/kernel/drivers/crypto/padlock-sha.ko):
>> No such device
>> Aug  5 23:07:41 lfs-server kernel: padlock: VIA PadLock Hash Engine not
>> detected.
>> Aug  5 23:07:45 lfs-server kernel: Lustre: Lustre: Build Version:
>> 2.5.2-RC2--PRISTINE-2.6.32-431.17.1.el6_lustre.x86_64
>> Aug  5 23:07:45 lfs-server kernel: LNet: Added LNI 192.168.122.50@tcp
>> [8/256/0/180]
>> Aug  5 23:07:45 lfs-server kernel: LNet: Accept secure, port 988
>>
>>
>> 1. Mkfs
>>
>> [root@lfs-server ~]# mkfs.lustre --fsname=lustre --mgs --mdt --index=0
>> /dev/sdb
>>
>>Permanent disk data:
>> Target: lustre:MDT
>> Index:  0
>> Lustre FS:  lustre
>> Mount type: ldiskfs
>> Flags:

Re: [Lustre-discuss] RE : Lustre-2.4 VMs (EL6.4)

2014-09-01 Thread Abhay Dandekar
Thanks for replying, Arman.


/var/log/messages still complains about the error, as shown below:
Aug 29 15:01:59 MGS-1 kernel: LustreError: 11-0:
lustre-MDT-lwp-MDT: Communicating with 0@lo, operation mds_connect
failed with -11.

However, adding a mapping in /etc/hosts now allows other nodes to connect to the MGS.

This seems like a workaround, but things are working for now. It still fails
if you try to configure the MDT with an IP address.
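
(The exact failing command is not shown here; "configure the MDT with an IP"
presumably means formatting a target that points at the MGS by NID rather than
by hostname, i.e. something along the lines of the hypothetical sketch below.)

# hypothetical sketch only -- not the actual command from this thread
mkfs.lustre --fsname=lustre --mdt --index=0 --mgsnode=192.168.122.50@tcp /dev/sdb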

Thanks again.



Warm Regards,
Abhay Dandekar


On Mon, Aug 25, 2014 at 5:00 PM, Arman Khalatyan  wrote:

> Hi Abhay,
> Could you please check the LNet status?
> lctl list_nids, or pings...
> Is your firewall enabled?
> BTW, I moved all my servers to the 2.5.x branch, which fixed most of my
> troubles...
> a.
>
>
> On Tue, Aug 19, 2014 at 12:38 PM, Abhay Dandekar
>  wrote:
> > I came across a similar situation.
> >
> > Below is a log of the machine state. These steps worked on some setups,
> > while on others they didn't.
> >
> > Arman,
> >
> > Were you able to get past the problem? Any workaround?
> >
> > Thanks in advance for all your help.
> >
> >
> > Warm Regards,
> > Abhay Dandekar
> >
> >
> > -- Forwarded message --
> > From: Abhay Dandekar 
> > Date: Wed, Aug 6, 2014 at 12:18 AM
> > Subject: Lustre configuration failure : lwp-MDT: Communicating with
> > 0@lo, operation mds_connect failed with -11.
> > To: lustre-discuss@lists.lustre.org
> >
> >
> >
> > Hi All,
> >
> > I have come across a Lustre installation failure where the MGS always tries
> > to reach the "lo" interface instead of the configured Ethernet interface.
> >
> > These same steps worked on a different machine; somehow they are failing
> > here.
> >
> > Here are the logs
> >
> > The Lustre installation succeeded, with all packages installed without
> > any error.
> >
> > 0. Lustre version
> >
> > Aug  5 23:07:37 lfs-server kernel: LNet: HW CPU cores: 1, npartitions: 1
> > Aug  5 23:07:37 lfs-server modprobe: FATAL: Error inserting crc32c_intel
> >
> (/lib/modules/2.6.32-431.17.1.el6_lustre.x86_64/kernel/arch/x86/crypto/crc32c-intel.ko):
> > No such device
> > Aug  5 23:07:37 lfs-server kernel: alg: No test for crc32 (crc32-table)
> > Aug  5 23:07:37 lfs-server kernel: alg: No test for adler32
> (adler32-zlib)
> > Aug  5 23:07:41 lfs-server modprobe: FATAL: Error inserting padlock_sha
> >
> (/lib/modules/2.6.32-431.17.1.el6_lustre.x86_64/kernel/drivers/crypto/padlock-sha.ko):
> > No such device
> > Aug  5 23:07:41 lfs-server kernel: padlock: VIA PadLock Hash Engine not
> > detected.
> > Aug  5 23:07:45 lfs-server kernel: Lustre: Lustre: Build Version:
> > 2.5.2-RC2--PRISTINE-2.6.32-431.17.1.el6_lustre.x86_64
> > Aug  5 23:07:45 lfs-server kernel: LNet: Added LNI 192.168.122.50@tcp
> > [8/256/0/180]
> > Aug  5 23:07:45 lfs-server kernel: LNet: Accept secure, port 988
> >
> >
> > 1. Mkfs
> >
> > [root@lfs-server ~]# mkfs.lustre --fsname=lustre --mgs --mdt --index=0
> > /dev/sdb
> >
> >Permanent disk data:
> > Target: lustre:MDT
> > Index:  0
> > Lustre FS:  lustre
> > Mount type: ldiskfs
> > Flags:  0x65
> >   (MDT MGS first_time update )
> > Persistent mount opts: user_xattr,errors=remount-ro
> > Parameters:
> >
> > checking for existing Lustre data: not found
> > device size = 10240MB
> > formatting backing filesystem ldiskfs on /dev/sdb
> > target name  lustre:MDT
> > 4k blocks 2621440
> > options-J size=400 -I 512 -i 2048 -q -O
> > dirdata,uninit_bg,^extents,dir_nlink,quota,huge_file,flex_bg -E
> > lazy_journal_init -F
> > mkfs_cmd = mke2fs -j -b 4096 -L lustre:MDT  -J size=400 -I 512 -i
> 2048
> > -q -O dirdata,uninit_bg,^extents,dir_nlink,quota,huge_file,flex_bg -E
> > lazy_journal_init -F /dev/sdb 2621440
> > Aug  5 17:16:47 lfs-server kernel: LDISKFS-fs (sdb): mounted filesystem
> with
> > ordered data mode. quota=on. Opts:
> > Writing CONFIGS/mountdata
> > [root@lfs-server ~]#
> >
> > 2. Mount
> >
> > [root@lfs-server ~]# mount -t lustre /dev/sdb /mnt/mgs
> > Aug  5 17:18:01 lfs-server kernel: LDISKFS-fs (sdb): mounted filesystem
> with
> > ordered data mode. quota=on. Opts:
> > Aug  5 17:18:01 lfs-server kernel: LDISKFS-fs (sdb): mounted filesystem
> with
> > ordered data mode. quota=on. Opts:
> > Aug  5 17:18:02 lfs-server kernel: Lustre: ctl-lustre-MDT: No data
> found
> > on store. Initialize space
> > Aug  5 17:18:02 lfs-server kernel: Lustre: lustre-MDT: new disk,
> > initializing
> > Aug  5 17:18:02 lfs-server kernel: Lustre: MGS: non-config logname
> received:
> > params
> > Aug  5 17:18:02 lfs-server kernel: LustreError: 11-0:
> > lustre-MDT-lwp-MDT: Communicating with 0@lo, operation
> mds_connect
> > failed with -11.
> > [root@lfs-server ~]#
> >
> >
> > 3. Unmount
> > [root@lfs-server ~]# umount /dev/sdb
> > Aug  5 17:19:46 lfs-server kernel: Lustre: Failing over lustre-MDT
> > Aug  5 17:19:52 lfs-server kernel: Lustre:
> > 1338:0:(client.c:1908:ptlrpc_expire_one_request()) @@@ Request sent has
> > timed out for slow reply