Hello All,
I am new to Lustre. I started by using the docs on this page to deploy Lustre on Virtual machines running CentOS 7.x (CentOS-7-2018.08.15-0). Included below are the content of the scripts I used and the error I get. I have not done any setup for “o2ib0(ib0)” and lnet is using tcp. All the nodes are on the same network & subnet and cannot communicate on my protocol and port #. Thanks for your help. I am completely blocked and looking for ideas. (already did google search ☹). I have 2 questions: The MDT mounted on MDS has no permissions (no read , no write, no execute), even for root user on MDS/MGS node. Is that expected? . See “MGS/MDS node setup” section for more details on what I did. [root@lustre-mds-server-1 opc]# mount -t lustre /dev/sdb /mnt/mdt [root@lustre-mds-server-1 opc]# ll /mnt total 0 d---------. 1 root root 0 Jan 1 1970 mdt [root@lustre-mds-server-1 opc]# Assuming if the above is not an issue, after setting up OSS/OST and Client node, When my client tries to mount, I get the below error: [root@lustre-client-1 opc]# mount -t lustre 10.0.2.4@tcp:/lustrewt /mnt mount.lustre: mount 10.0.2.4@tcp:/lustrewt at /mnt failed: Input/output error Is the MGS running? [root@lustre-client-1 opc]# dmesg shows the below error on the client node: [root@lustre-client-1 opc]# dmesg [35639.535862] Lustre: 11730:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549386846/real 1549386846] req@ffff9259bb518c00 x1624614953288208/t0(0) o250->MGC10.0.2.4@tcp@10.0.2.4@tcp:26/25 lens 520/544 e 0 to 1 dl 1549386851 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1 [35640.535877] LustreError: 7718:0:(mgc_request.c:251:do_config_log_add()) MGC10.0.2.4@tcp: failed processing log, type 1: rc = -5 [35669.535028] Lustre: 11730:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549386871/real 1549386871] req@ffff9259bb428f00 x1624614953288256/t0(0) o250->MGC10.0.2.4@tcp@10.0.2.4@tcp:26/25 lens 520/544 e 0 to 1 dl 1549386881 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1 [35670.546671] LustreError: 15c-8: MGC10.0.2.4@tcp: The configuration from log 'lustrewt-client' failed (-5). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information. [35670.557472] Lustre: Unmounted lustrewt-client [35670.560432] LustreError: 7718:0:(obd_mount.c:1582:lustre_fill_super()) Unable to mount (-5) [root@lustre-client-1 opc]# I have firewall turned off on all nodes (client, mds/mgs, oss), selinux is disabled/setenforce=0 . I can telnet to the MDS/MGS node from client machine. Given below is the setup I have on different nodes: MGS/MDS node setup #!/bin/bash service firewalld stop chkconfig firewalld off cat > /etc/yum.repos.d/lustre.repo << EOF [hpddLustreserver] name=CentOS- - Lustre baseurl=https://downloads.whamcloud.com/public/lustre/latest-release/el7.6.1810/server/ gpgcheck=0 [e2fsprogs] name=CentOS- - Ldiskfs baseurl=https://downloads.whamcloud.com/public/e2fsprogs/latest/el7/ gpgcheck=0 [hpddLustreclient] name=CentOS- - Lustre baseurl=https://downloads.whamcloud.com/public/lustre/latest-release/el7.6.1810/client/ gpgcheck=0 EOF sudo yum install lustre-tests -y cp /etc/selinux/config /etc/selinux/config.backup sed 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config setenforce 0 echo "complete. rebooting now" reboot After reboot is complete, I login to the MGS/MDS node as root and run the following steps: The node has a block storage device attached: /dev/sdb Run the below command: pvcreate -y /dev/sdb mkfs.xfs -f /dev/sdb [root@lustre-mds-server-1 opc]# setenforce 0 [root@lustre-mds-server-1 opc]# mkfs.lustre --fsname=lustrewt --index=0 --mgs --mdt /dev/sdb Permanent disk data: Target: lustrewt:MDT0000 Index: 0 Lustre FS: lustrewt Mount type: ldiskfs Flags: 0x65 (MDT MGS first_time update ) Persistent mount opts: user_xattr,errors=remount-ro Parameters: checking for existing Lustre data: not found device size = 51200MB formatting backing filesystem ldiskfs on /dev/sdb target name lustrewt:MDT0000 4k blocks 13107200 options -J size=2048 -I 1024 -i 2560 -q -O dirdata,uninit_bg,^extents,dir_nlink,quota,huge_file,flex_bg -E lazy_journal_init -F mkfs_cmd = mke2fs -j -b 4096 -L lustrewt:MDT0000 -J size=2048 -I 1024 -i 2560 -q -O dirdata,uninit_bg,^extents,dir_nlink,quota,huge_file,flex_bg -E lazy_journal_init -F /dev/sdb 13107200 [root@lustre-mds-server-1 opc]# mkdir -p /mnt/mdt [root@lustre-mds-server-1 opc]# mount -t lustre /dev/sdb /mnt/mdt [root@lustre-mds-server-1 opc]# modprobe lnet [root@lustre-mds-server-1 opc]# lctl network up LNET configured [root@lustre-mds-server-1 opc]# lctl list_nids 10.0.2.4@tcp [root@lustre-mds-server-1 opc]# ll /mnt total 0 d---------. 1 root root 0 Jan 1 1970 mdt [root@lustre-mds-server-1 opc]# OSS/OST node 1 OSS node with 1 block device for OST (/dev/sdb). The setup to update kernel was the same as MGS/MDS node (described above), then I ran the below commands: mkfs.lustre --ost --fsname=lustrewt --index=0 --mgsnode=10.0.2.4@tcp /dev/sdb mkdir -p /ostoss_mount mount -t lustre /dev/sdb /ostoss_mount Client node 1 client node. The setup to update kernel was the same as MGS/MDS node (described above), then I ran the below commands: [root@lustre-client-1 opc]# modprobe lustre [root@lustre-client-1 opc]# mount -t lustre 10.0.2.3@tcp:/lustrewt /mnt (This fails with below error): mount.lustre: mount 10.0.2.4@tcp:/lustrewt at /mnt failed: Input/output error Is the MGS running? [root@lustre-client-1 opc]# Thanks, Pinkesh Valdria OCI – Big Data Principal Solutions Architect m: +1-206-234-4314 HYPERLINK "mailto:pinkesh.vald...@oracle.com"pinkesh.vald...@oracle.com
_______________________________________________ lustre-discuss mailing list lustre-discuss@lists.lustre.org http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org