[Lustre-discuss] Lustre crashes periodically
Hi everyone,

I have a problem lately with our Lustre 1.8 deployment. It crashes periodically, in a way that the nodes can mount the storage but I can't access the Lustre server machine either, so I have to restart the machine manually every time to get everything back to normal. I looked at the logs, memory usage, and lock counts to see whether any of them might be the cause of the problem, but I don't think they account for this issue.

An interesting symptom I see every time this happens is that the network-usage lights on the InfiniBand switch blink very fast. I suspect that heavy traffic on the InfiniBand network to the Lustre server may be crashing it. Does that connection seem logical?

Anyway, I hope some of you have experienced this problem before and can help me understand what is happening and how to avoid crashing the server again!

Thanks,

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss
Re: [Lustre-discuss] Lustre crashes periodically
Sorry, I have to correct this: the nodes CANNOT mount the storage, and I can't access the Lustre server machine either.

On Wednesday 17 July 1392 at 11:21, Arya Mazaheri wrote:

Hi everyone, I have a problem lately with our Lustre 1.8 deployment. It crashes periodically in a way that the nodes can mount the storage and I can't access the Lustre server machine either. [...]
[Lustre-discuss] Kernel Panic error while running lustre 2.0 with infiniband
Hi there,

I have configured and run Lustre 2.0 with TCP (OSS and MDS on the same server) without problems. Now I am trying to run Lustre with InfiniBand support, but whenever I mount the MDT storage on the server, the process ends with the following error:

kernel panic - not syncing: fatal exception

My /etc/modprobe.conf is:

options lnet networks=o2ib0(ib0)

Last lines of dmesg:

kjournald starting. Commit interval 5 seconds
LDISKFS-fs warning: maximal mount count reached, running e2fsck is recommended
LDISKFS FS on sda2, internal journal
LDISKFS-fs: recovery complete.
LDISKFS-fs: mounted filesystem with ordered data mode.
kjournald starting. Commit interval 5 seconds
LDISKFS-fs warning: maximal mount count reached, running e2fsck is recommended
LDISKFS FS on sda2, internal journal
LDISKFS-fs: mounted filesystem with ordered data mode.
LustreError: 6041:0:(o2iblnd.c:2501:kiblnd_startup()) Can't query IPoIB interface ib0: it's down
LustreError: 6041:0:(o2iblnd.c:2501:kiblnd_startup()) Skipped 1 previous similar message
eth0: no IPv6 routers present
LustreError: 105-4: Error -100 starting up LNI o2ib
LustreError: Skipped 1 previous similar message
LustreError: 6041:0:(events.c:731:ptlrpc_init_portals()) network initialisation failed
LustreError: 158-c: Can't load module 'mgs'
LustreError: 6035:0:(genops.c:286:class_newdev()) OBD: unknown type: mgs
LustreError: 6035:0:(obd_config.c:300:class_attach()) Cannot create device MGS of type mgs : -19
LustreError: 6035:0:(obd_mount.c:502:lustre_start_simple()) MGS attach error -19
LustreError: 15e-a: Failed to start MGS 'MGS' (-19). Is the 'mgs' module loaded?
LustreError: 6035:0:(obd_mount.c:1492:server_put_super()) no obd lustre-MDT
LustreError: 6035:0:(obd_mount.c:137:server_deregister_mount()) lustre-MDT not registered
Lustre: server umount lustre-MDT complete
LustreError: 6035:0:(obd_mount.c:2136:lustre_fill_super()) Unable to mount (-19)
kjournald starting. Commit interval 5 seconds
LDISKFS-fs warning: maximal mount count reached, running e2fsck is recommended
LDISKFS FS on sdb1, internal journal
LDISKFS-fs: mounted filesystem with ordered data mode.
kjournald starting. Commit interval 5 seconds
LDISKFS-fs warning: maximal mount count reached, running e2fsck is recommended
LDISKFS FS on sdb1, internal journal
LDISKFS-fs: mounted filesystem with ordered data mode.
LDISKFS-fs: file extents enabled
LDISKFS-fs: mballoc enabled
LustreError: 6117:0:(events.c:731:ptlrpc_init_portals()) network initialisation failed
LDISKFS-fs: mballoc: 0 blocks 0 reqs (0 success)
LDISKFS-fs: mballoc: 0 extents scanned, 0 goal hits, 0 2^N hits, 0 breaks, 0 lost
LDISKFS-fs: mballoc: 0 generated and it took 0
LDISKFS-fs: mballoc: 0 preallocated, 0 discarded
kjournald starting. Commit interval 5 seconds
LDISKFS-fs warning: maximal mount count reached, running e2fsck is recommended
LDISKFS FS on sdb2, internal journal
LDISKFS-fs: mounted filesystem with ordered data mode.
kjournald starting. Commit interval 5 seconds
LDISKFS-fs warning: maximal mount count reached, running e2fsck is recommended
LDISKFS FS on sdb2, internal journal
LDISKFS-fs: mounted filesystem with ordered data mode.
LDISKFS-fs: file extents enabled
LDISKFS-fs: mballoc enabled
LustreError: 6193:0:(events.c:731:ptlrpc_init_portals()) network initialisation failed
LDISKFS-fs: mballoc: 0 blocks 0 reqs (0 success)
LDISKFS-fs: mballoc: 0 extents scanned, 0 goal hits, 0 2^N hits, 0 breaks, 0 lost
LDISKFS-fs: mballoc: 0 generated and it took 0
LDISKFS-fs: mballoc: 0 preallocated, 0 discarded
kjournald starting. Commit interval 5 seconds
LDISKFS-fs warning: maximal mount count reached, running e2fsck is recommended
LDISKFS FS on sdb3, internal journal
LDISKFS-fs: mounted filesystem with ordered data mode.
kjournald starting. Commit interval 5 seconds
LDISKFS-fs warning: maximal mount count reached, running e2fsck is recommended
LDISKFS FS on sdb3, internal journal
LDISKFS-fs: mounted filesystem with ordered data mode.
LDISKFS-fs: file extents enabled
LDISKFS-fs: mballoc enabled
LustreError: 6269:0:(o2iblnd.c:2501:kiblnd_startup()) Can't query IPoIB interface ib0: it's down
LustreError: 6269:0:(o2iblnd.c:2501:kiblnd_startup()) Skipped 2 previous similar messages
LustreError: 105-4: Error -100 starting up LNI o2ib
LustreError: Skipped 2 previous similar messages
LustreError: 6269:0:(events.c:731:ptlrpc_init_portals()) network initialisation failed
LustreError: 158-c: Can't load module 'mgc'
LustreError: Skipped 2 previous similar messages
LustreError: 6263:0:(genops.c:286:class_newdev()) OBD: unknown type: mgc
LustreError: 6263:0:(genops.c:286:class_newdev()) Skipped 2 previous similar messages
LustreError: 6263:0:(obd_config.c:300:class_attach()) Cannot create device MGC0@lo of type mgc : -19
LustreError: 6263:0:(obd_config.c:300:class_attach()) Skipped 2 previous similar messages
LustreError: 6263:0:(obd_mount.c:502:lustre_start_simple()) MGC0@lo attach error -19
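The "Can't query IPoIB interface ib0: it's down" line is the root of the cascade above: the o2ib LND needs the IPoIB interface up and addressed before the lustre modules initialize LNet. A minimal pre-mount sanity check might look like the sketch below (the helper `iface_ready` is illustrative, not part of Lustre; the interface name ib0 matches the modprobe.conf above):

```shell
# Sketch: verify the IPoIB interface is UP and has an IPv4 address
# before attempting to mount any Lustre target over o2ib.
iface_ready() {
    # expects "ifconfig <iface>"-style text on stdin; succeeds only if
    # the interface is UP and has an inet address configured
    out=$(cat)
    echo "$out" | grep -q 'UP' && echo "$out" | grep -q 'inet addr:'
}

if ifconfig ib0 2>/dev/null | iface_ready; then
    echo "ib0 ready - safe to mount Lustre targets"
else
    echo "ib0 down or unconfigured - fix IPoIB networking first"
fi
```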
Re: [Lustre-discuss] Installing Lustre client on 2.6.18-194 kernel
Thanks Albert, I really appreciate you... Now everything is working...

On Mon, Feb 21, 2011 at 7:44 PM, Albert Everett aeever...@ualr.edu wrote:

Here's what's in our /etc/modprobe.conf related to IB and Lustre:

options ib_mthca msi_x=1
options lnet networks=o2ib0(ib0)
options ko2iblnd ipif_name=ib0

We have Mellanox Infinihost (III?) DDR cards and IPs defined for them.

$ /sbin/ifconfig ib0
ib0  Link encap:InfiniBand  HWaddr 80:00:04:04:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
     inet addr:192.168.2.1  Bcast:192.168.2.255  Mask:255.255.255.0
     inet6 addr: fe80::202:c902:29:b341/64 Scope:Link
     UP BROADCAST RUNNING MULTICAST  MTU:65520  Metric:1
     RX packets:1719 errors:0 dropped:0 overruns:0 frame:0
     TX packets:30 errors:0 dropped:0 overruns:0 carrier:0
     collisions:0 txqueuelen:256
     RX bytes:96564 (94.3 KiB)  TX bytes:2420 (2.3 KiB)

Albert

On Feb 20, 2011, at 2:20 PM, Arya Mazaheri wrote:

I have done what you said; I will test my client against the server tomorrow. But would you tell me the tweaks you have done on /etc/modprobe.conf?

On Sun, Feb 20, 2011 at 1:56 AM, Albert Everett aeever...@ualr.edu wrote:

For the Lustre client, we did not need to alter our kernel at all. We just made and installed the lustre-1.8.5 and lustre-modules-1.8.5 rpms. /etc/modprobe.conf needs a tweak. For the Lustre server, I believe you will need to deal with a patched kernel. We have not been down this road yet since our vendor includes the Lustre server software with their hardware.

Albert

On Feb 19, 2011, at 12:18 PM, Arya Mazaheri wrote:

Hi Albert, It seems that you have made a new kernel in order to run Lustre on the clients. Am I right? I don't want to change the kernel on the clients at all...

On Sat, Feb 19, 2011 at 8:57 PM, Albert Everett aeever...@ualr.edu wrote:

Our kernel is also 2.6.18_194.17.4.el5. We installed OFED 1.5.2 from source, following this guide: https://wiki.rocksclusters.org/wiki/index.php/Install_OFED_1.5.x_on_a_Rocks_5.3_cluster ... which left us, among other things, a folder /usr/src/ofa_kernel. Lustre on the server side is handled by our vendor, so all we needed to worry about was the client. To build a Lustre client, we then installed lustre-1.8.5.tar.gz from source, not from rpms. Our first compile produced the error you show below.

# ./configure --with-linux=/lib/modules/`uname -r`/build
# make rpms

To get the Lustre installation to use our new OFED, we tried this and it worked:

# ./configure --with-o2ib=/usr/src/ofa_kernel --with-linux=/lib/modules/`uname -r`/build
# make rpms

RPMs showed up in /usr/src/redhat/RPMS/x86_64, and we are using lustre-1.8.5*.rpm and lustre-modules-*.rpm on our client machines.

Albert

On Feb 19, 2011, at 8:34 AM, Arya Mazaheri wrote:

Hi, I have installed the Lustre client packages on a client node, but it doesn't mount the Lustre file system from the Lustre server. It gets the following famous error:

$ mount -t lustre 192.168.0.1:/lustre /mnt/lustre
mount.lustre: mount 172.16.113.232:/lustre at /mnt/lustre failed: No such device
Are the lustre modules loaded?
Check /etc/modprobe.conf and /proc/filesystems
Note 'alias lustre llite' should be removed from modprobe.conf

As I was searching through the mailing list, I noticed that lustre.ko should be present in this directory: /lib/modules/2.6.18-194.17.4.el5/kernel/fs/lustre/lustre.ko. My current kernel is 2.6.18-194.17.4.el5, but lustre.ko is in 2.6.18-164.11.1.el5 instead. So I guessed that this may be the source of the problem. Any ideas?

Thanks
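The kernel/module mismatch described above ("No such device") can be checked in one step before trying to mount. This is a sketch (the helper name `check_lustre_module` is hypothetical), using the module path quoted in the message:

```shell
# Sanity check: the lustre module must exist for the *running* kernel,
# otherwise "mount -t lustre" fails with "No such device".
check_lustre_module() {
    kver=$(uname -r)
    if [ -e "/lib/modules/$kver/kernel/fs/lustre/lustre.ko" ] \
       || modinfo lustre >/dev/null 2>&1; then
        echo "lustre module available for kernel $kver"
    else
        echo "no lustre module for $kver - rebuild against this kernel"
    fi
}
check_lustre_module
```

If the second branch fires, rebuilding the client RPMs against the running kernel (as in the configure/make steps above) is the fix.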
Re: [Lustre-discuss] Kernel Panic error while running lustre 2.0 with infiniband
Yep! You're right...

On Mon, Feb 21, 2011 at 11:19 PM, Albert Everett aeever...@ualr.edu wrote:

Here's all we needed. Yours is probably similar.

# cat /etc/sysconfig/network-scripts/ifcfg-ib0
DEVICE=ib0
IPADDR=192.168.2.1
NETMASK=255.255.255.0
BOOTPROTO=static
ONBOOT=yes

On Feb 21, 2011, at 1:17 PM, Arya Mazaheri wrote:

Problem solved. I was trying to set the IP of ib0 with this command:

ifconfig ib0 192.168.1.1 netmask 255.255.255.0 up

but it leads to the kernel panic. So I set the IP address by adding it to network-scripts instead, and it works now. I really don't know why setting the IP with ifconfig doesn't work. So weird...

On Mon, Feb 21, 2011 at 7:31 PM, Albert Everett aeever...@ualr.edu wrote:

What's the output of

# ifconfig ib0

Albert

On Feb 21, 2011, at 6:27 AM, Arya Mazaheri wrote:

Hi there, I have configured and run Lustre 2.0 with TCP (OSS and MDS on the same server) without problems. Now I am trying to run Lustre with InfiniBand support, but whenever I mount the MDT storage on the server, the process ends with the following error:

kernel panic - not syncing: fatal exception

My /etc/modprobe.conf is:

options lnet networks=o2ib0(ib0)

Last lines of dmesg:

kjournald starting. Commit interval 5 seconds
LDISKFS-fs warning: maximal mount count reached, running e2fsck is recommended
LDISKFS FS on sda2, internal journal
LDISKFS-fs: recovery complete.
LDISKFS-fs: mounted filesystem with ordered data mode.
kjournald starting. Commit interval 5 seconds
LDISKFS-fs warning: maximal mount count reached, running e2fsck is recommended
LDISKFS FS on sda2, internal journal
LDISKFS-fs: mounted filesystem with ordered data mode.
Re: [Lustre-discuss] Installing Lustre client on 2.6.18-194 kernel
I have done what you said; I will test my client against the server tomorrow. But would you tell me the tweaks you have done on /etc/modprobe.conf?

On Sun, Feb 20, 2011 at 1:56 AM, Albert Everett aeever...@ualr.edu wrote:

For the Lustre client, we did not need to alter our kernel at all. We just made and installed the lustre-1.8.5 and lustre-modules-1.8.5 rpms. /etc/modprobe.conf needs a tweak. For the Lustre server, I believe you will need to deal with a patched kernel. We have not been down this road yet since our vendor includes the Lustre server software with their hardware.

Albert

On Feb 19, 2011, at 12:18 PM, Arya Mazaheri wrote:

Hi Albert, It seems that you have made a new kernel in order to run Lustre on the clients. Am I right? I don't want to change the kernel on the clients at all...

On Sat, Feb 19, 2011 at 8:57 PM, Albert Everett aeever...@ualr.edu wrote:

Our kernel is also 2.6.18_194.17.4.el5. We installed OFED 1.5.2 from source, following this guide: https://wiki.rocksclusters.org/wiki/index.php/Install_OFED_1.5.x_on_a_Rocks_5.3_cluster ... which left us, among other things, a folder /usr/src/ofa_kernel. Lustre on the server side is handled by our vendor, so all we needed to worry about was the client. To build a Lustre client, we then installed lustre-1.8.5.tar.gz from source, not from rpms. Our first compile produced the error you show below.

# ./configure --with-linux=/lib/modules/`uname -r`/build
# make rpms

To get the Lustre installation to use our new OFED, we tried this and it worked:

# ./configure --with-o2ib=/usr/src/ofa_kernel --with-linux=/lib/modules/`uname -r`/build
# make rpms

RPMs showed up in /usr/src/redhat/RPMS/x86_64, and we are using lustre-1.8.5*.rpm and lustre-modules-*.rpm on our client machines.

Albert

On Feb 19, 2011, at 8:34 AM, Arya Mazaheri wrote:

Hi, I have installed the Lustre client packages on a client node, but it doesn't mount the Lustre file system from the Lustre server. It gets the following famous error:

$ mount -t lustre 192.168.0.1:/lustre /mnt/lustre
mount.lustre: mount 172.16.113.232:/lustre at /mnt/lustre failed: No such device
Are the lustre modules loaded?
Check /etc/modprobe.conf and /proc/filesystems
Note 'alias lustre llite' should be removed from modprobe.conf

As I was searching through the mailing list, I noticed that lustre.ko should be present in this directory: /lib/modules/2.6.18-194.17.4.el5/kernel/fs/lustre/lustre.ko. My current kernel is 2.6.18-194.17.4.el5, but lustre.ko is in 2.6.18-164.11.1.el5 instead. So I guessed that this may be the source of the problem. Any ideas?

Thanks
Re: [Lustre-discuss] Running MGS and OSS on the same machine
Yep, I have fixed it with this command instead:

mkfs.lustre --fsname lustre --ost --mgsnode=0@lo /dev/sdb1

2011/2/18 Charland, Denis denis.charl...@imi.cnrc-nrc.gc.ca:

Arya, I have the MGS, the MDT and the OST all on the same machine and everything works fine. It should not be a problem to have the MGS and the OST on the same machine. Are your MGS and MDT mounted when you execute mkfs.lustre for the OST?

Denis

Denis Charland, ing. | P. Eng.
Administrateur de Systèmes UNIX | UNIX Systems Administrator
Tél. | Tel. (450) 641-5078  Fax (450) 641-5106
Courriel | E-mail: denis.charl...@cnrc-nrc.gc.ca
Institut des matériaux industriels | Industrial Materials Institute
Conseil national de recherches Canada | National Research Council Canada
75, de Mortagne, Boucherville, Québec, Canada, J4B 6Y4
Gouvernement du Canada | Government of Canada
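For reference, a full single-node layout built on the same idea might look like the transcript below. This is a sketch with hypothetical device names and mount points, not the poster's exact setup; 0@lo is the LNet NID of the local host, so the OST registers with the MGS on the same machine without going over the wire:

```shell
# Format a combined MGS+MDT, then an OST that points at the local MGS.
mkfs.lustre --fsname lustre --mgs --mdt /dev/sda1            # hypothetical device
mkfs.lustre --fsname lustre --ost --mgsnode=0@lo /dev/sdb1   # as in the fix above

# Mount order matters: MGS/MDT first, then the OST.
mkdir -p /mnt/mdt /mnt/ost1
mount -t lustre /dev/sda1 /mnt/mdt
mount -t lustre /dev/sdb1 /mnt/ost1
```

This also matches Denis's question: the MGS/MDT must already be mounted when the OST is formatted and brought up.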
Re: [Lustre-discuss] Running MGS and OSS on the same machine
Thanks Michael, it worked with 0@lo. Thanks again for your suggestion...

On Fri, Feb 18, 2011 at 6:55 PM, Michael Kluge michael.kl...@tu-dresden.de wrote:

Hi Arya, if I remember well, Lustre uses 0@lo for the localhost address. Does using the other NID 192.168.0.10@tcp0 give any error message?

Michael

On 18.02.2011 16:10, Arya Mazaheri wrote:

Hi again, I plan to use one server as MGS and OSS simultaneously. But how can I format the OSTs as a Lustre FS? For example, the line below tells the OST that its mgsnode is at 192.168.0.10@tcp0:

mkfs.lustre --fsname lustre --ost --mgsnode=192.168.0.10@tcp0 /dev/vg00/ost1

But now the mgsnode is the same machine. I tried to put localhost instead of the IP address, but it didn't work. What should I do?

Arya

--
Michael Kluge, M.Sc.
Technische Universität Dresden
Center for Information Services and High Performance Computing (ZIH)
D-01062 Dresden, Germany
Contact: Willersbau, Room WIL A 208
Phone: (+49) 351 463-34217  Fax: (+49) 351 463-37773
e-mail: michael.kl...@tu-dresden.de
WWW: http://www.tu-dresden.de/zih
Re: [Lustre-discuss] Installing Lustre client on 2.6.18-194 kernel
Well, I want to install the client module on Rocks 5.5 x86_64. There are several packages in the source section of the Lustre download area, and I am confused about which one to choose. What are the differences between them?

lustre-client-source-2.0.0.1-2.6.18_164.11.1.0.1.el5_lustre.2.0.0.1.i686.rpm
lustre-client-source-2.0.0.1-2.6.16_60_0.42.8_lustre.2.0.0.1_smp.x86_64.rpm
lustre-client-source-2.0.0.1-2.6.27_23_0.1_lustre.2.0.0.1_default.x86_64.rpm
lustre-client-source-2.0.0.1-2.6.18_164.11.1.0.1.el5_lustre.2.0.0.1.x86_64.rpm
lustre-client-source-2.0.0.1-2.6.16_60_0.42.8_lustre.2.0.0.1_bigsmp.i686.rpm
lustre-client-source-2.0.0.1-2.6.27_23_0.1_lustre.2.0.0.1_default.i686.rpm
lustre-client-source-2.0.0.1-2.6.18_164.11.1.el5_lustre.2.0.0.1.i686.rpm
lustre-client-source-2.0.0.1-2.6.18_164.11.1.el5_lustre.2.0.0.1.x86_64.rpm

And one other thing: where is the source for lustre-client-modules?

On Sat, Feb 19, 2011 at 6:08 PM, Brian J. Murrell br...@whamcloud.com wrote:

On 11-02-19 09:34 AM, Arya Mazaheri wrote:

As I was searching through the mailing list, I have noticed that lustre.ko should be present in this directory: /lib/modules/2.6.18-194.17.4.el5/kernel/fs/lustre/lustre.ko. My current kernel is 2.6.18-194.17.4.el5, but lustre.ko is in 2.6.18-164.11.1.el5 instead.

You need the client modules package that matches your kernel. If one is not available you will have to build it from the source.

b.
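The kernel each package was built for is embedded in its file name (with '-' mapped to '_'), so the match Brian describes can be checked with a small script. The helper `rpm_kernel` below is purely illustrative (not a Lustre tool), and the parsing is a heuristic for the el5-style names in the list above:

```shell
# Extract the kernel version a lustre-client-* package was built for,
# e.g. lustre-client-source-2.0.0.1-2.6.18_164.11.1.el5_... -> 2.6.18-164.11.1.el5
rpm_kernel() {
    # strip the "lustre-client-<kind>-<version>-" prefix, cut at "_lustre",
    # and map the remaining underscores back to hyphens
    echo "$1" | sed 's/^lustre-client-[a-z]*-[0-9.]*-//; s/_lustre.*//' | tr '_' '-'
}

running=$(uname -r)
pkg="lustre-client-source-2.0.0.1-2.6.18_164.11.1.el5_lustre.2.0.0.1.x86_64.rpm"
built_for=$(rpm_kernel "$pkg")

if [ "$built_for" = "$running" ]; then
    echo "match: $pkg fits kernel $running"
else
    echo "mismatch: package built for $built_for, running $running - build from source instead"
fi
```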
Re: [Lustre-discuss] Installing Lustre client on 2.6.18-194 kernel
Hi Albert,

It seems that you have made a new kernel in order to run Lustre on the clients. Am I right? I don't want to change the kernel on the clients at all...

On Sat, Feb 19, 2011 at 8:57 PM, Albert Everett aeever...@ualr.edu wrote:

Our kernel is also 2.6.18_194.17.4.el5. We installed OFED 1.5.2 from source, following this guide: https://wiki.rocksclusters.org/wiki/index.php/Install_OFED_1.5.x_on_a_Rocks_5.3_cluster ... which left us, among other things, a folder /usr/src/ofa_kernel. Lustre on the server side is handled by our vendor, so all we needed to worry about was the client. To build a Lustre client, we then installed lustre-1.8.5.tar.gz from source, not from rpms. Our first compile produced the error you show below.

# ./configure --with-linux=/lib/modules/`uname -r`/build
# make rpms

To get the Lustre installation to use our new OFED, we tried this and it worked:

# ./configure --with-o2ib=/usr/src/ofa_kernel --with-linux=/lib/modules/`uname -r`/build
# make rpms

RPMs showed up in /usr/src/redhat/RPMS/x86_64, and we are using lustre-1.8.5*.rpm and lustre-modules-*.rpm on our client machines.

Albert

On Feb 19, 2011, at 8:34 AM, Arya Mazaheri wrote:

Hi, I have installed the Lustre client packages on a client node, but it doesn't mount the Lustre file system from the Lustre server. It gets the following famous error:

$ mount -t lustre 192.168.0.1:/lustre /mnt/lustre
mount.lustre: mount 172.16.113.232:/lustre at /mnt/lustre failed: No such device
Are the lustre modules loaded?
Check /etc/modprobe.conf and /proc/filesystems
Note 'alias lustre llite' should be removed from modprobe.conf

As I was searching through the mailing list, I noticed that lustre.ko should be present in this directory: /lib/modules/2.6.18-194.17.4.el5/kernel/fs/lustre/lustre.ko. My current kernel is 2.6.18-194.17.4.el5, but lustre.ko is in 2.6.18-164.11.1.el5 instead. So I guessed that this may be the source of the problem. Any ideas?

Thanks
Re: [Lustre-discuss] Kernel Panic error after lustre 2.0 installation
Wow! Thanks for your suggestion. The only things I needed to do were to build the driver from 'arcmsr.c' and 'arcmsr.h' and finally rebuild the ram disk. Now everything is working smoothly... Thanks again... ;)

On Fri, Feb 18, 2011 at 1:16 AM, Kevin Van Maren kevin.van.ma...@oracle.com wrote:

Yep. All you have to do is rebuild the driver for the Lustre kernel. First, bring the system back up with the non-Lustre kernel. See the bottom of the readme:

# cd /usr/src/linux/drivers/scsi/arcmsr
(suppose /usr/src/linux is the soft-link for /usr/src/kernel/2.6.23.1-42.fc8-i386)
# make -C /lib/modules/`uname -r`/build CONFIG_SCSI_ARCMSR=m SUBDIRS=$PWD modules
# insmod arcmsr.ko

Except instead of `uname -r` substitute the Lustre kernel's 'uname -r', as you want to build for the Lustre kernel. Be sure you have the Lustre kernel-devel RPM installed. Note that the insmod will not work (you already have the driver for the running kernel, and the one you built for the Lustre kernel will not load under it). You will need to rebuild the initrd for the Lustre kernel (see the other instructions in the readme, using the Lustre kernel).

Kevin

Arya Mazaheri wrote:

The driver name is arcmsr.ko and I extracted it from driver.img included on the RAID controller's CD. The following text file may clarify better: ftp://areca.starline.de/RaidCards/AP_Drivers/Linux/DRIVER/RedHat/FedoraCore/Redhat-Fedora-core8/1.20.0X.15/Intel/readme.txt

Please tell me if you need more information about this issue...

On Thu, Feb 17, 2011 at 11:33 PM, Brian J. Murrell br...@whamcloud.com wrote:

On Thu, 2011-02-17 at 23:26 +0330, Arya Mazaheri wrote:

Hi there,

Hi,

Unable to access resume device (LABEL=SWAP-sda3)
mount: could not find filesystem 'dev/root'
setuproot: moving /dev failed: No such file or directory
setuproot: error mounting /proc: No such file or directory
setuproot: error mounting /sys: No such file or directory
swirchroot: mount failed: No such file or directory
Kernel Panic - not syncing: Attempted to kill init!

I have no problem with the original kernel installed by CentOS. I guessed this may be related to the RAID controller card driver, which may not be loaded by the patched Lustre kernel.

That seems like a reasonable conclusion given the information available.

so I have added the driver into the initrd.img file.

Where did you get the driver from? What is the name of the driver?

But it didn't solve the problem.

Depending on where it came from, yes, it might not.

Should I install the lustre by building the source?

That may be required, but not necessarily required. We need more information.

b.
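Putting Kevin's steps together, the whole fix might look like the transcript below. This is a sketch, not the poster's exact session: the Lustre kernel version string is hypothetical, and it assumes the Lustre kernel-devel RPM is installed and the system is booted into the working (non-Lustre) kernel:

```shell
# Hypothetical 'uname -r' string of the patched Lustre kernel.
LUSTRE_KVER=2.6.18-194.17.4.el5_lustre2.0

# Build arcmsr against the Lustre kernel's build tree (not the running one).
cd /usr/src/linux/drivers/scsi/arcmsr
make -C /lib/modules/$LUSTRE_KVER/build CONFIG_SCSI_ARCMSR=m SUBDIRS=$PWD modules

# Install the module for the Lustre kernel and regenerate its initrd
# so the RAID driver loads early enough to find the root device.
install -m 644 arcmsr.ko /lib/modules/$LUSTRE_KVER/kernel/drivers/scsi/
depmod -a $LUSTRE_KVER
mkinitrd -f /boot/initrd-$LUSTRE_KVER.img $LUSTRE_KVER
```

After rebooting into the Lustre kernel, the root device should be found and the "Attempted to kill init!" panic should be gone.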
[Lustre-discuss] Running MGS and OSS on the same machine
Hi again,

I plan to use one server as MGS and OSS simultaneously. But how can I format the OSTs as a Lustre FS? For example, the line below tells the OST that its mgsnode is at 192.168.0.10@tcp0:

mkfs.lustre --fsname lustre --ost --mgsnode=192.168.0.10@tcp0 /dev/vg00/ost1

But now the mgsnode is the same machine. I tried to put localhost instead of the IP address, but it didn't work. What should I do?

Arya
[Lustre-discuss] Kernel Panic error after lustre 2.0 installation
Hi there,

I have got an error after installing Lustre 2.0 on the MGS server with a RAID controller card. The server OS is CentOS 5.4 x86_64, and it has 1.2TB of storage configured as RAID 1+0. After installing the Lustre rpm packages and rebooting the machine, I'm faced with the errors below at Linux startup:

Unable to access resume device (LABEL=SWAP-sda3)
mount: could not find filesystem 'dev/root'
setuproot: moving /dev failed: No such file or directory
setuproot: error mounting /proc: No such file or directory
setuproot: error mounting /sys: No such file or directory
swirchroot: mount failed: No such file or directory
Kernel Panic - not syncing: Attempted to kill init!

I have no problem with the original kernel installed by CentOS. I guessed this may be related to the RAID controller card driver, which may not be loaded by the patched Lustre kernel, so I added the driver into the initrd.img file. But it didn't solve the problem. Should I install Lustre by building the source? Or any other clue to this problem?

Thanks in advance...
Re: [Lustre-discuss] Kernel Panic error after lustre 2.0 installation
The driver name is arcmsr.ko and I extracted it from driver.img included on the RAID controller's CD. The following text file may clarify better: ftp://areca.starline.de/RaidCards/AP_Drivers/Linux/DRIVER/RedHat/FedoraCore/Redhat-Fedora-core8/1.20.0X.15/Intel/readme.txt

Please tell me if you need more information about this issue...

On Thu, Feb 17, 2011 at 11:33 PM, Brian J. Murrell br...@whamcloud.com wrote:

On Thu, 2011-02-17 at 23:26 +0330, Arya Mazaheri wrote:

Hi there,

Hi,

Unable to access resume device (LABEL=SWAP-sda3)
mount: could not find filesystem 'dev/root'
setuproot: moving /dev failed: No such file or directory
setuproot: error mounting /proc: No such file or directory
setuproot: error mounting /sys: No such file or directory
swirchroot: mount failed: No such file or directory
Kernel Panic - not syncing: Attempted to kill init!

I have no problem with the original kernel installed by CentOS. I guessed this may be related to the RAID controller card driver, which may not be loaded by the patched Lustre kernel.

That seems like a reasonable conclusion given the information available.

so I have added the driver into the initrd.img file.

Where did you get the driver from? What is the name of the driver?

But it didn't solve the problem.

Depending on where it came from, yes, it might not.

Should I install the lustre by building the source?

That may be required, but not necessarily required. We need more information.

b.