Dear All,

To follow up on my previous problems with Lustre-2.10.5: after several more
days of testing, I have finally solved them.

The solution is to set CONFIG_DEBUG_FS=y in the Linux kernel configuration
when compiling the kernel (version 3.12.72). Without this option, Lustre-2.9
works, but Lustre-2.10.5 does not. Without it, running "modprobe lustre"
with Lustre-2.10.5 returns the error message:

    ERROR: could not insert 'lustre': No such device

and dmesg records only:

    [191843.804416] LNet: HW NUMA nodes: 2, HW CPU cores: 24, npartitions: 2
    [191844.582597] Lustre: Lustre: Build Version: 2.10.5

I think Lustre-2.11.0 is affected in the same way, since I tried it two
weeks ago and encountered exactly the same problem. For now I am targeting
Lustre-2.10.5, so I have not gone back to retry Lustre-2.11.0.

I am sorry that my previous emails contained wrong speculation about the
problem. It has nothing to do with the version of kmod or udev. I apologize
for making those statements before investigating more carefully.

Best Regards,

T.H.Hsieh


On Tue, Sep 25, 2018 at 06:00:24PM +0800, Tung-Han Hsieh wrote:
> Hello,
>
> I just made another test. On my newer machine, I rebooted with the
> older kernel 3.12.72 and tried to recompile Lustre again. So the
> system is now:
>
>     Linux OS Debian 9.5, with kmod version 23-2, udev version 232-25+deb9u4,
>     linux kernel 3.12.72, gcc-4.9.2.
>
> Then I compiled Lustre-2.10.5 with
>
>     ./configure --prefix=/opt/lustre \
>                 --with-linux=/path/to/linux-3.12.72 \
>                 --disable-server
>
> This time I did not need to modify the Lustre source code at all, and I
> can successfully run "modprobe lustre".
>
> So the error I encountered on the older system was probably due to the
> kmod or udev version.
>
> Could anyone confirm my speculation?
>
> Thanks very much.
>
>
> T.H.Hsieh
>
>
> On Tue, Sep 25, 2018 at 05:33:01PM +0800, Tung-Han Hsieh wrote:
> > Dear Andreas,
> >
> > Thank you very much for your kind reply.
> >
> > When I run "modprobe lustre", dmesg only shows:
> >
> > [191843.804416] LNet: HW NUMA nodes: 2, HW CPU cores: 24, npartitions: 2
> > [191844.582597] Lustre: Lustre: Build Version: 2.10.0
> >
> > and I get the command line message "ERROR: could not insert 'lustre':
> > No such device". If I check "lsmod", I see the following lustre
> > modules loaded:
> >
> >     Module                  Size  Used by
> >     lnet                  388690  0
> >     libcfs                214791  1 lnet
> >
> > When I run "modprobe obdclass", the result is exactly the same.
> >
> > I also tried to recompile Lustre-2.10.5 with the options:
> >
> >     ./configure --prefix=/opt/lustre \
> >                 --with-linux=/path/to/linux-3.12.72 \
> >                 --disable-server
> >
> > to make the situation simpler. But I still get exactly the same error.
> >
> > BTW, my Linux OS is Debian 8.10, with kmod version 18-3, udev
> > version 215-17+deb8u7, linux kernel 3.12.72, gcc-4.9.2.
> >
> > ==========================================================================
> >
> > Then I wondered whether this error is due to the version of the
> > Linux OS. So I tried to compile Lustre-2.10.5 again with the options:
> >
> >     ./configure --prefix=/opt/lustre \
> >                 --with-linux=/path/to/linux-4.9.110 \
> >                 --disable-server
> >
> > on a newer machine: Linux OS Debian 9.5, with kmod version 23-2,
> > udev version 232-25+deb9u4, linux kernel 4.9.110, gcc-4.9.2. I needed
> > to comment out a few lines like:
> >
> >     .setxattr = ll_setxattr,
> >     .getxattr = ll_getxattr,
> >     .listxattr = ll_listxattr,
> >     .removexattr = ll_removexattr,
> >
> > in "lustre/llite/symlink.c", "lustre/llite/namei.c", and
> > "lustre/llite/file.c" in order to successfully build the lustre source
> > code. This time I can successfully run:
> >
> >     modprobe lustre
> >
> > So, is it because my Linux system (or utilities) is too old? Is there
> > a list of "System Requirements" to run Lustre-2.10.5?
> >
> > ps. I suggest that the "System Requirements" should be documented in
> >     the release notes of the Lustre software. Actually, every time
> >     I want to upgrade the Lustre system in my clusters, I always have to
> >     spend a lot of time to *guess* the correct version combination of
> >     the system, the 3rd party libraries (e.g., ZFS), and Lustre itself,
> >     etc., to make everything work. Unfortunately, all this information
> >     is not always easy to find.
> >
> >
> > Best Regards,
> >
> > T.H.Hsieh
> >
> >
> >
> > On Tue, Sep 25, 2018 at 07:38:00AM +0000, Andreas Dilger wrote:
> > > What does dmesg tell you? Normally it will report some module has
> > > incorrect symbols, which means you compiled against a different version
> > > of the kernel source, OFED/MOFED libraries, etc.
> > >
> > > > On Sep 25, 2018, at 05:14, Tung-Han Hsieh
> > > > <thhs...@twcp1.phys.ntu.edu.tw> wrote:
> > > >
> > > > Dear All,
> > > >
> > > > I found that my lustre-2.10.5 with ZFS (either 0.7.9 or 0.7.11)
> > > > cannot load the "lustre" modules because it cannot load the
> > > > "obdclass.ko" module. The error message is the following:
> > > >
> > > > # modprobe -v -v obdclass
> > > > insmod /lib/modules/3.12.72/updates/fs/lustre/obdclass.ko
> > > > libkmod: INFO ../libkmod/libkmod-module.c:829
> > > > kmod_module_insert_module: Failed to insert module
> > > > '/lib/modules/3.12.72/updates/fs/lustre/obdclass.ko': No such device
> > > > ERROR: could not insert 'obdclass': No such device
> > > > libkmod: INFO ../libkmod/libkmod.c:319 kmod_unref: context
> > > > 0x7fb945d321e0 released
> > > >
> > > > Could anyone suggest how to debug this?
> > > >
> > > > Thanks very much.
> > > >
> > > >
> > > > T.H.Hsieh
> > > >
> > > >
> > > > On Tue, Sep 25, 2018 at 12:14:00AM +0800, Tung-Han Hsieh wrote:
> > > >> Dear Nathaniel,
> > > >>
> > > >> Thank you very much for your kind reply. Indeed, I modified the
> > > >> lustre-2.10.5 code:
> > > >>
> > > >>     lustre/osd-zfs/osd_object.c
> > > >>     lustre/osd-zfs/osd_xattr.c
> > > >>
> > > >> for the declaration:
> > > >>
> > > >>     inode_timespec_t now;
> > > >>
> > > >> This is similar to what you have done in your patch. So I can compile
> > > >> lustre-2.10.5 cleanly with zfs-0.7.11. Sorry, I forgot to mention it.
> > > >>
> > > >> But my problem is still there. Actually I just tried:
> > > >>
> > > >> 1. I applied your patch to the original lustre-2.10.5 code and
> > > >>    recompiled with spl-0.7.11 and zfs-0.7.11. But loading the "lustre"
> > > >>    module still gives the "no such device" error.
> > > >>
> > > >> 2. I recompiled the original lustre-2.10.5 with spl-0.7.9 and
> > > >>    zfs-0.7.9. They compile cleanly. But again I got the
> > > >>    "no such device" error when loading the "lustre" module.
> > > >>
> > > >> I wonder whether I have overlooked a trivial step, something
> > > >> like one (or some) of the utilities in /opt/lustre/sbin/* should
> > > >> be linked to /sbin/ or /usr/sbin/ ....
> > > >>
> > > >> Any suggestions are much appreciated.
> > > >>
> > > >> Thank you very much.
> > > >>
> > > >>
> > > >> T.H.Hsieh
> > > >>
> > > >>
> > > >> On Mon, Sep 24, 2018 at 01:21:19PM +0000, Nathaniel Clark wrote:
> > > >>> Hello Tung-Han,
> > > >>>
> > > >>> ZFS 0.7.11 doesn’t compile cleanly with Lustre, yet.
> > > >>>
> > > >>> There’s a ticket for adding ZFS 0.7.11 support to lustre:
> > > >>> https://jira.whamcloud.com/browse/LU-11393
> > > >>>
> > > >>> It has patches for master (pre-2.12) and a separate patch for 2.10.
> > > >>>
> > > >>> --
> > > >>> Nathaniel Clark <ncl...@whamcloud.com>
> > > >>> Senior Engineer
> > > >>> Whamcloud / DDN
> > > >>>
> > > >>> On Sep 24, 2018, at 2:15 PM, Tung-Han Hsieh
> > > >>> <thhs...@twcp1.phys.ntu.edu.tw> wrote:
> > > >>>
> > > >>> Dear All,
> > > >>>
> > > >>> I am trying to install Lustre version 2.10.5 with ZFS-0.7.11
> > > >>> from source code. After compilation and installation, I tried
> > > >>> to load the "lustre" module, but encountered the following
> > > >>> error:
> > > >>>
> > > >>> # modprobe lustre
> > > >>> could not load module 'lustre': no such device
> > > >>>
> > > >>> My installation procedure is the following:
> > > >>>
> > > >>> 1. Compile vanilla kernel 3.12.72 downloaded from:
> > > >>>
> > > >>>    https://mirrors.edge.kernel.org/pub/linux/kernel/v3.x/linux-3.12.72.tar.gz
> > > >>>
> > > >>> 2. Compile spl-0.7.11 downloaded from:
> > > >>>
> > > >>>    https://github.com/zfsonlinux/zfs/releases/download/zfs-0.7.11/spl-0.7.11.tar.gz
> > > >>>
> > > >>>    with the following steps:
> > > >>>    # ./configure --prefix=/opt/lustre --with-linux=/path/to/linux-3.12.72
> > > >>>    # make
> > > >>>    # make install
> > > >>>
> > > >>> 3. Compile zfs-0.7.11 downloaded from:
> > > >>>
> > > >>>    https://github.com/zfsonlinux/zfs/releases/download/zfs-0.7.11/zfs-0.7.11.tar.gz
> > > >>>
> > > >>>    with the following steps:
> > > >>>    # ./configure --prefix=/opt/lustre \
> > > >>>                  --with-linux=/path/to/linux-3.12.72 \
> > > >>>                  --with-spl=/path/to/spl-0.7.11
> > > >>>    # make
> > > >>>    # make install
> > > >>>
> > > >>> 4. Compile lustre downloaded from:
> > > >>>
> > > >>>    https://downloads.whamcloud.com/public/lustre/lustre-2.10.5/sles12sp3/client/SRPMS/lustre-2.10.5-1.src.rpm
> > > >>>
> > > >>>    Then I unpack the SRPM with the command:
> > > >>>    # rpm2cpio lustre-2.10.5-1.src.rpm | cpio --extract --make-directories
> > > >>>
> > > >>>    and compile it with the following:
> > > >>>    # ./configure --prefix=/opt/lustre \
> > > >>>                  --with-linux=/path/to/linux-3.12.72 \
> > > >>>                  --with-spl=/path/to/spl-0.7.11 \
> > > >>>                  --with-zfs=/path/to/zfs-0.7.11 \
> > > >>>                  --with-o2ib=no \
> > > >>>                  --disable-ldiskfs
> > > >>>    # make
> > > >>>    # make install
> > > >>>
> > > >>> 5. I have made sure the following settings and utilities are correct:
> > > >>>    - PATH contains /opt/lustre/bin and /opt/lustre/sbin
> > > >>>    - /sbin/mount.lustre exists.
> > > >>>    - /sbin/mount.zfs exists.
> > > >>>    - /usr/sbin/l_getidentity exists.
> > > >>>    - /usr/sbin/ko2iblnd-probe exists.
> > > >>>    - /etc/modprobe.d/lustre.conf contains:
> > > >>>        options lnet networks=tcp
> > > >>>    - /etc/modprobe.d/ko2iblnd.conf contains:
> > > >>>        alias ko2iblnd-opa ko2iblnd
> > > >>>        options ko2iblnd-opa peer_credits=128 peer_credits_hiw=64
> > > >>>          credits=1024 concurrent_sends=256 ntx=2048 map_on_demand=32
> > > >>>          fmr_pool_size=2048 fmr_flush_trigger=512 fmr_cache=1
> > > >>>        install ko2iblnd /usr/sbin/ko2iblnd-probe
> > > >>>
> > > >>> Then I tried to run "modprobe lustre", and it gives the "no such
> > > >>> device" error.
> > > >>>
> > > >>> I tried to replace Lustre-2.10.5 with Lustre-2.9 downloaded from:
> > > >>>
> > > >>> https://downloads.whamcloud.com/public/lustre/lustre-2.9.0/sles12sp1/client/SRPMS/lustre-2.9.0-1.src.rpm
> > > >>>
> > > >>> and followed exactly the same installation steps. Everything works
> > > >>> fine.
> > > >>>
> > > >>> Could anyone suggest what I have missed for lustre-2.10.5, or
> > > >>> suggest how to debug it?
> > > >>>
> > > >>> Thanks very much.
> > > >>>
> > > >>>
> > > >>> T.H.Hsieh
> > > >>> _______________________________________________
> > > >>> lustre-discuss mailing list
> > > >>> lustre-discuss@lists.lustre.org
> > > >>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
> > >
> > > Cheers, Andreas
> > > ---
> > > Andreas Dilger
> > > CTO Whamcloud
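
P.S. For anyone hitting the same "No such device" error: below is a minimal
sketch of how the CONFIG_DEBUG_FS setting mentioned at the top of this message
could be checked before building Lustre. The function name has_debug_fs is a
hypothetical choice made for illustration only. The sketch assumes the kernel
configuration is available either as a plain-text .config file in the kernel
source tree, or as a gzip-compressed file such as /proc/config.gz (which
exists only if the kernel was built with CONFIG_IKCONFIG_PROC).

```shell
#!/bin/sh
# has_debug_fs FILE  (hypothetical helper, not part of Lustre or the kernel)
#
# Returns 0 if FILE contains the line CONFIG_DEBUG_FS=y,
#         1 if the option is absent or disabled,
#         2 if FILE cannot be read.
# FILE may be plain text (.config) or gzip-compressed (name ends in .gz).
has_debug_fs() {
    cfg="$1"
    [ -r "$cfg" ] || return 2
    case "$cfg" in
        *.gz)
            # Let grep read the whole stream so gzip is never cut short.
            gzip -dc "$cfg" | grep '^CONFIG_DEBUG_FS=y$' >/dev/null
            ;;
        *)
            grep -q '^CONFIG_DEBUG_FS=y$' "$cfg"
            ;;
    esac
}
```

For example, "has_debug_fs /path/to/linux-3.12.72/.config && echo enabled"
prints "enabled" only when the option is set in that source tree. If it is
not set, the kernel has to be reconfigured and rebuilt with CONFIG_DEBUG_FS=y
before "modprobe lustre" can be expected to work with Lustre-2.10.5.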