Hi Aayush,

I used the same command, but I first had to generate a new ISO of the OFED install before it worked (I can't remember why this was the case). If you haven't already, you can view the details here:
http://www.mellanox.com/related-docs/prod_software/Mellanox_OFED_Linux_Release_Notes_1_5_3-3_0_0.txt

I ran into this issue about 1.5 years ago, so I hope I'm not forgetting anything.

Mike
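One way to sanity-check the kernel-support rebuild discussed in this thread is to compare each module's vermagic string with the running kernel. The sketch below is only an illustration: the helper names `vermagic_matches` and `check_modules` are made up here, and the module names in the usage note are assumptions about what is installed on a given node.

```shell
#!/bin/sh
# Sketch only: vermagic_matches and check_modules are hypothetical helper
# names, not part of MLNX_OFED or Lustre.

# Succeed when the kernel release embedded in a vermagic string equals
# the given kernel release.
vermagic_matches() {
    kernel_release=$1
    vermagic=$2
    # A vermagic string looks like
    #   "2.6.32-358.18.1.el6.x86_64 SMP mod_unload modversions"
    # so the kernel release is everything before the first space.
    module_release=${vermagic%% *}
    [ "$module_release" = "$kernel_release" ]
}

# Print every listed module that is missing or was built for another kernel.
# Returns non-zero if any module fails the check.
check_modules() {
    running=$(uname -r)
    status=0
    for mod in "$@"; do
        # modinfo -F vermagic prints only the vermagic field of the module
        if ! magic=$(modinfo -F vermagic "$mod" 2>/dev/null); then
            echo "$mod: not found for kernel $running"
            status=1
        elif ! vermagic_matches "$running" "$magic"; then
            echo "$mod: built for ${magic%% *}, running $running"
            status=1
        fi
    done
    return $status
}
```

After sourcing the file, something like `check_modules ib_core mlx4_core ko2iblnd` could be run on each node. The module list is an assumption and depends on the HCA driver in use. A mismatch would suggest the modules were not rebuilt against the Lustre kernel, but a clean result does not by itself rule out other causes of the "Network is down" error.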
On Wed, Oct 1, 2014 at 12:50 AM, aayush agrawal <[email protected]> wrote:

> Hi Mike,
>
> While installing OFED I have used below command:
> # ./mlnxofedinstall -vvv --add-kernel-support --without-32bit --without-fw-update --hpc
>
> I have used option --add-kernel-support, which adds kernel support (runs
> mlnx_add_kernel_support.sh). This is what you meant to say, right?
>
> Thanks,
> Aayush.
>
>
> On 9/30/2014 11:04 PM, Mike Ware wrote:
>
> I knew I had it somewhere
>
> http://lists.lustre.org/pipermail/lustre-discuss/2012-November/016988.html
>
> Mike
>
> On Tue, Sep 30, 2014 at 10:32 AM, Mike Ware <[email protected]> wrote:
>
>> I had a similar issue using the Mellanox packages. If I remember
>> correctly I had to recompile the drivers against the Lustre kernel for the
>> install. I believe Mellanox had an article on this but I don't have the
>> link.
>>
>> Mike
>>
>> On Tue, Sep 30, 2014 at 8:07 AM, Parinay Kondekar <[email protected]> wrote:
>>
>>> IMO you should try out strace to see if anything is noticed.
>>> "Write failed: Broken pipe" is quite common message and difficult to
>>> conclude anything with.
>>>
>>> Regards
>>> parinay
>>>
>>> On Tue, Sep 30, 2014 at 8:16 PM, aayush agrawal <[email protected]> wrote:
>>>
>>>> Hi Parinay,
>>>>
>>>> Yes, I see ib0 in output of ifconfig -a.
>>>> I also tried with options lnet networks=o2ib0(ib0) but no luck.
>>>> While loading lnet I do see error in /var/log/messages:
>>>>
>>>> kernel: LNet: HW CPU cores: 32, npartitions: 4
>>>> alg: No test for crc32 (crc32-table)
>>>> kernel: alg: No test for adler32 (adler32-zlib)
>>>> kernel: alg: No test for crc32 (crc32-pclmul)
>>>> kernel: padlock: VIA PadLock Hash Engine not detected.
>>>> modprobe: FATAL: Error inserting padlock_sha
>>>> (/lib/modules/2.6.32_358/kernel/drivers/crypto/padlock-sha.ko): No such
>>>> device
>>>>
>>>> But as per below link this should not be a problem?
>>>> https://jira.hpdd.intel.com/browse/LU-1599
>>>>
>>>> modprobe lnet completes successfully and I see "Write failed: Broken
>>>> pipe" after running "lctl network up" and after this session gets logout
>>>> from the server.
>>>>
>>>> Thanks,
>>>> Aayush.
>>>>
>>>>
>>>> On 9/30/2014 7:21 PM, Parinay Kondekar wrote:
>>>>
>>>> - what is the output of 'ifconfig -a' , do you see ib0 there ?
>>>> mentioning 'options lnet networks=o2ib0(ib0)' should be enough.
>>>> - anything in syslog ?
>>>>
>>>> HTH
>>>>
>>>> On Tue, Sep 30, 2014 at 6:03 PM, aayush agrawal <[email protected]> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I am trying to build lustre 2.5.0 against
>>>>> MLNX_OFED_LINUX-2.2-1.0.1-rhel6.4-x86_64 on CentOS6.4 with kernel version
>>>>> 2.6.32-358.
>>>>> But I am not able to set lnet config settings properly. I used
>>>>> settings suggested in lustre 2.x manual. But then not able to get network
>>>>> up using lctl.
>>>>>
>>>>> Details:
>>>>>
>>>>> I have two server machines, one for mgs+mdt and second for oss and one
>>>>> client machine. I want to setup Infiniband on all these machines.
>>>>> I could run below steps successfully for all the three machines:
>>>>> 1. Run script mlnxofedinstall
>>>>> # ./mlnxofedinstall -vvv --add-kernel-support --without-32bit --without-fw-update --hpc
>>>>> 2. Restart openibd service
>>>>> # /etc/init.d/openibd restart
>>>>> 3. configure ib0 interface.
>>>>> 4. configure lustre with o2ib
>>>>> # ./configure --with-linux=Path_to_linux-2.6.32-358.18.1.el6 --with-o2ib=/usr/src/ofa_kernel/default/
>>>>>
>>>>> 5. make lustre rpms:
>>>>> # make rpms
>>>>> This gave me below compilation error
>>>>> I looked online for this error and found bug registered on the same:
>>>>> https://jira.hpdd.intel.com/browse/LU-4266
>>>>> Below patch from above link solved the problem and hence I could build
>>>>> lustre rpms:
>>>>> http://review.whamcloud.com/#/c/8451/1
>>>>>
>>>>> Now first I want to do the Infiniband setup for mgs and mdt on single
>>>>> machine which also has Ethernet IP. Then I want to format and mount mgs
>>>>> and mdt.
>>>>> So I installed above created lustre rpms and then added below line in
>>>>> /etc/modprobe.d/lustre.conf
>>>>> options lnet networks=o2ib(ib0)
>>>>>
>>>>> Then I rebooted the machine to remove all lustre related modules
>>>>> including lnet and then ran modprobe lnet command to add above
>>>>> parameters and then ran lctl network up which is giving me below error:
>>>>> LNET configure error 100: Network is down
>>>>>
>>>>> I looked online and found below discussion on same error:
>>>>> http://lists.lustre.org/pipermail/lustre-discuss/2010-June/013510.html
>>>>>
>>>>> As per suggestion in above mail I tried with below line in
>>>>> /etc/modprobe.d/lustre.conf. In below command for IB_IP, I have
>>>>> given infiniband IP.
>>>>> options lnet networks=o2ib(ib0) routes="tcp0 IB_IP@o2ib"
>>>>> This command hangs for around 2 to 3 minutes and then gives error:
>>>>> Write failed: Broken pipe. Same is the case for
>>>>> "options lnet networks=o2ib(ib0)"
>>>>> But if I set:
>>>>> options lnet networks=tcp0(eth0),o2ib(ib0) routes="tcp1 IB_IP@o2ib"
>>>>> then it gives LNET configure error 100: Network is down.
>>>>>
>>>>> It seems that for network=o2ib(ibo) I am getting error Write failed:
>>>>> Broken pipe.
>>>>> Am I missing anything while following above steps? Or how do I resolve
>>>>> above error?
>>>>>
>>>>> Thanks,
>>>>> Aayush.
>>>>>
>>>>> _______________________________________________
>>>>> HPDD-discuss mailing list
>>>>> [email protected]
>>>>> https://lists.01.org/mailman/listinfo/hpdd-discuss
>>>
>>> _______________________________________________
>>> Lustre-discuss mailing list
>>> [email protected]
>>> http://lists.lustre.org/mailman/listinfo/lustre-discuss
_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss
