Re: [openib-general] problems to regiser memory as a reglar user on SLES9 SP3
Moshe Kazir wrote:
> Hi Tziporet,
>
> I'm trying Ofed 1.1 rc3 on IBM js21 sles9sp3 ppc64.
>
> Install is stopped at the very beginning as 64-bit udev is missing.
>
> I tried to compile the udev...src.rpm supplied in sls9sp3 cd3 and failed
> as result of compilation error.
>
> Did you test ofed 1.1 rc3 on ppc64. Can you advise me how to get 64-bit
> udev ?
>

We have only one Mac PPC64 machine here, and it can run only Fedora Core 4, so that is the only ppc64 system we check.
Maybe Vlad can help, but I think it is best to approach Novell (Mois is their contact for OFED).

Tziporet
___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general
To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Re: [openib-general] problems to regiser memory as a reglar user on SLES9 SP3
Hi Tziporet,

I'm trying OFED 1.1 rc3 on an IBM JS21 running SLES9 SP3 ppc64.

The install stops at the very beginning because 64-bit udev is missing.

I tried to compile the udev...src.rpm supplied on SLES9 SP3 CD3, but it failed with a compilation error.

Did you test OFED 1.1 rc3 on ppc64? Can you advise me how to get 64-bit udev?

Moshe

Moshe Katzir | +972-9971-8639 (o) | +972-52-860-6042 (m)
Voltaire - The Grid Backbone
www.voltaire.com

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Tziporet Koren
Sent: Tuesday, August 29, 2006 5:50 PM
To: OPENIB
Subject: [openib-general] problems to regiser memory as a reglar user on SLES9 SP3

Hi All,

In testing today we found that on SLES9 SP3 memory locking as a regular user fails.
Although I changed /etc/security/limits.conf and added the following two lines:

* soft memlock
* hard memlock

Note that same change does work in SLES10.
Another change I tried (that worked in gen1) was to add the following line to the file /etc/sysctl.conf:

vm.disable_cap_mlock=1

However nothing helped in SLES9.
Does anyone have any idea how to solve this?

Thanks,
Tziporet
Re: [openib-general] problems to regiser memory as a reglar
Dhabaleswar Panda wrote:
> Christian - Thanks for sending instructions for running mvapich2-0.9.5
> to Tziporet.
>
> Tziporet - Thanks for looking into this problem on SLES9 environment.
>
> Please note that a detailed user guide for running and tuning MVAPICH2
> 0.9.5 is available from the following URL:
>
> http://nowlab.cse.ohio-state.edu/projects/mpi-iba/download-mvapich2/mvapich2_user_guide.html
>
> DK
>

Thanks to all,

We found the bug: it was in the memory registration flow and affects SLES9 only.
A fix will be available in OFED 1.1 RC4.

Tziporet
Re: [openib-general] problems to regiser memory as a reglar
Christian - Thanks for sending instructions for running mvapich2-0.9.5 to Tziporet.

Tziporet - Thanks for looking into this problem in the SLES9 environment.

Please note that a detailed user guide for running and tuning MVAPICH2 0.9.5 is available from the following URL:

http://nowlab.cse.ohio-state.edu/projects/mpi-iba/download-mvapich2/mvapich2_user_guide.html

DK

> Hi Tziporet,
> On Mon, Sep 04, 2006 at 10:55:02PM +0300, Tziporet Koren wrote:
> > Can you explain me how to run mvapich2-0.9.5?
>
> at first, simple compiling using the OSU scripts (make.mvapich2.gen2) -
> should work out of the box. (except you will use PCI-X HCAs - you'll
> have to omit "-DSRQ" in the build script then). Note, python-devel is
> needed for the build.
>
> then, assuming you're doing your tests as root on a single box.
>
> - create /etc/mpd.conf
>
> containing the line "secretword=blabla" - just some non-meaningful
> passphrase ;)
> (you'll probably also need the same file as ~/.mpd.conf and
> ~/.mpdpasswd , too)
>
> - start mpd ring
> # mpdboot -n 1 -f hosts
> (hosts should contain the hostname)
>
> - check if mpdring is up and running
> # mpdtrace
>
> - start application on 2 CPUs
> # mpiexec -n 2 ./a.out
>
> - once tests are over, stop the ring
> # mpdallexit
>
> hope that helps,
>
> cheers.
> - Christian
Re: [openib-general] problems to regiser memory as a reglar user on SLES9 SP3
Hi Tziporet,

On Mon, Sep 04, 2006 at 10:55:02PM +0300, Tziporet Koren wrote:
> Can you explain me how to run mvapich2-0.9.5?

First, simply compile using the OSU scripts (make.mvapich2.gen2); this should work out of the box. (One exception: if you use PCI-X HCAs, you'll have to omit "-DSRQ" in the build script.) Note that python-devel is needed for the build.

Then, assuming you're doing your tests as root on a single box:

- create /etc/mpd.conf containing the line "secretword=blabla" - just some non-meaningful passphrase ;)
  (you'll probably also need the same file as ~/.mpd.conf and ~/.mpdpasswd)

- start the mpd ring
  # mpdboot -n 1 -f hosts
  (hosts should contain the hostname)

- check that the mpd ring is up and running
  # mpdtrace

- start the application on 2 CPUs
  # mpiexec -n 2 ./a.out

- once the tests are over, stop the ring
  # mpdallexit

Hope that helps,

cheers.
 - Christian
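The steps in the message above can be collected into a small shell sketch. It assumes bash, a single host, and that the mpd tools named in the message (mpdboot, mpdtrace, mpiexec, mpdallexit) are on the PATH. The function names write_mpd_conf and mpd_smoke_test are hypothetical, the hosts file is overwritten with the local hostname, and restricting the config file to mode 600 is an assumption based on common mpd setups, not something stated in the message.

```shell
#!/bin/bash
# Sketch only: helper names are illustrative, not part of MVAPICH2.

# Write an mpd config file holding just the secretword line.
# Mode 600 is an assumption (mpd setups commonly require a private file).
write_mpd_conf() {
    local path=$1 secret=$2
    ( umask 077 && printf 'secretword=%s\n' "$secret" > "$path" ) || return 1
    chmod 600 "$path"
}

# Start a one-host mpd ring, run one program on 2 processes, stop the ring.
mpd_smoke_test() {
    local program=$1 hosts=${2:-hosts}
    local tool status
    for tool in mpdboot mpdtrace mpiexec mpdallexit; do
        command -v "$tool" >/dev/null || { echo "missing: $tool" >&2; return 1; }
    done
    hostname > "$hosts" || return 1        # hosts file contains this hostname
    mpdboot -n 1 -f "$hosts" || return 1   # start the mpd ring
    mpdtrace                               # should list this host if the ring is up
    mpiexec -n 2 "$program"                # run the application on 2 CPUs
    status=$?
    mpdallexit                             # stop the ring whatever the result
    return "$status"
}

# Example (as root, following the message):
#   write_mpd_conf /etc/mpd.conf blabla
#   mpd_smoke_test ./a.out
```

Stopping the ring even when mpiexec fails keeps a failed run from leaving a stale mpd behind; the function still returns the exit status of mpiexec.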
Re: [openib-general] problems to regiser memory as a reglar user on SLES9 SP3
Can you explain to me how to run mvapich2-0.9.5?

Thanks,
Tziporet

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Christian Guggenberger
Sent: Monday, September 04, 2006 6:25 PM
To: Tziporet Koren
Cc: Eli Cohen; Christian Guggenberger; OPENIB
Subject: Re: [openib-general] problems to regiser memory as a reglar user on SLES9 SP3

> >>We test here SLES9 but with mvapich1 library 0.9.7 version from OFED.
> >>We tried to run here the test you attached on mvapich1 but have not seen
> >>this failure.
> >>Can you try to reproduce with mvapich1 version?
> >>
> >
> >is it also okay if I tried with plain mvapich1 from OSU ?
> I guess yes, although we use the one that comes with OFED.

hmm. Using plain mvapich-0.9.7 from OSU, the BUGs/Ooops are not reproducible.
Using mvapich2-0.9.5 it happens each time...

cheers.
 - Christian
Re: [openib-general] problems to regiser memory as a reglar user on SLES9 SP3
> >>We test here SLES9 but with mvapich1 library 0.9.7 version from OFED.
> >>We tried to run here the test you attached on mvapich1 but have not seen
> >>this failure.
> >>Can you try to reproduce with mvapich1 version?
> >>
> >
> >is it also okay if I tried with plain mvapich1 from OSU ?
> I guess yes, although we use the one that comes with OFED.

Hmm. Using plain mvapich-0.9.7 from OSU, the BUGs/Oopses are not reproducible.
Using mvapich2-0.9.5, it happens every time...

cheers.
 - Christian
Re: [openib-general] problems to regiser memory as a reglar user on SLES9 SP3
Christian Guggenberger wrote:
>> Hi,
>> We test here SLES9 but with mvapich1 library 0.9.7 version from OFED.
>> We tried to run here the test you attached on mvapich1 but have not seen
>> this failure.
>> Can you try to reproduce with mvapich1 version?
>>
>
> is it also okay if I tried with plain mvapich1 from OSU ?

I guess so, although we use the one that comes with OFED.

>>
> this is with 2.6.5-7.276-smp
>
>

I'll see if we can update our kernel version.

Tziporet
Re: [openib-general] problems to regiser memory as a reglar user on SLES9 SP3
Hi,

> >Attached is a simple MPI code that causes the hard lock. Also attached
> >are some Kernel BUGs gathered via serial console - they look garbled,
> >unfortunately.
> >Note, everything is fine, if I use recent vanilla kernels on that SLES9
> >machine.
> >
> >cheers.
> > - Christian
> >
> Hi,
> We test here SLES9 but with mvapich1 library 0.9.7 version from OFED.
> We tried to run here the test you attached on mvapich1 but have not seen
> this failure.
> Can you try to reproduce with mvapich1 version?

Is it also okay if I try with plain mvapich1 from OSU?

> If not please send us detailed instructions how to reproduce with
> mvapich2 (where to take sources, compile, etc.)
> BTW when searching the SLES9 sources for the: Kernel BUG at page_alloc:853
>
> We couldn't find it.
> Which kernel version are you using? We use here 2.6.5-7.244-smp.
>

This is with 2.6.5-7.276-smp.

cheers.
 - Christian
--
---
Phone +49-89-3299-1306
PGP http://www.rzg.mpg.de/~ccg/cg-public_key.asc
S/MIME http://ra.rzg.mpg.de
---
Re: [openib-general] problems to regiser memory as a reglar user on SLES9 SP3
Christian Guggenberger wrote:
> Hi,
> On Tue, Aug 29, 2006 at 05:49:32PM +0300, Tziporet Koren wrote:
>
>> Hi All,
>> In testing today we found that on SLES9 SP3 memory locking as a regular
>> user fails.
>>
> has any progress been made regarding this ?
>
> I'd like to ask if the SLES9 port is really mature yet, because I tried
> to go a step ahead and tried some trivial MPI code as root, but failed
> and got the involved node locked down hard.
> Testing was done on a single x86_64 SMP node (2 CPUs), with a Mellanox
> PCI-X HCA (23108, FW-3.5.0). Software Environment SLES9 SP3-latest,
> OFED-1.1-rc3 and mvapich2-0.9.5.
> Attached is a simple MPI code that causes the hard lock. Also attached
> are some Kernel BUGs gathered via serial console - they look garbled,
> unfortunately.
> Note, everything is fine, if I use recent vanilla kernels on that SLES9
> machine.
>
> cheers.
> - Christian
>

Hi,

We test SLES9 here, but with the mvapich1 library, version 0.9.7, from OFED.
We ran the test you attached on mvapich1 but did not see this failure.
Can you try to reproduce it with mvapich1?
If not, please send us detailed instructions on how to reproduce it with mvapich2 (where to get the sources, how to compile, etc.).

BTW, we searched the SLES9 sources for "Kernel BUG at page_alloc:853" but couldn't find it.
Which kernel version are you using? We use 2.6.5-7.244-smp here.

Tziporet & Eli
Re: [openib-general] problems to regiser memory as a reglar user on SLES9 SP3
Hi,

On Tue, Aug 29, 2006 at 05:49:32PM +0300, Tziporet Koren wrote:
> Hi All,
> In testing today we found that on SLES9 SP3 memory locking as a regular
> user fails.

Has any progress been made on this?

I'd like to ask whether the SLES9 port is really mature yet, because I tried to go a step further and ran some trivial MPI code as root, but it failed and locked the node up hard.
Testing was done on a single x86_64 SMP node (2 CPUs) with a Mellanox PCI-X HCA (23108, FW-3.5.0). The software environment was SLES9 SP3-latest, OFED-1.1-rc3 and mvapich2-0.9.5.
Attached is a simple MPI program that causes the hard lock. Also attached are some kernel BUGs gathered via serial console; they look garbled, unfortunately.
Note that everything is fine if I use recent vanilla kernels on that SLES9 machine.

cheers.
 - Christian
--
---
Phone +49-89-3299-1306
PGP http://www.rzg.mpg.de/~ccg/cg-public_key.asc
S/MIME http://ra.rzg.mpg.de
---

#include
#include
#include
#include
#include

#define TRIALS 2000
#define MESSAGE_SIZE 1000
#define TAG 5

int main(int argc, char **argv)
{
  int i, sendTask, recvTask, ThisTask, NTask;
  char *buf;
  MPI_Status status;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &ThisTask);
  MPI_Comm_size(MPI_COMM_WORLD, &NTask);

  buf = malloc(MESSAGE_SIZE);

  /* system("exec date");*/

  for(i=0; i

Kernel BUG at page_alloc:853
invalid operand: [1] SMP
CPU 0
Pid: 7092, comm: hanger Tainted: PF U (2.6.5-7.276-smp SLES9_SP3_BRANCH-20060724104531)
RIP: 0010:[] {__free_pages+30}
RSP: 0018:0100e3fdbbf0 EFLAGS: 00010256
RAX: RBX: 0100e72d1280 RCX: 0100d000
RDX: 010002a1c4d8 RSI: RDI: 010002a1c4d8
RBP: 0100e3fdbcc8 R08: 0100e3fda000 R09: 0002
R10: 0064 R11: 0001 R12:
R13: 0100e72d1280 R14: 01007e644d90 R15: 000493e0
FS: 002a95bb5b00() GS:8057dc00() knlGS:
CS: 0010 DS: ES: CR0: 8005003b
CR2: 0041b009 CR3: 00101000 CR4: 06e0
Process hanger (pid: 7092, threadinfo 0100e3fda000, task 01007e644d90)
Stack: 8013bd3f 801395a0 803d3400 0246 000339b3 0202 010002c1c600 006a
010002c1d6e0
Call Trace:{__mmdrop+63} {thread_return+108}
 {process_timeout+0} {schedule_timeout+246}
 {process_timeout+0} {:ib_mthca:mthca_cmd_wait+448}
 {default_wake_function+0} {default_wake_function+0}
 {:ib_mthca:mthca_cmd_box+66} {:ib_mthca:mthca_HW2SW_MPT+57}
 {:ib_mthca:mthca_free_mr+67} {:ib_mthca:mthca_dereg_mr+15}
 {:ib_core:ib_dereg_mr+26} {:ib_uverbs:ib_uverbs_close+611}
 {__fput+98} {filp_close+126}
 {sys_close+229} {system_call+124}
Code: 0f 0b f4 8b 38 80 ff ff ff ff 55 03 66 66 90 66 66 90 f0 83
RIP {__free_pages+30} RSP <0100e3fdbbf0>
--- [cut here ] - [please bite here ] -
Kernel BUG at page_alloc:853
invalid operand: [2] SMP
CPU 1
Pid: 1, comm: init Tainted: PF U (2.6.5-7.276-smp SLES9_SP3_BRANCH-20060724104531)
RIP: 0010:[] {__free_pages+30}
RSP: 0018:01007ff81c80 EFLAGS: 00010256
RAX: RBX: 01007e1e4980 RCX: 01008000
RDX: 0100815b6068 RSI: RDI: 0100815b6068
RBP: 01007ff81d58 R08: 01007ff8 R09: 0013
R10: 000493e0 R11: 2710 R12: 0001
R13: 01007e1e4980 R14: 0100e7f3f2c0 R15: 000493e0
FS: 002a95bb5b00() GS:8057dc80() knlGS:
CS: 0010 DS: ES: CR0: 8005003b
CR2: 0041b009 CR3: 7ff82000 CR4: 06e0
Process init (pid: 1, threadinfo 01007ff8, task 0100e7f3f2c0)
Stack: 8013bd3f 0040 801395a0 0100e7f3e9a0 00d07f8a1580 0246 0001 0100816f5580 0001007d 0100816f6660
Call Trace:{__mmdrop+63} {thread_return+108}
 {schedule_timeout+246} {process_timeout+0}
 {do_select+1105} {__pollwait+0}
 {sys_select+902} {system_call+124}
Code: 0f 0b f4 8b 38 80 ff ff ff ff 55 03 66 66 90 66 66 90 f0 83
RIP {__free_pages+30} RSP <01007ff81c80>
b-<-0>-K--er--ne--l - p[ancuict : hAertte em] pt-e--d --to-- k-i- ll[p ileniatse!
ite here B] ad-- p--a-ge-- s--ta roK aert nferl eBe_UhG otat_c poaldge_p_aaglle oc(:in85 p3 0 ceinssv al'hidan ogeper'ra, ndpa: ge00 0 [0301] 008SM1P5b 6 68)CP U f0 la:0 x0P50id00:0 58025 m9,ap cpionmmg:: 00kl00og00d 00Ta00in00te0d00: 0 PFma ppU ed :(0 2.co6.un5-t:7.0 2p76ri-svampte S:0LxES009_00SP003_00BR ANBCHac-2kt00r6ac07e:24 104 l3C1)al t_RTrIPac: e:00<10ff:[ffad{b9ead>]_p ag{f8__0f16reaae_7fpa>{gefrs+ee30_h}o
[openib-general] problems to regiser memory as a reglar user on SLES9 SP3
Hi All,

In testing today we found that on SLES9 SP3 memory locking as a regular user fails, even though I changed /etc/security/limits.conf and added the following two lines:

* soft memlock
* hard memlock

Note that the same change does work on SLES10.
Another change I tried (one that worked in gen1) was to add the following line to /etc/sysctl.conf:

vm.disable_cap_mlock=1

However, nothing helped on SLES9.
Does anyone have any idea how to solve this?

Thanks,
Tziporet
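One way to see which locked-memory limits a login session actually received is to ask the shell, as in the sketch below. It assumes bash on a Linux system where /etc/security/limits.conf is applied by pam_limits at login, so it is only meaningful when run from a fresh login of the affected user. The helper names are hypothetical, and "unlimited" is used purely as an example value because the values in the two limits.conf lines did not survive in the archive. Elsewhere in this thread the cause is reported as a bug in the SLES9 memory registration flow, so a correct limit alone may not be sufficient there.

```shell
#!/bin/bash
# Sketch only: helper names are illustrative.

# Print the soft and hard locked-memory limits of this shell.
# bash reports them in KB, or as the word "unlimited".
memlock_limits() {
    printf 'soft=%s hard=%s\n' "$(ulimit -S -l)" "$(ulimit -H -l)"
}

# Succeed if a limit (KB or "unlimited") covers the requested size in KB.
memlock_allows() {
    local limit=$1 need_kb=$2
    [ "$limit" = "unlimited" ] && return 0
    [ "$limit" -ge "$need_kb" ]
}

# Show the memlock entries of a limits.conf-style file, skipping comment lines.
memlock_entries() {
    grep -E '^[^#]*[[:space:]]memlock([[:space:]]|$)' "$1"
}

# Example, run as the regular user after logging in again:
#   memlock_limits
#   memlock_entries /etc/security/limits.conf
#   memlock_allows "$(ulimit -S -l)" 65536 || echo "soft memlock limit below 64 MB"
```

Comparing the output for root and for a regular user on the same machine shows whether the limits.conf change reached the session at all, which separates a configuration problem from a driver or kernel problem.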