Re: [openib-general] problems to register memory as a regular user on SLES9 SP3

2006-09-06 Thread Tziporet Koren
Moshe Kazir wrote:
> Hi Tziporet,
>
> I'm trying OFED 1.1 rc3 on an IBM JS21 running SLES9 SP3 (ppc64).
>
> The install stops at the very beginning because 64-bit udev is missing.
>
> I tried to compile the udev...src.rpm supplied on SLES9 SP3 CD3, but it
> failed with a compilation error.
>
> Did you test OFED 1.1 rc3 on ppc64? Can you advise me how to get 64-bit
> udev?
>
>   
We have only one Mac PPC64 machine here, and it can run only Fedora Core 4,
so that is the only PPC64 system we check.
Maybe Vlad can help, but I think it is best to approach Novell (Mois is
their contact for OFED).

Tziporet


___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] problems to register memory as a regular user on SLES9 SP3

2006-09-06 Thread Moshe Kazir
Hi Tziporet,

I'm trying OFED 1.1 rc3 on an IBM JS21 running SLES9 SP3 (ppc64).

The install stops at the very beginning because 64-bit udev is missing.

I tried to compile the udev...src.rpm supplied on SLES9 SP3 CD3, but it
failed with a compilation error.

Did you test OFED 1.1 rc3 on ppc64? Can you advise me how to get 64-bit
udev?

Moshe 


Moshe Katzir   |  +972-9971-8639 (o)   |   +972-52-860-6042  (m)
 
Voltaire - The Grid Backbone
 
 www.voltaire.com

  


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Tziporet Koren
Sent: Tuesday, August 29, 2006 5:50 PM
To: OPENIB
Subject: [openib-general] problems to register memory as a regular user on
SLES9 SP3


Hi All,
In testing today we found that memory locking as a regular user fails on
SLES9 SP3, even though I changed /etc/security/limits.conf and added the
following two lines:
* soft memlock 
* hard memlock 

Note that the same change does work on SLES10.

Another change I tried (which worked in gen1) was to add the following
line to /etc/sysctl.conf:
vm.disable_cap_mlock=1

However, nothing helped on SLES9.

Does anyone have any idea how to solve this?

Thanks,
Tziporet




Re: [openib-general] problems to register memory as a regular

2006-09-05 Thread Tziporet Koren
Dhabaleswar Panda wrote:
> Christian - Thanks for sending instructions for running mvapich2-0.9.5
> to Tziporet.
>
> Tziporet - Thanks for looking into this problem on SLES9 environment.
>
> Please note that a detailed user guide for running and tuning MVAPICH2
> 0.9.5 is available from the following URL:
>
> http://nowlab.cse.ohio-state.edu/projects/mpi-iba/download-mvapich2/mvapich2_user_guide.html
>
> DK
>   
Thanks to all,
We found the bug; it was in the memory registration flow and affects SLES9 only.
A fix will be available in OFED 1.1 RC4.

Tziporet




Re: [openib-general] problems to register memory as a regular

2006-09-04 Thread Dhabaleswar Panda
Christian - Thanks for sending instructions for running mvapich2-0.9.5
to Tziporet.

Tziporet - Thanks for looking into this problem on SLES9 environment.

Please note that a detailed user guide for running and tuning MVAPICH2
0.9.5 is available from the following URL:

http://nowlab.cse.ohio-state.edu/projects/mpi-iba/download-mvapich2/mvapich2_user_guide.html

DK


> Hi Tziporet,
> On Mon, Sep 04, 2006 at 10:55:02PM +0300, Tziporet Koren wrote:
> > Can you explain to me how to run mvapich2-0.9.5?
> 
> First, simply compile using the OSU scripts (make.mvapich2.gen2); that
> should work out of the box (unless you use PCI-X HCAs, in which case
> you'll have to omit "-DSRQ" in the build script). Note that python-devel
> is needed for the build.
> 
> Then, assuming you're doing your tests as root on a single box:
> 
> - create /etc/mpd.conf
> 
> containing the line "secretword=blabla" - just some non-meaningful
> passphrase ;)
> (you'll probably also need the same file as ~/.mpd.conf and
> ~/.mpdpasswd , too)
> 
> - start mpd ring
> # mpdboot -n 1 -f hosts
> (hosts should contain the hostname)
> 
> - check if mpdring is up and running
> # mpdtrace
> 
> - start application on 2 CPUs
> # mpiexec -n 2 ./a.out
> 
> - once tests are over, stop the ring
> # mpdallexit
> 
> hope that helps,
> 
> cheers.
>  - Christian






Re: [openib-general] problems to register memory as a regular user on SLES9 SP3

2006-09-04 Thread Christian Guggenberger
Hi Tziporet,
On Mon, Sep 04, 2006 at 10:55:02PM +0300, Tziporet Koren wrote:
> Can you explain to me how to run mvapich2-0.9.5?

First, simply compile using the OSU scripts (make.mvapich2.gen2); that
should work out of the box (unless you use PCI-X HCAs, in which case
you'll have to omit "-DSRQ" in the build script). Note that python-devel
is needed for the build.

Then, assuming you're doing your tests as root on a single box:

- create /etc/mpd.conf

containing the line "secretword=blabla" - just some non-meaningful
passphrase ;)
(you'll probably also need the same file as ~/.mpd.conf and
~/.mpdpasswd , too)

- start mpd ring
# mpdboot -n 1 -f hosts
(hosts should contain the hostname)

- check if mpdring is up and running
# mpdtrace

- start application on 2 CPUs
# mpiexec -n 2 ./a.out

- once tests are over, stop the ring
# mpdallexit

hope that helps,

cheers.
 - Christian
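
Note on the mpd.conf step: as far as I know, mpd expects its configuration file to be accessible by the owner only (mode 600), so the way the file is created matters. The following is a minimal sketch of that step in C, not part of the original instructions. The function name write_mpd_conf and the idea of passing the path as a parameter are assumptions made for illustration; the "secretword=" line is the one given above.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: write "secretword=<word>\n" to `path` and make the
 * file readable and writable by its owner only.
 * Returns 0 on success, -1 on any error. */
int write_mpd_conf(const char *path, const char *word)
{
    char line[256];
    int fd, len;
    ssize_t written;

    assert(path != NULL && word != NULL);

    len = snprintf(line, sizeof(line), "secretword=%s\n", word);
    if (len < 0 || (size_t)len >= sizeof(line))
        return -1;                      /* secret word too long */

    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0) {
        perror(path);
        return -1;
    }

    /* open() applies the mode only when it creates the file, so tighten
     * the permissions explicitly in case the file already existed. */
    if (fchmod(fd, 0600) != 0) {
        perror("fchmod");
        close(fd);
        return -1;
    }

    written = write(fd, line, (size_t)len);
    if (close(fd) != 0 || written != (ssize_t)len)
        return -1;

    return 0;
}
```

Calling write_mpd_conf("/etc/mpd.conf", "blabla") as root would cover the first step; for a regular user the same call with the path of ~/.mpd.conf should do.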




Re: [openib-general] problems to register memory as a regular user on SLES9 SP3

2006-09-04 Thread Tziporet Koren
Can you explain to me how to run mvapich2-0.9.5?

Thanks,
Tziporet

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Christian
Guggenberger
Sent: Monday, September 04, 2006 6:25 PM
To: Tziporet Koren
Cc: Eli Cohen; Christian Guggenberger; OPENIB
Subject: Re: [openib-general] problems to register memory as a regular
user on SLES9 SP3

> >>We test here SLES9 but with mvapich1 library 0.9.7 version from OFED.
> >>We tried to run here the test you attached on mvapich1 but have not seen
> >>this failure.
> >>Can you try to reproduce with mvapich1 version?
> >>
> >
> >is it also okay if I tried with plain mvapich1 from OSU ?
> I guess yes, although we use the one that comes with OFED.

Hmm. Using plain mvapich-0.9.7 from OSU, the BUGs/Oopses are not
reproducible. Using mvapich2-0.9.5, they happen every time...

cheers.
 - Christian




Re: [openib-general] problems to register memory as a regular user on SLES9 SP3

2006-09-04 Thread Christian Guggenberger
> >>We test here SLES9 but with mvapich1 library 0.9.7 version from OFED.
> >>We tried to run here the test you attached on mvapich1 but have not seen 
> >>this failure.
> >>Can you try to reproduce with mvapich1 version?
> >>
> >
> >is it also okay if I tried with plain mvapich1 from OSU ?
> I guess yes, although we use the one that comes with OFED.

Hmm. Using plain mvapich-0.9.7 from OSU, the BUGs/Oopses are not
reproducible. Using mvapich2-0.9.5, they happen every time...

cheers.
 - Christian



Re: [openib-general] problems to register memory as a regular user on SLES9 SP3

2006-09-04 Thread Tziporet Koren
Christian Guggenberger wrote:
>> Hi,
>> We test here SLES9 but with mvapich1 library 0.9.7 version from OFED.
>> We tried to run here the test you attached on mvapich1 but have not seen 
>> this failure.
>> Can you try to reproduce with mvapich1 version?
>> 
>
> is it also okay if I tried with plain mvapich1 from OSU ?
I guess yes, although we use the one that comes with OFED.
>> 
> this is with 2.6.5-7.276-smp
>
>
>   
I'll see if we can update our kernel version.

Tziporet




Re: [openib-general] problems to register memory as a regular user on SLES9 SP3

2006-09-04 Thread Christian Guggenberger
Hi,

> >Attached is a simple MPI code that causes the hard lock. Also attached
> >are some Kernel BUGs gathered via serial console - they look garbled,
> >unfortunately.
> >Note, everything is fine, if I use recent vanilla kernels on that SLES9
> >machine.
> >
> >cheers.
> > - Christian
> >  
> Hi,
> We test here SLES9 but with mvapich1 library 0.9.7 version from OFED.
> We tried to run here the test you attached on mvapich1 but have not seen 
> this failure.
> Can you try to reproduce with mvapich1 version?

is it also okay if I tried with plain mvapich1 from OSU ?

> If not please send us detailed instructions how to reproduce with 
> mvapich2 (where to take sources, compile, etc.)
> BTW when searching the SLES9 sources for the: Kernel BUG at page_alloc:853
> 
> We couldn't find it.
> Which kernel version are you using? We use here 2.6.5-7.244-smp.
> 
this is with 2.6.5-7.276-smp

cheers.
 - Christian

-- 
---
Phone   +49-89-3299-1306
PGP http://www.rzg.mpg.de/~ccg/cg-public_key.asc
S/MIME  http://ra.rzg.mpg.de
---



Re: [openib-general] problems to register memory as a regular user on SLES9 SP3

2006-09-04 Thread Tziporet Koren
Christian Guggenberger wrote:
> Hi,
> On Tue, Aug 29, 2006 at 05:49:32PM +0300, Tziporet Koren wrote:
>   
>> Hi All,
>> In testing today we found that on SLES9 SP3 memory locking as a regular 
>> user fails.
>> 
> has any progress been made regarding this ?
>
> I'd like to ask if the SLES9 port is really mature yet, because I tried
> to go a step ahead and tried some trivial MPI code as root, but failed
> and got the involved node locked down hard.
> Testing was done on a single x86_64 SMP node (2 CPUs), with a Mellanox
> PCI-X HCA (23108, FW-3.5.0). Software Environment SLES9 SP3-latest,
> OFED-1.1-rc3 and mvapich2-0.9.5.
> Attached is a simple MPI code that causes the hard lock. Also attached
> are some Kernel BUGs gathered via serial console - they look garbled,
> unfortunately.
> Note, everything is fine, if I use recent vanilla kernels on that SLES9
> machine.
>
> cheers.
>  - Christian
>   
Hi,
We test SLES9 here, but with the mvapich1 library, version 0.9.7, from OFED.
We ran the test you attached on mvapich1 but have not seen
this failure.
Can you try to reproduce it with the mvapich1 version?
If not, please send us detailed instructions on how to reproduce it with
mvapich2 (where to get the sources, how to compile, etc.).
BTW, we searched the SLES9 sources for "Kernel BUG at page_alloc:853"
but couldn't find it.
Which kernel version are you using? We use 2.6.5-7.244-smp here.

Tziporet & Eli
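
Note on comparing kernel versions: the release strings exchanged in this thread have the form "2.6.5-7.244-smp". A small sketch of how the running kernel's release could be read with uname() and its last numeric field extracted for comparison is shown below. It is not from the thread; the function names are made up for illustration, and calling the last number the "build" is only a label assumed here for the trailing field of this particular format.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/utsname.h>

/* Hypothetical helper: parse a release string of the form
 * "2.6.5-7.276-smp" and store the last numeric field (276 here)
 * in *build. Returns 0 on success, -1 if the string does not match. */
int parse_suse_build(const char *release, int *build)
{
    int major, minor, patch, rel;

    assert(release != NULL && build != NULL);

    if (sscanf(release, "%d.%d.%d-%d.%d",
               &major, &minor, &patch, &rel, build) != 5)
        return -1;
    return 0;
}

/* Print the running kernel's release string and return its build
 * number, or -1 if uname() fails or the format is different. */
int running_kernel_build(void)
{
    struct utsname u;
    int build;

    if (uname(&u) != 0) {
        perror("uname");
        return -1;
    }
    printf("running kernel: %s\n", u.release);

    if (parse_suse_build(u.release, &build) != 0)
        return -1;
    return build;
}
```

With the two release strings mentioned in this thread, parse_suse_build() yields 244 and 276 respectively, which makes it easy to tell which machine runs the newer kernel package.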








Re: [openib-general] problems to register memory as a regular user on SLES9 SP3

2006-09-03 Thread Christian Guggenberger
Hi,
On Tue, Aug 29, 2006 at 05:49:32PM +0300, Tziporet Koren wrote:
> Hi All,
> In testing today we found that on SLES9 SP3 memory locking as a regular 
> user fails.
has any progress been made regarding this ?

I'd like to ask if the SLES9 port is really mature yet, because I tried
to go a step ahead and tried some trivial MPI code as root, but failed
and got the involved node locked down hard.
Testing was done on a single x86_64 SMP node (2 CPUs), with a Mellanox
PCI-X HCA (23108, FW-3.5.0). Software Environment SLES9 SP3-latest,
OFED-1.1-rc3 and mvapich2-0.9.5.
Attached is a simple MPI code that causes the hard lock. Also attached
are some Kernel BUGs gathered via serial console - they look garbled,
unfortunately.
Note, everything is fine, if I use recent vanilla kernels on that SLES9
machine.

cheers.
 - Christian

-- 
---
Phone   +49-89-3299-1306
PGP http://www.rzg.mpg.de/~ccg/cg-public_key.asc
S/MIME  http://ra.rzg.mpg.de
---
#include 
#include 
#include 
#include 
#include 

#define TRIALS 2000
#define MESSAGE_SIZE 1000
#define TAG 5


int main(int argc, char **argv)
{
  int i, sendTask, recvTask, ThisTask, NTask;
  char *buf;
  MPI_Status status;


  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &ThisTask);
  MPI_Comm_size(MPI_COMM_WORLD, &NTask);


  buf = malloc(MESSAGE_SIZE);


/*  system("exec date");*/

  for(i=0; i
[remainder of the attached source is missing from the archive]

Kernel BUG at page_alloc:853
invalid operand:  [1] SMP
CPU 0
Pid: 7092, comm: hanger Tainted: PF  U   (2.6.5-7.276-smp SLES9_SP3_BRANCH-20060724104531)
RIP: 0010:[] {__free_pages+30}
RSP: 0018:0100e3fdbbf0  EFLAGS: 00010256
RAX:  RBX: 0100e72d1280 RCX: 0100d000
RDX: 010002a1c4d8 RSI:  RDI: 010002a1c4d8
RBP: 0100e3fdbcc8 R08: 0100e3fda000 R09: 0002
R10: 0064 R11: 0001 R12: 
R13: 0100e72d1280 R14: 01007e644d90 R15: 000493e0
FS:  002a95bb5b00() GS:8057dc00() knlGS:
CS:  0010 DS:  ES:  CR0: 8005003b
CR2: 0041b009 CR3: 00101000 CR4: 06e0
Process hanger (pid: 7092, threadinfo 0100e3fda000, task 01007e644d90)
Stack: 8013bd3f  801395a0 803d3400
   0246 000339b3 0202 010002c1c600
   006a 010002c1d6e0
Call Trace:{__mmdrop+63} {thread_return+108}
   {process_timeout+0} {schedule_timeout+246}
   {process_timeout+0} {:ib_mthca:mthca_cmd_wait+448}
   {default_wake_function+0} {default_wake_function+0}
   {:ib_mthca:mthca_cmd_box+66} {:ib_mthca:mthca_HW2SW_MPT+57}
   {:ib_mthca:mthca_free_mr+67} {:ib_mthca:mthca_dereg_mr+15}
   {:ib_core:ib_dereg_mr+26} {:ib_uverbs:ib_uverbs_close+611}
   {__fput+98} {filp_close+126}
   {sys_close+229} {system_call+124}


Code: 0f 0b f4 8b 38 80 ff ff ff ff 55 03 66 66 90 66 66 90 f0 83
RIP {__free_pages+30} RSP <0100e3fdbbf0>
 --- [cut here ] - [please bite here ] -
Kernel BUG at page_alloc:853
invalid operand:  [2] SMP
CPU 1
Pid: 1, comm: init Tainted: PF  U   (2.6.5-7.276-smp SLES9_SP3_BRANCH-20060724104531)
RIP: 0010:[] {__free_pages+30}
RSP: 0018:01007ff81c80  EFLAGS: 00010256
RAX:  RBX: 01007e1e4980 RCX: 01008000
RDX: 0100815b6068 RSI:  RDI: 0100815b6068
RBP: 01007ff81d58 R08: 01007ff8 R09: 0013
R10: 000493e0 R11: 2710 R12: 0001
R13: 01007e1e4980 R14: 0100e7f3f2c0 R15: 000493e0
FS:  002a95bb5b00() GS:8057dc80() knlGS:
CS:  0010 DS:  ES:  CR0: 8005003b
CR2: 0041b009 CR3: 7ff82000 CR4: 06e0
Process init (pid: 1, threadinfo 01007ff8, task 0100e7f3f2c0)
Stack: 8013bd3f 0040 801395a0 0100e7f3e9a0
   00d07f8a1580 0246 0001 0100816f5580
   0001007d 0100816f6660
Call Trace:{__mmdrop+63} {thread_return+108}
   {schedule_timeout+246} {process_timeout+0}
   {do_select+1105} {__pollwait+0}
   {sys_select+902} {system_call+124}


Code: 0f 0b f4 8b 38 80 ff ff ff ff 55 03 66 66 90 66 66 90 f0 83
RIP {__free_pages+30} RSP <01007ff81c80>
 b-<-0>-K--er--ne--l - p[ancuict : hAertte em] pt-e--d --to-- k-i- ll[p 
ileniatse!
  ite here B] ad-- p--a-ge-- s--ta
roK aert nferl eBe_UhG otat_c poaldge_p_aaglle oc(:in85 p3
0 ceinssv al'hidan ogeper'ra, ndpa: ge00 0 [0301] 008SM1P5b 6
 68)CP
U f0 la:0
x0P50id00:0 58025 m9,ap cpionmmg:: 00kl00og00d 00Ta00in00te0d00: 0 PFma  ppU ed 
 :(0 2.co6.un5-t:7.0 2p76ri-svampte S:0LxES009_00SP003_00BR
ANBCHac-2kt00r6ac07e:24
104
l3C1)al
t_RTrIPac: e:00<10ff:[ffad{b9ead>]_p ag{f8__0f16reaae_7fpa>{gefrs+ee30_h}o
  

[openib-general] problems to register memory as a regular user on SLES9 SP3

2006-08-29 Thread Tziporet Koren
Hi All,
In testing today we found that memory locking as a regular user fails on
SLES9 SP3, even though I changed /etc/security/limits.conf and added the
following two lines:
* soft memlock 
* hard memlock 

Note that the same change does work on SLES10.

Another change I tried (which worked in gen1) was to add the following
line to /etc/sysctl.conf:
vm.disable_cap_mlock=1

However, nothing helped on SLES9.

Does anyone have any idea how to solve this?

Thanks,
Tziporet
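
Note on checking the limit: one way to see whether the limits.conf change actually reached a regular user's session is to read the limit from inside a process and attempt a small mlock(). The sketch below is not from the thread. The function names print_memlock_limits and try_mlock are made up for illustration, and it exercises plain mlock() rather than verbs memory registration, on the assumption that both are checked against the same RLIMIT_MEMLOCK limit. Limits from limits.conf are applied by pam_limits at login, so the check is only meaningful in a fresh login session.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/resource.h>

/* Print one limit value, spelling out RLIM_INFINITY as "unlimited". */
static void print_limit(const char *label, rlim_t value)
{
    if (value == RLIM_INFINITY)
        printf("%s memlock: unlimited\n", label);
    else
        printf("%s memlock: %llu bytes\n", label, (unsigned long long)value);
}

/* Print the soft and hard RLIMIT_MEMLOCK values of the calling process.
 * Returns 0 on success, -1 if getrlimit() fails. */
int print_memlock_limits(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0) {
        perror("getrlimit(RLIMIT_MEMLOCK)");
        return -1;
    }
    print_limit("soft", rl.rlim_cur);
    print_limit("hard", rl.rlim_max);
    return 0;
}

/* Hypothetical helper: allocate `bytes` bytes and try to lock them.
 * Returns 0 if mlock() succeeded, -1 otherwise (the reason is printed).
 * Linux rounds the start address down to a page boundary, so a plain
 * malloc() buffer is sufficient for this check. */
int try_mlock(size_t bytes)
{
    void *buf;
    int rc;

    assert(bytes > 0);

    buf = malloc(bytes);
    if (buf == NULL)
        return -1;

    rc = mlock(buf, bytes);
    if (rc != 0)
        perror("mlock");
    else
        munlock(buf, bytes);

    free(buf);
    return rc == 0 ? 0 : -1;
}
```

Calling print_memlock_limits() followed by try_mlock() with a size a little above the old default limit, once as root and once as the regular user, should show whether the new limits are in effect for that user.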
