Re: [openib-general] problems to register memory as a regular user on SLES9 SP3

2006-09-06 Thread Moshe Kazir
Hi Tziporet,

I'm trying OFED 1.1 rc3 on an IBM JS21 (SLES9 SP3, ppc64).

The installation stops at the very beginning because 64-bit udev is missing.

I tried to compile the udev...src.rpm supplied on the SLES9 SP3 CD3, but it
failed with a compilation error.

Did you test OFED 1.1 rc3 on ppc64? Can you advise me how to get a 64-bit
udev?

Moshe 


Moshe Katzir   |  +972-9971-8639 (o)   |   +972-52-860-6042  (m)
 
Voltaire - The Grid Backbone
 
 www.voltaire.com

  


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Tziporet Koren
Sent: Tuesday, August 29, 2006 5:50 PM
To: OPENIB
Subject: [openib-general] problems to register memory as a regular user on
SLES9 SP3


Hi All,
In testing today we found that on SLES9 SP3, memory locking as a regular
user fails, even though I changed /etc/security/limits.conf and added the
following two lines:
* soft memlock number
* hard memlock number

Note that the same change does work on SLES10.

Another change I tried (one that worked with gen1) was to add the following
line to the file /etc/sysctl.conf:
vm.disable_cap_mlock=1

However, nothing helped on SLES9.

Does anyone have any idea how to solve this?

Thanks,
Tziporet

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit
http://openib.org/mailman/listinfo/openib-general





Re: [openib-general] problems to register memory as a regular user on SLES9 SP3

2006-09-06 Thread Tziporet Koren
Moshe Kazir wrote:
 Hi Tziporet,

 I'm trying OFED 1.1 rc3 on an IBM JS21 (SLES9 SP3, ppc64).

 The installation stops at the very beginning because 64-bit udev is missing.

 I tried to compile the udev...src.rpm supplied on the SLES9 SP3 CD3, but it
 failed with a compilation error.

 Did you test OFED 1.1 rc3 on ppc64? Can you advise me how to get a 64-bit
 udev?

   
We have only one Mac PPC64 machine here, and it can run only Fedora Core 4,
so that is the only system we test.
Maybe Vlad can help, but I think it is best if you approach Novell (Mois is
their contact for OFED).

Tziporet





Re: [openib-general] problems to register memory as a regular

2006-09-05 Thread Tziporet Koren
Dhabaleswar Panda wrote:
 Christian - Thanks for sending instructions for running mvapich2-0.9.5
 to Tziporet.

 Tziporet - Thanks for looking into this problem on SLES9 environment.

 Please note that a detailed user guide for running and tuning MVAPICH2
 0.9.5 is available from the following URL:

 http://nowlab.cse.ohio-state.edu/projects/mpi-iba/download-mvapich2/mvapich2_user_guide.html

 DK
   
Thanks to all,
We found the bug; it was in the memory registration flow and affected
SLES9 only. A fix will be available in OFED 1.1 RC4.

Tziporet




Re: [openib-general] problems to register memory as a regular user on SLES9 SP3

2006-09-04 Thread Tziporet Koren
Christian Guggenberger wrote:
 Hi,
 On Tue, Aug 29, 2006 at 05:49:32PM +0300, Tziporet Koren wrote:
   
 Hi All,
 In testing today we found that on SLES9 SP3 memory locking as a regular 
 user fails.
 
 Has any progress been made on this?

 I'd like to ask whether the SLES9 port is really mature yet: I went a step
 further and tried some trivial MPI code as root, but it failed and locked
 up the node hard.
 Testing was done on a single x86_64 SMP node (2 CPUs) with a Mellanox
 PCI-X HCA (23108, FW-3.5.0). Software environment: SLES9 SP3-latest,
 OFED-1.1-rc3 and mvapich2-0.9.5.
 Attached is a simple MPI program that causes the hard lock. Also attached
 are some kernel BUGs gathered via the serial console; unfortunately, they
 look garbled.
 Note that everything is fine if I use recent vanilla kernels on that SLES9
 machine.

 cheers.
  - Christian
   
Hi,
We test SLES9 here, but with the mvapich1 library (version 0.9.7) from OFED.
We tried to run the test you attached with mvapich1 but have not seen
this failure.
Can you try to reproduce it with mvapich1?
If not, please send us detailed instructions on how to reproduce it with
mvapich2 (where to get the sources, how to compile, etc.).
BTW, when searching the SLES9 sources for "Kernel BUG at page_alloc:853"
we couldn't find it.
Which kernel version are you using? We use 2.6.5-7.244-smp here.

Tziporet & Eli








Re: [openib-general] problems to register memory as a regular user on SLES9 SP3

2006-09-04 Thread Christian Guggenberger
Hi,

 Attached is a simple MPI program that causes the hard lock. Also attached
 are some kernel BUGs gathered via the serial console; unfortunately, they
 look garbled.
 Note that everything is fine if I use recent vanilla kernels on that SLES9
 machine.
 
 cheers.
  - Christian
   
 Hi,
 We test SLES9 here, but with the mvapich1 library (version 0.9.7) from OFED.
 We tried to run the test you attached with mvapich1 but have not seen
 this failure.
 Can you try to reproduce it with mvapich1?

Is it also okay if I try with plain mvapich1 from OSU?

 If not, please send us detailed instructions on how to reproduce it with
 mvapich2 (where to get the sources, how to compile, etc.).
 BTW, when searching the SLES9 sources for "Kernel BUG at page_alloc:853"
 we couldn't find it.
 Which kernel version are you using? We use 2.6.5-7.244-smp here.

This is with 2.6.5-7.276-smp.

cheers.
 - Christian

-- 
---
Phone   +49-89-3299-1306
PGP http://www.rzg.mpg.de/~ccg/cg-public_key.asc
S/MIME  http://ra.rzg.mpg.de
---



Re: [openib-general] problems to register memory as a regular user on SLES9 SP3

2006-09-04 Thread Tziporet Koren
Christian Guggenberger wrote:
 Hi,
 We test SLES9 here, but with the mvapich1 library (version 0.9.7) from OFED.
 We tried to run the test you attached with mvapich1 but have not seen
 this failure.
 Can you try to reproduce it with mvapich1?
 

 Is it also okay if I try with plain mvapich1 from OSU?
I guess yes, although we use the one that comes with OFED.
 
 This is with 2.6.5-7.276-smp.


   
I'll see if we can update our kernel version.

Tziporet




Re: [openib-general] problems to register memory as a regular user on SLES9 SP3

2006-09-04 Thread Christian Guggenberger
 We test SLES9 here, but with the mvapich1 library (version 0.9.7) from OFED.
 We tried to run the test you attached with mvapich1 but have not seen
 this failure.
 Can you try to reproduce it with mvapich1?
 
 
 Is it also okay if I try with plain mvapich1 from OSU?
 I guess yes, although we use the one that comes with OFED.

Hmm. With plain mvapich-0.9.7 from OSU, the BUGs/Oopses are not
reproducible. With mvapich2-0.9.5 it happens every time...

cheers.
 - Christian



Re: [openib-general] problems to register memory as a regular user on SLES9 SP3

2006-09-04 Thread Tziporet Koren
Can you explain to me how to run mvapich2-0.9.5?

Thanks,
Tziporet

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Christian
Guggenberger
Sent: Monday, September 04, 2006 6:25 PM
To: Tziporet Koren
Cc: Eli Cohen; Christian Guggenberger; OPENIB
Subject: Re: [openib-general] problems to register memory as a regular
user on SLES9 SP3

 We test SLES9 here, but with the mvapich1 library (version 0.9.7) from OFED.
 We tried to run the test you attached with mvapich1 but have not seen
 this failure.
 Can you try to reproduce it with mvapich1?
 
 
 Is it also okay if I try with plain mvapich1 from OSU?
 I guess yes, although we use the one that comes with OFED.

Hmm. With plain mvapich-0.9.7 from OSU, the BUGs/Oopses are not
reproducible. With mvapich2-0.9.5 it happens every time...

cheers.
 - Christian




Re: [openib-general] problems to register memory as a regular user on SLES9 SP3

2006-09-04 Thread Christian Guggenberger
Hi Tziporet,
On Mon, Sep 04, 2006 at 10:55:02PM +0300, Tziporet Koren wrote:
 Can you explain to me how to run mvapich2-0.9.5?

First, simply compile using the OSU scripts (make.mvapich2.gen2); it should
work out of the box (unless you use PCI-X HCAs, in which case you'll have to
omit -DSRQ in the build script). Note that python-devel is needed for the
build.

Then, assuming you're doing your tests as root on a single box:

- create /etc/mpd.conf containing the line secretword=blabla (just some
meaningless passphrase ;)
(you'll probably also need the same file as ~/.mpd.conf and ~/.mpdpasswd)

- start mpd ring
# mpdboot -n 1 -f hosts
(hosts should contain the hostname)

- check whether the mpd ring is up and running
# mpdtrace

- start the application on 2 CPUs
# mpiexec -n 2 ./a.out

- once tests are over, stop the ring
# mpdallexit

hope that helps,

cheers.
 - Christian




Re: [openib-general] problems to register memory as a regular

2006-09-04 Thread Dhabaleswar Panda
Christian - Thanks for sending instructions for running mvapich2-0.9.5
to Tziporet.

Tziporet - Thanks for looking into this problem on SLES9 environment.

Please note that a detailed user guide for running and tuning MVAPICH2
0.9.5 is available from the following URL:

http://nowlab.cse.ohio-state.edu/projects/mpi-iba/download-mvapich2/mvapich2_user_guide.html

DK


 Hi Tziporet,
 On Mon, Sep 04, 2006 at 10:55:02PM +0300, Tziporet Koren wrote:
  Can you explain to me how to run mvapich2-0.9.5?
 
 First, simply compile using the OSU scripts (make.mvapich2.gen2); it should
 work out of the box (unless you use PCI-X HCAs, in which case you'll have to
 omit -DSRQ in the build script). Note that python-devel is needed for the
 build.
 
 Then, assuming you're doing your tests as root on a single box:
 
 - create /etc/mpd.conf containing the line secretword=blabla (just some
 meaningless passphrase ;)
 (you'll probably also need the same file as ~/.mpd.conf and ~/.mpdpasswd)
 
 - start mpd ring
 # mpdboot -n 1 -f hosts
 (hosts should contain the hostname)
 
 - check whether the mpd ring is up and running
 # mpdtrace
 
 - start the application on 2 CPUs
 # mpiexec -n 2 ./a.out
 
 - once tests are over, stop the ring
 # mpdallexit
 
 hope that helps,
 
 cheers.
  - Christian






Re: [openib-general] problems to register memory as a regular user on SLES9 SP3

2006-09-03 Thread Christian Guggenberger
Hi,
On Tue, Aug 29, 2006 at 05:49:32PM +0300, Tziporet Koren wrote:
 Hi All,
 In testing today we found that on SLES9 SP3 memory locking as a regular 
 user fails.
Has any progress been made on this?

I'd like to ask whether the SLES9 port is really mature yet: I went a step
further and tried some trivial MPI code as root, but it failed and locked
up the node hard.
Testing was done on a single x86_64 SMP node (2 CPUs) with a Mellanox
PCI-X HCA (23108, FW-3.5.0). Software environment: SLES9 SP3-latest,
OFED-1.1-rc3 and mvapich2-0.9.5.
Attached is a simple MPI program that causes the hard lock. Also attached
are some kernel BUGs gathered via the serial console; unfortunately, they
look garbled.
Note that everything is fine if I use recent vanilla kernels on that SLES9
machine.

cheers.
 - Christian

-- 
---
Phone   +49-89-3299-1306
PGP http://www.rzg.mpg.de/~ccg/cg-public_key.asc
S/MIME  http://ra.rzg.mpg.de
---
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>
#include <mpi.h>

#define TRIALS 2000
#define MESSAGE_SIZE 1000
#define TAG 5


int main(int argc, char **argv)
{
  int i, sendTask, recvTask, ThisTask, NTask;
  char *buf;
  MPI_Status status;


  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &ThisTask);
  MPI_Comm_size(MPI_COMM_WORLD, &NTask);

  /* The receiver is picked from ranks 1..NTask-1, so at least 2 ranks are
     required (with 1 rank, i % (NTask-1) would divide by zero). */
  if(NTask < 2)
    {
      if(ThisTask == 0)
        fprintf(stderr, "run with at least 2 MPI processes\n");
      MPI_Finalize();
      return 1;
    }

  buf = malloc(MESSAGE_SIZE);
  if(buf == NULL)
    MPI_Abort(MPI_COMM_WORLD, 1);


/*  system("exec date");*/

  for(i=0; i<TRIALS; i++)
    {
      /* rank 0 always sends; the receiver cycles through ranks 1..NTask-1 */
      sendTask = 0;
      recvTask = (i % (NTask-1)) + 1;

      MPI_Barrier(MPI_COMM_WORLD);
      if(ThisTask==0)
        printf("Try: i=%d (send/recvTask = %d/%d)\n", i, sendTask, recvTask);
      fflush(stdout);
      MPI_Barrier(MPI_COMM_WORLD);


      if(ThisTask == sendTask)
        MPI_Send(buf, MESSAGE_SIZE, MPI_BYTE, recvTask, TAG, MPI_COMM_WORLD);

      if(ThisTask == recvTask)
        MPI_Recv(buf, MESSAGE_SIZE, MPI_BYTE, sendTask, TAG, MPI_COMM_WORLD, &status);


      MPI_Barrier(MPI_COMM_WORLD);
      if(ThisTask==0)
        printf("Success: i=%d (send/recvTask = %d/%d)\n\n", i, sendTask, recvTask);
      fflush(stdout);
      MPI_Barrier(MPI_COMM_WORLD);
    }

/*  system("exec echo ++");
  system("exec hostname");
  system("exec date");	*/
  free(buf);
  MPI_Finalize();

  return 0;
}
Kernel BUG at page_alloc:853
invalid operand:  [1] SMP
CPU 0
Pid: 7092, comm: hanger Tainted: PF  U   (2.6.5-7.276-smp SLES9_SP3_BRANCH-20060724104531)
RIP: 0010:[8016ad9e] 8016ad9e{__free_pages+30}
RSP: 0018:0100e3fdbbf0  EFLAGS: 00010256
RAX:  RBX: 0100e72d1280 RCX: 0100d000
RDX: 010002a1c4d8 RSI:  RDI: 010002a1c4d8
RBP: 0100e3fdbcc8 R08: 0100e3fda000 R09: 0002
R10: 0064 R11: 0001 R12: 
R13: 0100e72d1280 R14: 01007e644d90 R15: 000493e0
FS:  002a95bb5b00() GS:8057dc00() knlGS:
CS:  0010 DS:  ES:  CR0: 8005003b
CR2: 0041b009 CR3: 00101000 CR4: 06e0
Process hanger (pid: 7092, threadinfo 0100e3fda000, task 01007e644d90)
Stack: 8013bd3f  801395a0 803d3400
   0246 000339b3 0202 010002c1c600
   006a 010002c1d6e0
Call Trace:8013bd3f{__mmdrop+63} 801395a0{thread_return+108}
   801467b0{process_timeout+0} 80147376{schedule_timeout+246}
   801467b0{process_timeout+0} a017f460{:ib_mthca:mthca_cmd_wait+448}
   80135cd0{default_wake_function+0} 80135cd0{default_wake_function+0}
   a017f622{:ib_mthca:mthca_cmd_box+66} a017fd59{:ib_mthca:mthca_HW2SW_MPT+57}
   a0189423{:ib_mthca:mthca_free_mr+67} a019014f{:ib_mthca:mthca_dereg_mr+15}
   a0149e3a{:ib_core:ib_dereg_mr+26} a01e5543{:ib_uverbs:ib_uverbs_close+611}
   8018e332{__fput+98} 80189ffe{filp_close+126}
   8018a105{sys_close+229} 801106b4{system_call+124}


Code: 0f 0b f4 8b 38 80 ff ff ff ff 55 03 66 66 90 66 66 90 f0 83
RIP 8016ad9e{__free_pages+30} RSP 0100e3fdbbf0
 --- [cut here ] - [please bite here ] -
Kernel BUG at page_alloc:853
invalid operand:  [2] SMP
CPU 1
Pid: 1, comm: init Tainted: PF  U   (2.6.5-7.276-smp SLES9_SP3_BRANCH-20060724104531)
RIP: 0010:[8016ad9e] 8016ad9e{__free_pages+30}
RSP: 0018:01007ff81c80  EFLAGS: 00010256
RAX:  RBX: 01007e1e4980 RCX: 01008000
RDX: 0100815b6068 RSI:  RDI: 0100815b6068
RBP: 01007ff81d58 R08: 01007ff8 R09: 0013
R10: 000493e0 R11: 2710 R12: 0001
R13: 01007e1e4980 R14: 0100e7f3f2c0 R15: 000493e0
FS:  002a95bb5b00() GS:8057dc80() knlGS:
CS:  

[openib-general] problems to register memory as a regular user on SLES9 SP3

2006-08-29 Thread Tziporet Koren
Hi All,
In testing today we found that on SLES9 SP3, memory locking as a regular
user fails, even though I changed /etc/security/limits.conf and added the
following two lines:
* soft memlock number
* hard memlock number

Note that the same change does work on SLES10.

Another change I tried (one that worked with gen1) was to add the following
line to the file /etc/sysctl.conf:
vm.disable_cap_mlock=1

However, nothing helped on SLES9.

Does anyone have any idea how to solve this?

Thanks,
Tziporet
