[OMPI users] Flow control in OMPI

2011-08-25 Thread Rodrigo Oliveira
Hi there,

I am facing some problems in an Open MPI application. Part of the
application is composed of a sender and a receiver. The problem is that the
sender is much faster than the receiver, which causes the receiver's
memory to be exhausted and the application to abort.

I would like to know whether there is a flow control scheme implemented in Open
MPI, or whether this issue has to be handled at the user application's layer. If
one exists, how does it work and how can I use it in my application?

I did some research about this subject, but I did not find a conclusive
explanation.
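
For reference, here is a minimal sketch of the kind of application-level
throttling I have in mind, in case Open MPI does not provide anything built in
(a simple credit/acknowledgement scheme; the names, sizes and counts below are
made up for illustration):

program flow_control_sketch

  use mpi

  implicit none

  integer, parameter :: ncredit = 16, nmsg = 1000, bufsize = 4096
  real, dimension(bufsize) :: buf
  integer :: ierr, rank, i, ack
  integer :: status(MPI_STATUS_SIZE)

  call mpi_init(ierr)
  call mpi_comm_rank(mpi_comm_world, rank, ierr)
  ack = 0
  buf = 0.0

  if (rank == 0) then
     ! sender: after every ncredit messages, wait for an acknowledgement
     ! before sending more, so at most ncredit messages are in flight
     do i = 1, nmsg
        call mpi_send(buf, bufsize, MPI_REAL, 1, 0, mpi_comm_world, ierr)
        if (mod(i, ncredit) == 0) then
           call mpi_recv(ack, 1, MPI_INTEGER, 1, 1, mpi_comm_world, status, ierr)
        end if
     end do
  else if (rank == 1) then
     ! receiver: consume messages and grant a new batch of credits
     ! after every ncredit messages have been processed
     do i = 1, nmsg
        call mpi_recv(buf, bufsize, MPI_REAL, 0, 0, mpi_comm_world, status, ierr)
        ! ... process the message here (the slow part) ...
        if (mod(i, ncredit) == 0) then
           call mpi_send(ack, 1, MPI_INTEGER, 0, 1, mpi_comm_world, ierr)
        end if
     end do
  end if

  call mpi_finalize(ierr)

end program flow_control_sketch

Another option I have considered is replacing mpi_send with mpi_ssend on the
sender, since a synchronous send does not complete until the receiver has
started the matching receive.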

Thanks a lot.


[OMPI users] ConnectX with InfiniHost IB HCAs

2011-08-25 Thread worldeb

 Hi all,

This is more of a hardware or system configuration question, but
I hope people on this list have experience with it.
I have just added a new ConnectX IB card to a cluster with InfiniHost cards,
and now no MPI programs work. Even OFED's tests do not work.
For example, ib_send_* and ib_write_* just segfault on the host with the ConnectX
card and hang on the hosts with InfiniHost cards. The rdma_lat/rdma_bw tests
segfault too, but with messages like this on the InfiniHost hosts:
 server read: No such file or directory
 5924:pp_server_exch_dest: 0/45 Couldn't read remote address

 pp_read_keys: No such file or directory
 Couldn't read remote address

Other diagnostic tools (ibv_devices, ibchecknet, ibstat, ibstatus, ...) show no
errors and do show the ConnectX card in the system. All modules (mlx4_*, rdma_*)
are loaded, IPoIB is configured, and the openibd and opensmd services started
without errors.

08:00.0 InfiniBand: Mellanox Technologies MT26428 [ConnectX VPI PCIe 2.0 5GT/s - IB QDR / 10GigE] (rev a0)
OFED is 1.3.1, CentOS 5.2.

ibstat
CA 'mlx4_0'
CA type: MT26428
Number of ports: 1
Firmware version: 2.7.0
Hardware version: a0
Node GUID: 0x0002c903000cad14
System image GUID: 0x0002c903000cad17
Port 1:
State: Active
Physical state: LinkUp
Rate: 20
Base lid: 60
LMC: 0
SM lid: 60
Capability mask: 0x0251086a
Port GUID: 0x0002c903000cad15

Where is the problem?

Thanx in advance,
Egor.


Re: [OMPI users] Related to project ideas in OpenMPI

2011-08-25 Thread Rayson Ho
I don't know which SSI project you are referring to... I only know of the
OpenSSI project, and I was one of the first to subscribe to its
mailing list (back in 2001).

http://openssi.org/cgi-bin/view?page=openssi.html

I don't think those OpenSSI clusters are designed for tens of
thousands of nodes, and I am not sure they scale well to even a thousand
nodes -- so IMO they have limited use for HPC clusters.

Rayson



On Thu, Aug 25, 2011 at 11:45 AM, Durga Choudhury  wrote:
> Also, in 2005 there was an attempt to add SSI (Single System
> Image) functionality to the then-current 2.6.10 kernel. The proposal
> was very detailed and covered most of the bases of task creation, PID
> allocation, etc. across a loosely coupled cluster (without using fancy
> hardware such as an RDMA fabric). Does anybody know if it was ever
> implemented? Any pointers in this direction?
>
> Thanks and regards
> Durga
>
>
> On Thu, Aug 25, 2011 at 11:08 AM, Rayson Ho  wrote:
>> Srinivas,
>>
>> There's also Kernel-Level Checkpointing vs. User-Level Checkpointing -
>> if you can checkpoint an MPI task and restart it on a new node, then
>> this is also "process migration".
>>
>> Of course, doing a checkpoint & restart can be slower than pure
>> in-kernel process migration, but the advantage is that you don't need
>> any kernel support, and can in fact do all of it in user-space.
>>
>> Rayson
>>
>>
>> On Thu, Aug 25, 2011 at 10:26 AM, Ralph Castain  wrote:
>>> It also depends on what part of migration interests you - are you wanting 
>>> to look at the MPI part of the problem (reconnecting MPI transports, 
>>> ensuring messages are not lost, etc.) or the RTE part of the problem (where 
>>> to restart processes, detecting failures, etc.)?
>>>
>>>
>>> On Aug 24, 2011, at 7:04 AM, Jeff Squyres wrote:
>>>
 Be aware that process migration is a pretty complex issue.

 Josh is probably the best one to answer your question directly, but he's 
 out today.


 On Aug 24, 2011, at 5:45 AM, srinivas kundaram wrote:

> I am a final-year grad student looking for my final-year project in
> OpenMPI. We are a group of 4 students.
> I wanted to know about the "Process Migration" process for MPI processes
> in OpenMPI.
> Can anyone suggest ideas for a project related to process migration
> in OpenMPI, or other topics in systems?
>
>
>
> regards,
> Srinivas Kundaram
> srinu1...@gmail.com
> +91-8149399160
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users


 --
 Jeff Squyres
 jsquy...@cisco.com
 For corporate legal information go to:
 http://www.cisco.com/web/about/doing_business/legal/cri/


 ___
 users mailing list
 us...@open-mpi.org
 http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>
>>>
>>> ___
>>> users mailing list
>>> us...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>
>>
>>
>>
>> --
>> Rayson
>>
>> ==
>> Open Grid Scheduler - The Official Open Source Grid Engine
>> http://gridscheduler.sourceforge.net/
>>
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>



-- 
Rayson

==
Open Grid Scheduler - The Official Open Source Grid Engine
http://gridscheduler.sourceforge.net/



Re: [OMPI users] Related to project ideas in OpenMPI

2011-08-25 Thread Durga Choudhury
Is anything done at the kernel level portable (e.g. to Windows)? It
*can* be, in principle at least (by putting appropriate #ifdef's in
the code), but I am wondering if it is in reality.

Also, in 2005 there was an attempt to add SSI (Single System
Image) functionality to the then-current 2.6.10 kernel. The proposal
was very detailed and covered most of the bases of task creation, PID
allocation, etc. across a loosely coupled cluster (without using fancy
hardware such as an RDMA fabric). Does anybody know if it was ever
implemented? Any pointers in this direction?

Thanks and regards
Durga


On Thu, Aug 25, 2011 at 11:08 AM, Rayson Ho  wrote:
> Srinivas,
>
> There's also Kernel-Level Checkpointing vs. User-Level Checkpointing -
> if you can checkpoint an MPI task and restart it on a new node, then
> this is also "process migration".
>
> Of course, doing a checkpoint & restart can be slower than pure
> in-kernel process migration, but the advantage is that you don't need
> any kernel support, and can in fact do all of it in user-space.
>
> Rayson
>
>
> On Thu, Aug 25, 2011 at 10:26 AM, Ralph Castain  wrote:
>> It also depends on what part of migration interests you - are you wanting to 
>> look at the MPI part of the problem (reconnecting MPI transports, ensuring 
>> messages are not lost, etc.) or the RTE part of the problem (where to 
>> restart processes, detecting failures, etc.)?
>>
>>
>> On Aug 24, 2011, at 7:04 AM, Jeff Squyres wrote:
>>
>>> Be aware that process migration is a pretty complex issue.
>>>
>>> Josh is probably the best one to answer your question directly, but he's 
>>> out today.
>>>
>>>
>>> On Aug 24, 2011, at 5:45 AM, srinivas kundaram wrote:
>>>
 I am a final-year grad student looking for my final-year project in
 OpenMPI. We are a group of 4 students.
 I wanted to know about the "Process Migration" process for MPI processes in
 OpenMPI.
 Can anyone suggest ideas for a project related to process migration
 in OpenMPI, or other topics in systems?



 regards,
 Srinivas Kundaram
 srinu1...@gmail.com
 +91-8149399160
 ___
 users mailing list
 us...@open-mpi.org
 http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>
>>>
>>> --
>>> Jeff Squyres
>>> jsquy...@cisco.com
>>> For corporate legal information go to:
>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>>
>>>
>>> ___
>>> users mailing list
>>> us...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>>
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>
>
>
> --
> Rayson
>
> ==
> Open Grid Scheduler - The Official Open Source Grid Engine
> http://gridscheduler.sourceforge.net/
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>



Re: [OMPI users] Related to project ideas in OpenMPI

2011-08-25 Thread Rayson Ho
Srinivas,

There's also Kernel-Level Checkpointing vs. User-Level Checkpointing -
if you can checkpoint an MPI task and restart it on a new node, then
this is also "process migration".

Of course, doing a checkpoint & restart can be slower than pure
in-kernel process migration, but the advantage is that you don't need
any kernel support, and can in fact do all of it in user-space.

Rayson


On Thu, Aug 25, 2011 at 10:26 AM, Ralph Castain  wrote:
> It also depends on what part of migration interests you - are you wanting to 
> look at the MPI part of the problem (reconnecting MPI transports, ensuring 
> messages are not lost, etc.) or the RTE part of the problem (where to restart 
> processes, detecting failures, etc.)?
>
>
> On Aug 24, 2011, at 7:04 AM, Jeff Squyres wrote:
>
>> Be aware that process migration is a pretty complex issue.
>>
>> Josh is probably the best one to answer your question directly, but he's out 
>> today.
>>
>>
>> On Aug 24, 2011, at 5:45 AM, srinivas kundaram wrote:
>>
>>> I am a final-year grad student looking for my final-year project in
>>> OpenMPI. We are a group of 4 students.
>>> I wanted to know about the "Process Migration" process for MPI processes in
>>> OpenMPI.
>>> Can anyone suggest ideas for a project related to process migration in
>>> OpenMPI, or other topics in systems?
>>>
>>>
>>>
>>> regards,
>>> Srinivas Kundaram
>>> srinu1...@gmail.com
>>> +91-8149399160
>>> ___
>>> users mailing list
>>> us...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>>
>> --
>> Jeff Squyres
>> jsquy...@cisco.com
>> For corporate legal information go to:
>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>
>>
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>



-- 
Rayson

==
Open Grid Scheduler - The Official Open Source Grid Engine
http://gridscheduler.sourceforge.net/



Re: [OMPI users] Related to project ideas in OpenMPI

2011-08-25 Thread Ralph Castain
It also depends on what part of migration interests you - are you wanting to 
look at the MPI part of the problem (reconnecting MPI transports, ensuring 
messages are not lost, etc.) or the RTE part of the problem (where to restart 
processes, detecting failures, etc.)?


On Aug 24, 2011, at 7:04 AM, Jeff Squyres wrote:

> Be aware that process migration is a pretty complex issue.
> 
> Josh is probably the best one to answer your question directly, but he's out 
> today.
> 
> 
> On Aug 24, 2011, at 5:45 AM, srinivas kundaram wrote:
> 
>> I am a final-year grad student looking for my final-year project in OpenMPI.
>> We are a group of 4 students.
>> I wanted to know about the "Process Migration" process for MPI processes in
>> OpenMPI.
>> Can anyone suggest ideas for a project related to process migration in
>> OpenMPI, or other topics in systems?
>> 
>> 
>> 
>> regards,
>> Srinivas Kundaram
>> srinu1...@gmail.com
>> +91-8149399160
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users




Re: [OMPI users] problems with parallel IO solved!

2011-08-25 Thread Alexander Beck-Ratzka
Hi Folks,

The problem could be solved by using the same compiler settings for writing
out and reading in. Writing out was done with -trace (Intel compiler), and the
reading in without any supplemental options.

Best wishes

Alexander

> Hi Folks,
> 
> I have problems retrieving data that I have written out with MPI
> parallel IO. In tests everything works fine, but within a larger
> environment, the data read in differ from those written out.
> 
> Here is the setup of my experiment:
> 
> # the writer #
> program parallel_io
> 
>   use mpi
> 
>   implicit none
> 
>   integer,parameter :: nx=1,ny=300,nz=256,nv=12
>   integer ierr, i, myrank, comm_size, BUFSIZE, thefile, intsize
> 
>   parameter (BUFSIZE=1075200)
> 
>   real,dimension(nv+2,nx,ny,nz) :: v1
> 
>   integer (kind=MPI_OFFSET_KIND) disp
>   integer ix, iy, iz, nn, counter
> 
>   character(6) cname
>   call mpi_init(ierr)
>   call mpi_comm_size(mpi_comm_world, comm_size, ierr)
>   call mpi_comm_rank(mpi_comm_world, myrank,ierr)
> 
>   counter=0
>   do ix = 1,nz
>  do iy=1,ny
> do iz=1,nx
>do nn=1,nv+2
>   v1(nn,ix,iy,iz) = counter*(myrank+20)/200.
>   counter=counter+1
>end do
> end do
>  end do
>   end do
> 
>   call mpi_barrier(mpi_comm_world,ierr)
> 
>   call mpi_type_extent(mpi_real, intsize, ierr)
>   call mpi_file_open(mpi_comm_world, 'testfile', MPI_MODE_WRONLY + MPI_MODE_CREATE, mpi_info_null, thefile, ierr)
>   call mpi_type_size(MPI_INTEGER, intsize, ierr)
> 
>   disp = myrank * BUFSIZE * intsize
> 
>   !  call mpi_file_set_view(thefile, disp, MPI_INTEGER, MPI_INTEGER, 'native', mpi_info_null, ierr)
>   call mpi_file_write_at(thefile, disp, v1(1,1,1,1), BUFSIZE, MPI_REAL, mpi_status_ignore, ierr)
> 
>   call mpi_file_close(thefile, ierr)
> 
>   !  print the data read in...
> 
>   open (12, file='out000.dat-parallel-write-0')
> 
>   if (myrank.eq.0) then
>  write (12,'(i4,e18.8)') myrank, ((((v1(nn,ix,iy,iz),nn=1,nv+2),ix=1,nx),iy=1,ny),iz=1,nz)
>   endif
> 
>   close (12)
> 
>   call mpi_finalize(ierr)
> 
> 
> end program parallel_io
> 
> ###
> 
> and the reader...
> 
> reader###
>  program parallel_read_io
> 
>   use mpi
> 
>   implicit none
>   integer,parameter :: nx=1,ny=300,nz=256,nv=12
> 
>   integer ierr, i, myrank, comm_size, BUFSIZE, thefile, realsize
>   parameter (BUFSIZE=1075200)
> 
>   real,dimension(nv+2,nx,ny,nz) :: v1
> 
>   integer (kind=MPI_OFFSET_KIND) disp
> 
>   integer ix, iy, iz, nn
> 
>   call mpi_init(ierr)
>   call mpi_comm_size(mpi_comm_world, comm_size, ierr)
>   call mpi_comm_rank(mpi_comm_world, myrank,ierr)
> 
>   !  do i=0,BUFSIZE
>   ! buf(i) = myrank*BUFSIZE + i
>   !  end do
> 
>   call mpi_type_extent(mpi_integer, realsize, ierr)
>   call mpi_file_open(mpi_comm_world, 'testfile', MPI_MODE_RDONLY, mpi_info_null, thefile, ierr)
>   call mpi_type_size(MPI_REAL, realsize, ierr)
> 
>   disp = myrank * BUFSIZE * realsize
>   print*, 'myrank: ', myrank, '  disp: ', disp, '  realsize: ', realsize
> 
>   !  call mpi_file_set_view(thefile, disp, MPI_INTEGER, MPI_INTEGER, 'native', mpi_info_null, ierr)
>   !  call mpi_file_read(thefile, buf, BUFSIZE, MPI_INTEGER, mpi_status_ignore, ierr)
> 
>   call mpi_file_read_at(thefile, disp, v1(1,1,1,1), BUFSIZE, MPI_REAL, mpi_status_ignore, ierr)
> 
>   call mpi_file_close(thefile, ierr)
> 
>   call mpi_barrier(mpi_comm_world,ierr)
> 
>   !  print the data read in...
> 
>   open (12, file='out000.dat-parallel-read-0')
> 
>   if (myrank.eq.0) then
>  write (12,'(i4,e18.8)') myrank, ((((v1(nn,ix,iy,iz),nn=1,nv+2),ix=1,nx),iy=1,ny),iz=1,nz)
>   endif
> 
>   close (12)
> 
>   call mpi_finalize(ierr)
> 
> 
> end program parallel_read_io
> ###
> 
> Here everything works fine. However, integrating this into a larger
> program, I get totally different data written out and read in.
> 
> The setup is the same as in the experiment, but I need some more
> memory...
> 
> What might be the reason for such problems? And if I have an MPI error, how
> can I detect it within a Fortran program? I have only found examples for
> handling MPI errors in C or C++. I would need an example for
> Fortran.
> 
> So any hints or ideas?
> 
> Best wishes
> 
> Alexander
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users


[OMPI users] problems with parallel IO

2011-08-25 Thread Alexander Beck-Ratzka
Hi Folks,

I have problems retrieving data that I have written out with MPI parallel
IO. In tests everything works fine, but within a larger environment, the data
read in differ from those written out.

Here is the setup of my experiment:

# the writer #
program parallel_io

  use mpi

  implicit none

  integer,parameter :: nx=1,ny=300,nz=256,nv=12
  integer ierr, i, myrank, comm_size, BUFSIZE, thefile, intsize

  parameter (BUFSIZE=1075200)

  real,dimension(nv+2,nx,ny,nz) :: v1

  integer (kind=MPI_OFFSET_KIND) disp
  integer ix, iy, iz, nn, counter

  character(6) cname
  call mpi_init(ierr)
  call mpi_comm_size(mpi_comm_world, comm_size, ierr)
  call mpi_comm_rank(mpi_comm_world, myrank,ierr)

  counter=0
  do ix = 1,nz
 do iy=1,ny
do iz=1,nx
   do nn=1,nv+2
  v1(nn,ix,iy,iz) = counter*(myrank+20)/200.
  counter=counter+1
   end do
end do
 end do
  end do

  call mpi_barrier(mpi_comm_world,ierr)

  call mpi_type_extent(mpi_real, intsize, ierr)
  call mpi_file_open(mpi_comm_world, 'testfile', MPI_MODE_WRONLY + MPI_MODE_CREATE, mpi_info_null, thefile, ierr)
  call mpi_type_size(MPI_INTEGER, intsize, ierr)

  disp = myrank * BUFSIZE * intsize

  !  call mpi_file_set_view(thefile, disp, MPI_INTEGER, MPI_INTEGER, 'native', mpi_info_null, ierr)
  call mpi_file_write_at(thefile, disp, v1(1,1,1,1), BUFSIZE, MPI_REAL, mpi_status_ignore, ierr)

  call mpi_file_close(thefile, ierr)

  !  print the data read in...

  open (12, file='out000.dat-parallel-write-0')

  if (myrank.eq.0) then
 write (12,'(i4,e18.8)') myrank, ((((v1(nn,ix,iy,iz),nn=1,nv+2),ix=1,nx),iy=1,ny),iz=1,nz)
  endif

  close (12)

  call mpi_finalize(ierr)


end program parallel_io

###

and the reader...

reader###
 program parallel_read_io

  use mpi

  implicit none
  integer,parameter :: nx=1,ny=300,nz=256,nv=12

  integer ierr, i, myrank, comm_size, BUFSIZE, thefile, realsize
  parameter (BUFSIZE=1075200)

  real,dimension(nv+2,nx,ny,nz) :: v1

  integer (kind=MPI_OFFSET_KIND) disp

  integer ix, iy, iz, nn

  call mpi_init(ierr)
  call mpi_comm_size(mpi_comm_world, comm_size, ierr)
  call mpi_comm_rank(mpi_comm_world, myrank,ierr)

  !  do i=0,BUFSIZE
  ! buf(i) = myrank*BUFSIZE + i
  !  end do

  call mpi_type_extent(mpi_integer, realsize, ierr)
  call mpi_file_open(mpi_comm_world, 'testfile', MPI_MODE_RDONLY, mpi_info_null, thefile, ierr)
  call mpi_type_size(MPI_REAL, realsize, ierr)

  disp = myrank * BUFSIZE * realsize
  print*, 'myrank: ', myrank, '  disp: ', disp, '  realsize: ', realsize

  !  call mpi_file_set_view(thefile, disp, MPI_INTEGER, MPI_INTEGER, 'native', mpi_info_null, ierr)
  !  call mpi_file_read(thefile, buf, BUFSIZE, MPI_INTEGER, mpi_status_ignore, ierr)

  call mpi_file_read_at(thefile, disp, v1(1,1,1,1), BUFSIZE, MPI_REAL, mpi_status_ignore, ierr)

  call mpi_file_close(thefile, ierr)

  call mpi_barrier(mpi_comm_world,ierr)

  !  print the data read in...

  open (12, file='out000.dat-parallel-read-0')

  if (myrank.eq.0) then
 write (12,'(i4,e18.8)') myrank, ((((v1(nn,ix,iy,iz),nn=1,nv+2),ix=1,nx),iy=1,ny),iz=1,nz)
  endif

  close (12)

  call mpi_finalize(ierr)


end program parallel_read_io
###

Here everything works fine. However, integrating this into a larger program,
I get totally different data written out and read in.

The setup is the same as in the experiment, but I need some more memory...

What might be the reason for such problems? And if I have an MPI error, how
can I detect it within a Fortran program? I have only found examples for
handling MPI errors in C or C++; I would need an example for Fortran.
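
Something along these lines is what I have pieced together from the C examples
so far (an untested sketch, so I am not sure it is the right approach):

program mpi_errcheck_sketch

  use mpi

  implicit none

  integer :: ierr, ierr2, reslen, thefile
  character(len=MPI_MAX_ERROR_STRING) :: errstring

  call mpi_init(ierr)

  !  have errors returned instead of aborting the job (the default handler on
  !  communicators is MPI_ERRORS_ARE_FATAL; file handles already default to
  !  MPI_ERRORS_RETURN)
  call mpi_comm_set_errhandler(mpi_comm_world, MPI_ERRORS_RETURN, ierr)

  !  deliberately open a file that does not exist to provoke an error
  call mpi_file_open(mpi_comm_world, 'no-such-file', MPI_MODE_RDONLY, mpi_info_null, thefile, ierr)

  if (ierr .ne. MPI_SUCCESS) then
     !  translate the error code into a readable message
     call mpi_error_string(ierr, errstring, reslen, ierr2)
     print *, 'mpi_file_open failed: ', errstring(1:reslen)
  else
     call mpi_file_close(thefile, ierr)
  end if

  call mpi_finalize(ierr)

end program mpi_errcheck_sketch

If that is roughly right, I would add such a check after each mpi_file_* call
in the writer and reader above.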

So any hints or ideas?

Best wishes

Alexander