Re: [OMPI users] problems on parallel writing

2010-02-25 Thread w k
Hi Jody,

I tried your suggestion but it still failed. Attached is the modified code.
If your machine has a Fortran compiler as well, you can try it.

BTW, how many processors did you use for testing your C code?


Thanks,
Kan




On Thu, Feb 25, 2010 at 3:35 AM, jody  wrote:

> Hi
> Just wanted to let you know:
>
> I translated your program to C, ran it, and it crashed at MPI_FILE_SET_VIEW
> in a similar way to yours.
> Then I added an if-clause to prevent MPI_FILE_WRITE from being called with
> the undefined value:
>    if (myid == 0) {
>        MPI_File_write(fh, temp, count, MPI_DOUBLE, &status);
>    }
> After this it ran without crash.
> However, the output is not what you expected:
> The number 2122010.0 was not there - probably overwritten by the
> MPI_FILE_WRITE_ALL.
> But this was fixed by replacing the line
>  disp=0
> by
>  disp=8
> and removing the
>   if (single_no .gt. 0) map = map + 1
> statement.
>
> So here's what it all looks like:
>
> ===
> program test_MPI_write_adv2
>
>
>  !-- Template for any mpi program
>
>  implicit none
>
>  !--Include the mpi header file
>  include 'mpif.h'  ! --> Required statement
>
>  !--Declare all variables and arrays.
>  integer :: fh, ierr, myid, numprocs, itag, etype, filetype, info
>  integer :: status(MPI_STATUS_SIZE)
>  integer :: irc, ip
>  integer(kind=mpi_offset_kind) :: offset, disp
>  integer :: i, j, k
>
>  integer :: num
>
>  character(len=64) :: filename
>
>  real(8), pointer :: q(:), temp(:)
>  integer, pointer :: map(:)
>  integer :: single_no, count
>
>
>  !--Initialize MPI
>  call MPI_INIT( ierr ) ! --> Required statement
>
>  !--Who am I? --- get my rank=myid
>  call MPI_COMM_RANK( MPI_COMM_WORLD, myid, ierr )
>
>  !--How many processes in the global group?
>  call MPI_COMM_SIZE( MPI_COMM_WORLD, numprocs, ierr )
>
>  if ( myid == 0 ) then
> single_no = 4
>  elseif ( myid == 1 ) then
> single_no = 2
>  elseif ( myid == 2 ) then
> single_no = 2
>  elseif ( myid == 3 ) then
> single_no = 3
>  else
> single_no = 0
>  end if
>
>  if (single_no .gt. 0) allocate(map(single_no))
>
>  if ( myid == 0 ) then
> map = (/ 0, 2, 5, 6 /)
>  elseif ( myid == 1 ) then
> map = (/ 1, 4 /)
>  elseif ( myid == 2 ) then
> map = (/ 3, 9 /)
>  elseif ( myid == 3 ) then
> map = (/ 7, 8, 10 /)
>  end if
>
>  if (single_no .gt. 0) allocate(q(single_no))
>
>  if (single_no .gt. 0) then
> do i = 1,single_no
>q(i) = dble(myid+1)*100.0d0 + dble(map(i)+1)
> end do
>  end if
>
>
>   if ( myid == 0 ) then
> count = 1
>  else
> count = 0
>  end if
>
>  if (count .gt. 0) then
> allocate(temp(count))
> temp(1) = 2122010.0d0
>  end if
>
>  write(filename,'(a)') 'test_write.bin'
>
>  call MPI_FILE_OPEN(MPI_COMM_WORLD, filename, &
> MPI_MODE_RDWR+MPI_MODE_CREATE, MPI_INFO_NULL, fh, ierr)
>
>   if (myid == 0) then
> call MPI_FILE_WRITE(FH, temp, COUNT, MPI_REAL8, STATUS, IERR)
>   endif
>
>  call MPI_TYPE_CREATE_INDEXED_BLOCK(single_no, 1, map, &
> MPI_DOUBLE_PRECISION, filetype, ierr)
>  call MPI_TYPE_COMMIT(filetype, ierr)
>   disp = 8  ! ---> size of MPI_REAL8 (the number written when myid = 0)
>   call MPI_FILE_SET_VIEW(fh, disp, MPI_DOUBLE_PRECISION, filetype, &
> 'native', MPI_INFO_NULL, ierr)
>  call MPI_FILE_WRITE_ALL(fh, q, single_no, MPI_DOUBLE_PRECISION, status, &
> ierr)
>  call MPI_FILE_CLOSE(fh, ierr)
>
>
>  if (single_no .gt. 0) deallocate(map)
>
>  if (single_no .gt. 0) deallocate(q)
>
>  if (count .gt. 0) deallocate(temp)
>
>  !--Finalize MPI
>  call MPI_FINALIZE(irc)! ---> Required statement
>
>  stop
>
>
> end program test_MPI_write_adv2
>
> ===
>
> Regards
>   jody
>
> On Thu, Feb 25, 2010 at 2:47 AM, Terry Frankcombe wrote:
> > On Wed, 2010-02-24 at 13:40 -0500, w k wrote:
> >> Hi Jordy,
> >>
> >> I don't think this part caused the problem. For Fortran, it doesn't
> >> matter if the pointer is NULL as long as the count requested from the
> >> processor is 0. Actually I tested the code and it passed this part
> >> without problem. I believe it aborted at the MPI_FILE_SET_VIEW part.
> >>
> >
> > For the record:  A pointer is not NULL unless you've nullified it.
> > IIRC, the Fortran standard says that any non-assigning reference to an
> > unassigned, unnullified pointer is undefined (or maybe illegal... check
> > the standard).
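For reference, a minimal Fortran sketch of the pointer discipline described
above (illustrative only; not part of the original code):

  real(8), pointer :: q(:)

  nullify(q)                ! q now has a defined, disassociated status
  if (associated(q)) then   ! this test is legal only after nullify/allocate
     deallocate(q)
  end if

Without the nullify, the associated() test itself would reference a pointer
with undefined status.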
> >
> >
> > ___
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
> >
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>

Re: [OMPI users] Sending relatively large messages with high frequency

2010-02-25 Thread Amr Hassan
Hi All,

Thanks a lot for your support. It was a big help. I found a race condition
in my code and now the problem is solved.

Regards,
Amr

On Fri, Feb 26, 2010 at 7:45 AM, Prentice Bisbal  wrote:

> Amr Hassan wrote:
> > Thanks a lot for your reply,
> >
> > I'm using blocking Send and Receive. All the clients are sending data
> > and the server receives the messages from the clients with
> > MPI_ANY_SOURCE as the sender. Do you think there is a race condition
> > near this pattern?
> >
> > I searched a lot and used totalview but I couldn't detect such a case.
> > I would really appreciate it if you could send me a link or give an
> > example of a possible race condition in that scenario.
> >
> > Also, when I partition the message into smaller parts (send in sequence
> > - all the other clients wait until the send finishes) it works fine.
> > Does that exclude the race condition?
> >
>
> It sounds like, when sending the large messages, you are putting more
> data into a buffer than it can hold. When you break the messages up into
>  smaller sizes, you're not overflowing the buffer.
>
> Are you using MPI_Pack, by any chance?
>
> --
> Prentice Bisbal
> Linux Software Support Specialist/System Administrator
> School of Natural Sciences
> Institute for Advanced Study
> Princeton, NJ
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>


Re: [OMPI users] Sending relatively large messages with high frequency

2010-02-25 Thread Prentice Bisbal
Amr Hassan wrote:
> Thanks a lot for your reply,
>
> I'm using blocking Send and Receive. All the clients are sending data
> and the server receives the messages from the clients with
> MPI_ANY_SOURCE as the sender. Do you think there is a race condition
> near this pattern?
>
> I searched a lot and used totalview but I couldn't detect such a case. I
> would really appreciate it if you could send me a link or give an example
> of a possible race condition in that scenario.
>
> Also, when I partition the message into smaller parts (send in sequence
> - all the other clients wait until the send finishes) it works fine. Does
> that exclude the race condition?
>  

It sounds like, when sending the large messages, you are putting more
data into a buffer than it can hold. When you break the messages up into
 smaller sizes, you're not overflowing the buffer.

Are you using MPI_Pack, by any chance?
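If MPI_Pack is in play, one safeguard is to size the pack buffer with
MPI_PACK_SIZE rather than by hand. A hedged Fortran sketch (variable names
and the count of 1000 are placeholders, inside the usual include 'mpif.h'
setup):

  integer :: packsize, position, ierr
  real(8) :: q(1000)
  character, allocatable :: buf(:)

  ! Ask MPI for an upper bound on the packed size of 1000 doubles.
  call MPI_PACK_SIZE(1000, MPI_DOUBLE_PRECISION, MPI_COMM_WORLD, packsize, ierr)
  allocate(buf(packsize))

  position = 0
  call MPI_PACK(q, 1000, MPI_DOUBLE_PRECISION, buf, packsize, position, &
                MPI_COMM_WORLD, ierr)
  ! 'position' now holds the number of bytes actually packed.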

-- 
Prentice Bisbal
Linux Software Support Specialist/System Administrator
School of Natural Sciences
Institute for Advanced Study
Princeton, NJ


Re: [OMPI users] Sending relatively large messages with high frequency

2010-02-25 Thread Prentice Bisbal
I was getting the same error a few weeks ago. In my case the error
message was spot on. I was trying to put too much data in a buffer using
MPI_Pack.

I was able to track down the problem using valgrind. Have you tried that
yet? You need to install valgrind first and then compile OpenMPI with
valgrind support. It takes some time, but is worth it.

http://www.open-mpi.org/faq/?category=debugging#memchecker_what
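In outline, the steps look roughly like this (install paths and the
application name are illustrative; see the FAQ above for the exact options):

  # Build Open MPI with valgrind support:
  $ ./configure --enable-memchecker --with-valgrind=/usr/local/valgrind ...
  $ make all install

  # Then run the application under valgrind:
  $ mpirun -np 2 valgrind ./my_app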

Amr Hassan wrote:
> Hi All,
> 
> I'm facing a strange problem with OpenMPI.
> 
> I'm developing an application which is required to send a message from
> each client (1 MB each) to a server node around 10 times per second
> (it's a distributed render application and I'm trying to reach a higher
> frame rate). The problem is that OpenMPI crashes in that case and only
> works if I partition these messages into a set of 20 k sub-messages with
> a sleep between each one of them of around 1 to 10 ms!! This solution
> is very expensive in terms of the time needed to send the data. Is there
> any other solution?
> 
> The error I got now is:
> Signal: Segmentation fault (11)
> Signal code:  Address not mapped (1)
> Failing at address: x
> 
> The OS is Linux CentOS.  I'm using the latest version of OpenMPI.
> 
> I appreciate any help regarding that.
> 
>  Regards,
> Amr
> 
> 
> 
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users

-- 
Prentice Bisbal
Linux Software Support Specialist/System Administrator
School of Natural Sciences
Institute for Advanced Study
Princeton, NJ


Re: [OMPI users] orte-checkpoint hangs

2010-02-25 Thread Josh Hursey

On Feb 10, 2010, at 9:45 AM, Addepalli, Srirangam V wrote:

> I am trying to test orte-checkpoint with an MPI job. It however hangs for all
> jobs. This is how the job is started:
> mpirun -np 8 -mca ft-enable cr /apps/nwchem-5.1.1/bin/LINUX64/nwchem
> siosi6.nw

This might be the problem, if it wasn't a typo. The command line flag is
"-am ft-enable-cr", not "-mca ft-enable cr". The former activates a set of MCA
parameters (in the AMCA file 'ft-enable-cr'). The latter should be ignored by
the MCA system.
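In other words, keeping the rest of the original command line:

  mpirun -np 8 -am ft-enable-cr /apps/nwchem-5.1.1/bin/LINUX64/nwchem siosi6.nw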

Give that a try and let us know if the behavior changes.

-- Josh

> From another terminal I try orte-checkpoint:
> 
> ompi-checkpoint -v --term 9338
> [compute-19-12.local:09377] orte_checkpoint: Checkpointing...
> [compute-19-12.local:09377]  PID 9338
> [compute-19-12.local:09377]  Connected to Mpirun [[5009,0],0]
> [compute-19-12.local:09377]  Terminating after checkpoint
> [compute-19-12.local:09377] orte_checkpoint: notify_hnp: Contact Head Node 
> Process PID 9338
> [compute-19-12.local:09377] orte_checkpoint: notify_hnp: Requested a 
> checkpoint of jobid [INVALID]
> [compute-19-12.local:09377] orte_checkpoint: hnp_receiver: Receive a command 
> message.
> [compute-19-12.local:09377] orte_checkpoint: hnp_receiver: Status Update.
> 
> 
> Is there any way to debug the issue to get more information or log messages?
> 
> Rangam
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users




Re: [OMPI users] Sending relatively large messages with high frequency

2010-02-25 Thread Amr Hassan
Yes, but only one thread at each client is allowed to use MPI. Also, there is
a semaphore on the MPI usage.



On Fri, Feb 26, 2010 at 1:09 AM, Brian Budge  wrote:

> Is your code multithreaded?
>
> On Feb 25, 2010 12:56 AM, "Amr Hassan"  wrote:
>
> Thanks a lot for your reply,
>
> I'm using blocking Send and Receive. All the clients are sending data and
> the server receives the messages from the clients with MPI_ANY_SOURCE as
> the sender. Do you think there is a race condition near this pattern?
>
> I searched a lot and used totalview but I couldn't detect such a case. I
> would really appreciate it if you could send me a link or give an example
> of a possible race condition in that scenario.
>
> Also, when I partition the message into smaller parts (send in sequence -
> all the other clients wait until the send finishes) it works fine. Does
> that exclude the race condition?
>
>
> Regards,
> Amr
>
>
>
>
> >>We've seen similar things in our code. In our case it is probably due to a
> >>race condition.
>
>
> >>On Feb 24, 2010 9:36 PM, "Amr Hassan" wrote:
>
> >>Hi All,
>
> >>I'm ...
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>


Re: [OMPI users] Sending relatively large messages with high frequency

2010-02-25 Thread Jeff Squyres
On Feb 25, 2010, at 3:56 AM, Amr Hassan wrote:

> Thanks a lot for your reply,
>  
> I'm using blocking Send and Receive. All the clients are sending data and the
> server receives the messages from the clients with MPI_ANY_SOURCE as the
> sender. Do you think there is a race condition near this pattern?

MPI_ANY_SOURCE can definitely lead to race conditions / messages (and content) 
from unexpected sources.  Try using explicit sources, if possible.
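If MPI_ANY_SOURCE has to stay, at least record where each message actually
came from and dispatch on that. A minimal Fortran sketch (buffer size, tag,
and names are placeholders, not from Amr's code):

  integer :: status(MPI_STATUS_SIZE), sender, itag, ierr
  real(8) :: buf(131072)    ! 1 MB of doubles

  itag = 1
  call MPI_RECV(buf, 131072, MPI_DOUBLE_PRECISION, MPI_ANY_SOURCE, itag, &
                MPI_COMM_WORLD, status, ierr)
  sender = status(MPI_SOURCE)   ! the rank that actually sent this message
  ! Branch on 'sender' (and status(MPI_TAG)) instead of assuming the origin.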

> I searched a lot and used totalview but I couldn't detect such a case. I would
> really appreciate it if you could send me a link or give an example of a
> possible race condition in that scenario.

You might want to let it run to segv in TV and see exactly where the segv 
occurs in your code.  Is your code processing what it thinks is message A but 
is really message B?  If the content (and therefore the processing) of B is
different from that of A, then assumptions can go badly in your code and Bad
Things may happen.

> Also, when I partition the message into smaller parts (send in sequence - all
> the other clients wait until the send finishes) it works fine. Does that
> exclude the race condition?

No.  It somewhat suggests that you do have a race condition.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI users] Sending relatively large messages with high frequency

2010-02-25 Thread Brian Budge
Is your code multithreaded?

On Feb 25, 2010 12:56 AM, "Amr Hassan"  wrote:

Thanks a lot for your reply,

I'm using blocking Send and Receive. All the clients are sending data and
the server receives the messages from the clients with MPI_ANY_SOURCE as
the sender. Do you think there is a race condition near this pattern?

I searched a lot and used totalview but I couldn't detect such a case. I
would really appreciate it if you could send me a link or give an example
of a possible race condition in that scenario.

Also, when I partition the message into smaller parts (send in sequence -
all the other clients wait until the send finishes) it works fine. Does
that exclude the race condition?


Regards,
Amr




>>We've seen similar things in our code. In our case it is probably due to a
>>race condition.


>>On Feb 24, 2010 9:36 PM, "Amr Hassan" wrote:

>>Hi All,

>>I'm ...

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


Re: [OMPI users] libmpi.so.0 ERROR

2010-02-25 Thread Jeff Squyres
This typically means that either libmpi.so does not exist on the machine that 
you are trying to run it on, or it cannot be found.  You may need to extend the 
value of the LD_LIBRARY_PATH environment variable with the lib directory of 
your Open MPI installation (don't just overwrite it -- check first to see if 
there's something else already in there, and if so, prefix it).

For example (Bourne-flavored shells):

$ LD_LIBRARY_PATH=/opt/openmpi/lib:$LD_LIBRARY_PATH
$ export LD_LIBRARY_PATH

For C-flavored shells:

% setenv LD_LIBRARY_PATH /opt/openmpi/lib:$LD_LIBRARY_PATH

(both of these assume that there is already a value in LD_LIBRARY_PATH)

You might be able to configure Open MPI with the 
--enable-mpirun-prefix-by-default configure switch, which, if you have Open MPI 
installed in the same location on all nodes, takes away some of this drudgery 
by prefixing LD_LIBRARY_PATH and PATH for you.
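For example, assuming Open MPI is to be installed under /opt/openmpi on every
node (the prefix is illustrative):

  $ ./configure --prefix=/opt/openmpi --enable-mpirun-prefix-by-default
  $ make all install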



On Feb 25, 2010, at 4:33 AM, Abhra Paul wrote:

> Respected Users
> 
> I have installed Open MPI successfully and also compiled the hello_world
> program with mpicc. But when I run the executable with the command
> mpirun -np 2 hello_mpi (hello_mpi is the executable) on my desktop PC
> (dual-core processor), it gives an error like this:
> 
> hello_mpi: error while loading shared libraries: libmpi.so.0: cannot open 
> shared object file: No such file or directory
> hello_mpi: error while loading shared libraries: libmpi.so.0: cannot open 
> shared object file: No such file or directory
> 
> Please suggest me what I have to do to solve it.
> 
> Regards
> Abhra Paul
> 
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




[OMPI users] libmpi.so.0 ERROR

2010-02-25 Thread Abhra Paul
Respected Users

I have installed Open MPI successfully and also compiled the hello_world
program with mpicc. But when I run the executable with the command
mpirun -np 2 hello_mpi (hello_mpi is the executable) on my desktop PC
(dual-core processor), it gives an error like this:

hello_mpi: error while loading shared libraries: libmpi.so.0: cannot open 
shared object file: No such file or directory
hello_mpi: error while loading shared libraries: libmpi.so.0: cannot open 
shared object file: No such file or directory

Please suggest me what I have to do to solve it.

Regards
Abhra Paul





Re: [OMPI users] Sending relatively large messages with high frequency

2010-02-25 Thread Amr Hassan
Thanks a lot for your reply,

I'm using blocking Send and Receive. All the clients are sending data and
the server receives the messages from the clients with MPI_ANY_SOURCE as
the sender. Do you think there is a race condition near this pattern?

I searched a lot and used totalview but I couldn't detect such a case. I
would really appreciate it if you could send me a link or give an example
of a possible race condition in that scenario.

Also, when I partition the message into smaller parts (send in sequence -
all the other clients wait until the send finishes) it works fine. Does
that exclude the race condition?


Regards,
Amr


>>We've seen similar things in our code. In our case it is probably due to a
>>race condition. Try running the segv'ing process in a debugger, and it will
>>likely show you a bug in your code.

>>On Feb 24, 2010 9:36 PM, "Amr Hassan" wrote:

>>Hi All,

>>I'm facing a strange problem with OpenMPI.

>>I'm developing an application which is required to send a message from each
>>client (1 MB each) to a server node around 10 times per second (it's a
>>distributed render application and I'm trying to reach a higher frame rate).
>>The problem is that OpenMPI crashes in that case and only works if I
>>partition these messages into a set of 20 k sub-messages with a sleep between
>>each one of them of around 1 to 10 ms!! This solution is very expensive in
>>terms of the time needed to send the data. Is there any other solution?

>>The error I got now is:
>>Signal: Segmentation fault (11)
>>Signal code: Address not mapped (1)
>>Failing at address: x

>>The OS is Linux CentOS. I'm using the latest version of OpenMPI.

>>I appreciate any help regarding that.

 >>Regards,
>>Amr


Re: [OMPI users] problems on parallel writing

2010-02-25 Thread jody
Hi
Just wanted to let you know:

I translated your program to C, ran it, and it crashed at MPI_FILE_SET_VIEW
in a similar way to yours.
Then I added an if-clause to prevent MPI_FILE_WRITE from being called with
the undefined value:
if (myid == 0) {
    MPI_File_write(fh, temp, count, MPI_DOUBLE, &status);
}
After this it ran without crash.
However, the output is not what you expected:
The number 2122010.0 was not there - probably overwritten by the
MPI_FILE_WRITE_ALL.
But this was fixed by replacing the line
  disp=0
by
  disp=8
and removing the
  if (single_no .gt. 0) map = map + 1
statement.

So here's what it all looks like:
===
program test_MPI_write_adv2


  !-- Template for any mpi program

  implicit none

  !--Include the mpi header file
  include 'mpif.h'  ! --> Required statement

  !--Declare all variables and arrays.
  integer :: fh, ierr, myid, numprocs, itag, etype, filetype, info
  integer :: status(MPI_STATUS_SIZE)
  integer :: irc, ip
  integer(kind=mpi_offset_kind) :: offset, disp
  integer :: i, j, k

  integer :: num

  character(len=64) :: filename

  real(8), pointer :: q(:), temp(:)
  integer, pointer :: map(:)
  integer :: single_no, count


  !--Initialize MPI
  call MPI_INIT( ierr ) ! --> Required statement

  !--Who am I? --- get my rank=myid
  call MPI_COMM_RANK( MPI_COMM_WORLD, myid, ierr )

  !--How many processes in the global group?
  call MPI_COMM_SIZE( MPI_COMM_WORLD, numprocs, ierr )

  if ( myid == 0 ) then
 single_no = 4
  elseif ( myid == 1 ) then
 single_no = 2
  elseif ( myid == 2 ) then
 single_no = 2
  elseif ( myid == 3 ) then
 single_no = 3
  else
 single_no = 0
  end if

  if (single_no .gt. 0) allocate(map(single_no))

  if ( myid == 0 ) then
 map = (/ 0, 2, 5, 6 /)
  elseif ( myid == 1 ) then
 map = (/ 1, 4 /)
  elseif ( myid == 2 ) then
 map = (/ 3, 9 /)
  elseif ( myid == 3 ) then
 map = (/ 7, 8, 10 /)
  end if

  if (single_no .gt. 0) allocate(q(single_no))

  if (single_no .gt. 0) then
 do i = 1,single_no
q(i) = dble(myid+1)*100.0d0 + dble(map(i)+1)
 end do
  end if


  if ( myid == 0 ) then
 count = 1
  else
 count = 0
  end if

  if (count .gt. 0) then
 allocate(temp(count))
 temp(1) = 2122010.0d0
  end if

  write(filename,'(a)') 'test_write.bin'

  call MPI_FILE_OPEN(MPI_COMM_WORLD, filename, &
MPI_MODE_RDWR+MPI_MODE_CREATE, MPI_INFO_NULL, fh, ierr)

  if (myid == 0) then
call MPI_FILE_WRITE(FH, temp, COUNT, MPI_REAL8, STATUS, IERR)
  endif

  call MPI_TYPE_CREATE_INDEXED_BLOCK(single_no, 1, map, &
MPI_DOUBLE_PRECISION, filetype, ierr)
  call MPI_TYPE_COMMIT(filetype, ierr)
  disp = 8  ! ---> size of MPI_REAL8 (the number written when myid = 0)
  call MPI_FILE_SET_VIEW(fh, disp, MPI_DOUBLE_PRECISION, filetype, &
'native', MPI_INFO_NULL, ierr)
  call MPI_FILE_WRITE_ALL(fh, q, single_no, MPI_DOUBLE_PRECISION, status, ierr)
  call MPI_FILE_CLOSE(fh, ierr)


  if (single_no .gt. 0) deallocate(map)

  if (single_no .gt. 0) deallocate(q)

  if (count .gt. 0) deallocate(temp)

  !--Finalize MPI
  call MPI_FINALIZE(irc)! ---> Required statement

  stop


end program test_MPI_write_adv2
===
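To inspect the resulting file, a small stream reader along these lines can be
used (an illustrative sketch, not part of the original post; assumes a
Fortran 2003 compiler for access='stream'):

program check_test_write
  implicit none
  integer :: i
  real(8) :: v(12)   ! 1 header value + 11 mapped values

  open(10, file='test_write.bin', access='stream', form='unformatted', &
       status='old')
  read(10) v
  close(10)

  do i = 1, 12
     write(*,'(i3,f12.1)') i-1, v(i)   ! expect 2122010.0 first, then 101.0, ...
  end do
end program check_test_write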

Regards
  jody

On Thu, Feb 25, 2010 at 2:47 AM, Terry Frankcombe  wrote:
> On Wed, 2010-02-24 at 13:40 -0500, w k wrote:
>> Hi Jordy,
>>
>> I don't think this part caused the problem. For Fortran, it doesn't
>> matter if the pointer is NULL as long as the count requested from the
>> processor is 0. Actually I tested the code and it passed this part
>> without problem. I believe it aborted at the MPI_FILE_SET_VIEW part.
>>
>
> For the record:  A pointer is not NULL unless you've nullified it.
> IIRC, the Fortran standard says that any non-assigning reference to an
> unassigned, unnullified pointer is undefined (or maybe illegal... check
> the standard).
>
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>



Re: [OMPI users] Sending relatively large messages with high frequency

2010-02-25 Thread Brian Budge
We've seen similar things in our code.  In our case it is probably due to a
race condition.  Try running the segv'ing process in a debugger, and it will
likely show you a bug in your code.
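With Open MPI, one way to do that (assuming an X display is available, and
with my_app as a placeholder for the real binary) is to start each rank under
gdb in its own xterm:

  mpirun -np 4 xterm -e gdb ./my_app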

On Feb 24, 2010 9:36 PM, "Amr Hassan"  wrote:

Hi All,

I'm facing a strange problem with OpenMPI.

I'm developing an application which is required to send a message from each
client (1 MB each) to a server node around 10 times per second (it's a
distributed render application and I'm trying to reach a higher frame rate).
The problem is that OpenMPI crashes in that case and only works if I
partition these messages into a set of 20 k sub-messages with a sleep between
each one of them of around 1 to 10 ms!! This solution is very expensive in
terms of the time needed to send the data. Is there any other solution?

The error I got now is:
Signal: Segmentation fault (11)
Signal code:  Address not mapped (1)
Failing at address: x

The OS is Linux CentOS.  I'm using the latest version of OpenMPI.

I appreciate any help regarding that.

 Regards,
Amr

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


[OMPI users] Sending relatively large messages with high frequency

2010-02-25 Thread Amr Hassan
Hi All,

I'm facing a strange problem with OpenMPI.

I'm developing an application which is required to send a message from each
client (1 MB each) to a server node around 10 times per second (it's a
distributed render application and I'm trying to reach a higher frame rate).
The problem is that OpenMPI crashes in that case and only works if I
partition these messages into a set of 20 k sub-messages with a sleep between
each one of them of around 1 to 10 ms!! This solution is very expensive in
terms of the time needed to send the data. Is there any other solution?

The error I got now is:
Signal: Segmentation fault (11)
Signal code:  Address not mapped (1)
Failing at address: x

The OS is Linux CentOS.  I'm using the latest version of OpenMPI.

I appreciate any help regarding that.

 Regards,
Amr