Re: [OMPI users] Sending relatively large messages with high frequency

2010-02-25 Thread Amr Hassan
Hi All,

Thanks a lot for your support. It was a big help. I found a race condition
in my code and now the problem is solved.

Regards,
Amr



Re: [OMPI users] Sending relatively large messages with high frequency

2010-02-25 Thread Prentice Bisbal
Amr Hassan wrote:
> [...]

It sounds like, when sending the large messages, you are putting more
data into a buffer than it can hold. When you break the messages up into
smaller sizes, you're not overflowing the buffer.

Are you using MPI_Pack, by any chance?

-- 
Prentice Bisbal
Linux Software Support Specialist/System Administrator
School of Natural Sciences
Institute for Advanced Study
Princeton, NJ


Re: [OMPI users] Sending relatively large messages with high frequency

2010-02-25 Thread Prentice Bisbal
I was getting the same error a few weeks ago. In my case the error
message was spot on. I was trying to put too much data in a buffer using
MPI_Pack.

I was able to track down the problem using Valgrind. Have you tried that
yet? You need to install Valgrind first and then build Open MPI with
Valgrind support. It takes some time, but it is worth it.

http://www.open-mpi.org/faq/?category=debugging#memchecker_what

Amr Hassan wrote:
> [...]

-- 
Prentice Bisbal
Linux Software Support Specialist/System Administrator
School of Natural Sciences
Institute for Advanced Study
Princeton, NJ


Re: [OMPI users] Sending relatively large messages with high frequency

2010-02-25 Thread Amr Hassan
Yes, but only one thread in each client is allowed to use MPI, and there is
a semaphore guarding all MPI usage.



On Fri, Feb 26, 2010 at 1:09 AM, Brian Budge wrote:

> Is your code multithreaded?
> [...]


Re: [OMPI users] Sending relatively large messages with high frequency

2010-02-25 Thread Brian Budge
Is your code multithreaded?

On Feb 25, 2010 12:56 AM, "Amr Hassan"  wrote:

[...]


Re: [OMPI users] Sending relatively large messages with high frequency

2010-02-25 Thread Amr Hassan
Thanks a lot for your reply,

I'm using blocking Send and Receive. All the clients send data, and the
server receives the messages from the clients with MPI_ANY_SOURCE as the
sender. Do you think there is a race condition in this pattern?

I searched a lot and used TotalView, but I couldn't detect such a case. I
would really appreciate it if you could send me a link or give an example
of a possible race condition in that scenario.

Also, when I partition the message into smaller parts (sent in sequence,
with all the other clients waiting until the send finishes), it works
fine. Does that rule out the race condition?


Regards,
Amr


>> [...]


Re: [OMPI users] Sending relatively large messages with high frequency

2010-02-25 Thread Brian Budge
We've seen similar things in our code. In our case it is probably due to a
race condition. Try running the segfaulting process in a debugger; it will
likely show you a bug in your code.

On Feb 24, 2010 9:36 PM, "Amr Hassan"  wrote:

[...]

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


[OMPI users] Sending relatively large messages with high frequency

2010-02-25 Thread Amr Hassan
Hi All,

I'm facing a strange problem with Open MPI.

I'm developing an application that needs to send a message (1 MB) from each
client to a server node around 10 times per second (it's a distributed
render application and I'm trying to reach a higher frame rate). The
problem is that Open MPI crashes in that case, and it only works if I
partition each message into 20 KB sub-messages with a sleep of around
1 to 10 ms between each one! This workaround is very expensive in terms of
the time needed to send the data. Is there any other solution?

The error I got is:
Signal: Segmentation fault (11)
Signal code:  Address not mapped (1)
Failing at address: x

The OS is Linux CentOS. I'm using the latest version of Open MPI.

I appreciate any help regarding that.

 Regards,
Amr