Re: [OMPI devel] Nearly unlimited growth of pml free list

2013-10-01 Thread George Bosilca
With a size of 3 doubles, all requests go out in eager mode. The data will be 
copied into our internal buffers and the MPI request will be marked as complete 
(this is deep MPI voodoo, I'm just trying to explain the next sentence). Thus all 
sends will look asynchronous from a user perspective: they happen after the 
request has been returned to you as completed, but before we release it 
internally. Now, if you have millions of such calls, I can imagine a situation 
where the driver gets overloaded and starts backing up requests, in a way that 
makes the request list look as if it grows without limit.

Let's try to see if this is indeed the case:

1. Set the eager limit for your network to 0 (this will force all messages to go 
via the rendezvous protocol). For this, find out which network you are using 
(maybe via the --mca btl parameter you provided) and set its eager limit to 0. 
For example, for TCP you can use "--mca btl_tcp_eager_limit 0".
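For illustration only (the BTL list, process count and binary name are 
assumptions about your setup): a full command line would look something like 
"mpirun --mca btl tcp,self --mca btl_tcp_eager_limit 0 -np 8 ./your_app", and 
"ompi_info --param btl tcp | grep eager" should show you the current limit.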

2. Alter your code to add a barrier every K recursions (K should be a large 
value, a few hundred). This will give the network a chance to be drained.
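
A minimal sketch of what I mean, with made-up names and shapes (your real 
routine obviously does more than this); the only point is the MPI_BARRIER every 
K recursion levels:

   RECURSIVE SUBROUTINE refine(level, val)
     USE mpi
     IMPLICIT NONE
     INTEGER, INTENT(IN)             :: level
     DOUBLE PRECISION, INTENT(INOUT) :: val(3)
     DOUBLE PRECISION                :: gval(3)
     INTEGER                         :: ierr
     INTEGER, PARAMETER              :: K = 500       ! let the network drain every K levels
     INTEGER, PARAMETER              :: maxlevel = 100000

     CALL MPI_ALLREDUCE(val, gval, 3, MPI_DOUBLE_PRECISION, MPI_SUM, &
                        MPI_COMM_WORLD, ierr)
     IF (MOD(level, K) == 0) CALL MPI_BARRIER(MPI_COMM_WORLD, ierr)

     val = gval
     IF (level < maxlevel) CALL refine(level + 1, val)  ! recursion continues as before
   END SUBROUTINE refine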

3. Are you sure you have no MPI_Isend of a similar size somewhere in your code 
whose request is never correctly completed?
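
Just to be explicit about what "correctly completed" means, an illustrative 
fragment (inside a routine that USEs mpi, names are made up): every MPI_ISEND 
request has to be released through MPI_WAIT (or MPI_TEST / MPI_REQUEST_FREE), 
otherwise its send request stays checked out of the free list:

   INTEGER, PARAMETER :: dest = 0, tag = 1
   DOUBLE PRECISION   :: buf(3)
   INTEGER            :: req, ierr, status(MPI_STATUS_SIZE)

   CALL MPI_ISEND(buf, 3, MPI_DOUBLE_PRECISION, dest, tag, &
                  MPI_COMM_WORLD, req, ierr)
   ! ... overlap other work here ...
   CALL MPI_WAIT(req, status, ierr)   ! without this the request is never given back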

  George.


Re: [OMPI devel] Nearly unlimited growth of pml free list

2013-10-01 Thread Max Staufer

George,

Well, the code itself runs fine; it's just that the Open MPI send list keeps 
allocating memory, and I pinpointed it to this single call. Probably the root 
problem is elsewhere, but it appears to me that the entries in the send list are 
not released for reuse after the operation has completed.

The size of the operation is 3 doubles.

Max


Re: [OMPI devel] Nearly unlimited growth of pml free list

2013-09-30 Thread George Bosilca
Max,

The recursive call should not be an issue: as MPI_Allreduce is a blocking 
operation, you can't recurse before the previous call completes.

What is the size of the data exchanged in the MPI_Alltoall?

George.



Re: [OMPI devel] Nearly unlimited growth of pml free list

2013-09-30 Thread Max Staufer
Well, I haven't tried 1.7.2 yet, but to elaborate on the problem a little more:

the growth happens if we use an MPI_ALLREDUCE in a recursive subroutine call; in 
Fortran 90 terms, the subroutine calls itself again and is specially marked 
(RECURSIVE) in order to work properly. Apart from that, nothing is special about 
this routine. Is it possible that the F77 interface in Open MPI is not able to 
cope with recursion?


MAX




Re: [OMPI devel] Nearly unlimited growth of pml free list

2013-09-13 Thread Rolf vandeVaart
Yes, it appears the send_requests list is the one that is growing.  This list 
holds the send request structures that are in use.  After a send is completed, 
a send request is supposed to be returned to this list and then get re-used.

With 7 processes, it had reached a size of 16,324 send requests in use. With 8 
processes, it had reached 16,708. Each send request is 720 bytes (872 in a debug 
build), so if we do the math we have consumed about 12 MBytes.

Setting some type of bound will not fix this issue. There is something else 
going on here that is causing this problem. I know you described the problem 
earlier on, but maybe you can explain it again? How many processes? What type of 
cluster? One other thought is to try Open MPI 1.7.2 and see if you still see the 
problem. Maybe someone else has suggestions too.

Rolf

PS: For those who missed a private email, I had Max add some instrumentation so 
we could see which list was growing.  We now know it is the 
mca_pml_base_send_requests list.

>-Original Message-
>From: Max Staufer [mailto:max.stau...@gmx.net]
>Sent: Friday, September 13, 2013 7:06 AM
>To: Rolf vandeVaart; de...@open-mpi.org
>Subject: Re: [OMPI devel] Nearly unlimited growth of pml free list
>
>Hi Rolf,
>
>I applied your patch. The full output is rather big (even gzipped it is > 10 MB),
>which is not good for the mailing list, but the head and tail are below for a
>7- and an 8-processor run.
>It seems that the send requests are growing fast, roughly 4000-fold in just 10 min.
>
>Do you know of a method to bound the list so that it does not grow excessively?
>
>thanks
>
>Max
>
>7 Processor run
>--
>[gpu207.dev-env.lan:11236] Iteration = 0 sleeping
>[gpu207.dev-env.lan:11236] Freelist=rdma_frags, numAlloc=4, maxAlloc=-1
>[gpu207.dev-env.lan:11236] Freelist=recv_frags, numAlloc=4, maxAlloc=-1
>[gpu207.dev-env.lan:11236] Freelist=pending_pckts, numAlloc=4, maxAlloc=-1
>[gpu207.dev-env.lan:11236] Freelist=send_ranges_pckts, numAlloc=4, maxAlloc=-1
>[gpu207.dev-env.lan:11236] Freelist=send_requests, numAlloc=4, maxAlloc=-1
>[gpu207.dev-env.lan:11236] Freelist=recv_requests, numAlloc=4, maxAlloc=-1
>[gpu207.dev-env.lan:11236] rdma_pending=0, pckt_pending=0, recv_pending=0, send_pending=0, comm_pending=0

Re: [OMPI devel] Nearly unlimited growth of pml free list

2013-09-13 Thread Max Staufer
[gpu207.dev-env.lan:11315] Freelist=send_ranges_pckts, numAlloc=4, maxAlloc=-1
[gpu207.dev-env.lan:11315] Freelist=send_requests, numAlloc=4, maxAlloc=-1
[gpu207.dev-env.lan:11315] Freelist=recv_requests, numAlloc=4, maxAlloc=-1
[gpu207.dev-env.lan:11315] rdma_pending=0, pckt_pending=0, recv_pending=0, send_pending=0, comm_pending=0


...

[gpu207.dev-env.lan:11322] Iteration = 0 sleeping
[gpu207.dev-env.lan:11322] Freelist=rdma_frags, numAlloc=4, maxAlloc=-1
[gpu207.dev-env.lan:11322] Freelist=recv_frags, numAlloc=4, maxAlloc=-1
[gpu207.dev-env.lan:11322] Freelist=pending_pckts, numAlloc=4, maxAlloc=-1
[gpu207.dev-env.lan:11322] Freelist=send_ranges_pckts, numAlloc=4, maxAlloc=-1
[gpu207.dev-env.lan:11322] Freelist=send_requests, numAlloc=16708, maxAlloc=-1
[gpu207.dev-env.lan:11322] Freelist=recv_requests, numAlloc=68, maxAlloc=-1
[gpu207.dev-env.lan:11322] rdma_pending=0, pckt_pending=0, recv_pending=0, send_pending=0, comm_pending=0



Am 12.09.2013 17:04, schrieb Rolf vandeVaart:

Can you apply this patch and try again?  It will print out the sizes of the 
free lists after every 100 calls into the mca_pml_ob1_send.  It would be 
interesting to see which one is growing.
This might give us some clues.

Rolf


-Original Message-
From: Max Staufer [mailto:max.stau...@gmx.net]
Sent: Thursday, September 12, 2013 3:53 AM
To: Rolf vandeVaart
Subject: Re: [OMPI devel] Nearly unlimited growth of pml free list

Hi Rolf,

the heap snapshots I take tell me where and when the memory was allocated, and a 
simple source trace tells me that the calling routine was mca_pml_ob1_send and 
that all of the ~10 single allocations during the run were triggered by an 
MPI_ALLREDUCE command called in exactly one place in the code.
The tool I use for this is MemorySCAPE, but I think Valgrind can tell you the 
same thing. However, I have not been able to reproduce the problem in a simpler 
program yet; I suspect it has something to do with the locking mechanism of the 
list elements. I don't know enough about OMPI to comment on that, but it looks 
like the list is growing because all elements are locked.

really any help is appreciated

Max

PS:

If I mimic the ALLREDUCE with 2*Nproc SEND and RECV commands (aggregating on 
proc 0 and then sending the result back out to all procs), I get the same kind 
of behaviour.
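
In outline (an illustrative sketch with invented names, not the actual code), 
the replacement looks like this:

   SUBROUTINE my_allreduce_sum3(val, comm)
     USE mpi
     IMPLICIT NONE
     DOUBLE PRECISION, INTENT(INOUT) :: val(3)
     INTEGER, INTENT(IN)             :: comm
     DOUBLE PRECISION                :: tmp(3)
     INTEGER :: rank, nproc, i, ierr, status(MPI_STATUS_SIZE)

     CALL MPI_COMM_RANK(comm, rank, ierr)
     CALL MPI_COMM_SIZE(comm, nproc, ierr)

     IF (rank == 0) THEN
       DO i = 1, nproc - 1                          ! aggregate on proc 0
         CALL MPI_RECV(tmp, 3, MPI_DOUBLE_PRECISION, i, 1, comm, status, ierr)
         val = val + tmp
       END DO
       DO i = 1, nproc - 1                          ! send the result back out
         CALL MPI_SEND(val, 3, MPI_DOUBLE_PRECISION, i, 2, comm, ierr)
       END DO
     ELSE
       CALL MPI_SEND(val, 3, MPI_DOUBLE_PRECISION, 0, 1, comm, ierr)
       CALL MPI_RECV(val, 3, MPI_DOUBLE_PRECISION, 0, 2, comm, status, ierr)
     END IF
   END SUBROUTINE my_allreduce_sum3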


Re: [OMPI devel] Nearly unlimited growth of pml free list

2013-09-11 Thread Rolf vandeVaart
Hi Max:
You say that the function keeps "allocating memory in the pml free list." How do 
you know that is happening? Do you know which free list it is happening on? There 
are something like 8 free lists associated with the pml ob1, so it would be 
interesting to know which one you observe growing.
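
(For reference, and hedging a bit since I am quoting the parameter names from 
memory: the ob1 free-list knobs should show up with 
"ompi_info --param pml ob1 | grep free_list", i.e. pml_ob1_free_list_num, 
pml_ob1_free_list_max and pml_ob1_free_list_inc.)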

Rolf 



[OMPI devel] Nearly unlimited growth of pml free list

2013-09-11 Thread Max Staufer

Hi All,

As I already asked on the users list and was told that is not the right place, I 
am reporting it here: I came across a misbehaviour of Open MPI versions 1.4.5 
and 1.6.5 alike.

The mca_pml_ob1_send function keeps allocating memory in the pml free list. It 
does that indefinitely; in my case the list grew to about 100 GB.

I can control the maximum using the pml_ob1_free_list_max parameter, but then 
the application just stops working when this number of entries in the list is 
reached.
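
(For illustration only, the value being arbitrary: I pass the parameter on the 
command line in the usual MCA way, something like 
"mpirun --mca pml_ob1_free_list_max 8192 -np 8 ./app".)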


The interesting part is that the growth only happens in a single place in the 
code, which is a RECURSIVE SUBROUTINE.

And the function called there is an MPI_ALLREDUCE(... MPI_SUM).

Apparently it is not easy to create a test program that shows the same 
behaviour; recursion alone is not enough.

Is there an MCA parameter that allows limiting the total list size without 
making the application stop?

Or is there a way to enforce the lock on the free list entries?

Thanks for all the help

Max