Re: [OMPI devel] Nearly unlimited growth of pml free list
With a size of 3 doubles, all requests go out in eager mode. The data is copied into our internal buffers and the MPI request is marked as complete (this is deep MPI voodoo, I'm just trying to explain the next sentence). Thus all sends look asynchronous from the user's perspective: the request is returned to the user as completed before we release it internally. Now, if you have millions of such calls, I can imagine the driver becoming overloaded and requests piling up, in a way that looks like the request list is growing without limit. Let's try to see if this is indeed the case:

1. Set the eager limit for your network to 0 (this will force all messages to go via the rendezvous protocol). To do this, find out which network you are using (maybe via the --mca btl parameter you provided) and set its eager limit to 0. For example, for TCP you can use "--mca btl_tcp_eager_limit 0".

2. Alter your code to add a barrier every K recursions (K should be a large value, like a few hundred). This will give the network a chance to be drained.

3. Are you sure you have no MPI_Isend of a similar size in your code that is not correctly completed?

George.

On Oct 1, 2013, at 09:41, Max Staufer wrote:

> George,
>
> well the code itself runs fine, it's just that the ompi send list keeps allocating memory, and I pinpointed it to this single call.
> Probably the root problem is elsewhere, but it appears to me that the entries in the send list are not released for reuse after the operation completed.
>
> The size of the operation is 3 doubles.
>
> Max
>
> Am 01.10.2013 01:40, schrieb George Bosilca:
>> Max,
>>
>> The recursive call should not be an issue; as the MPI_Allreduce is a blocking operation, you can't recurse before the previous call completes.
>>
>> What is the size of the data exchanged in the MPI_Alltoall?
>>
>> George.
>>
>> On Sep 30, 2013, at 17:09, Max Staufer wrote:
>>
>>> Well, haven't tried 1.7.2 yet, but to elaborate on the problem a little bit more:
>>>
>>> the growth happens if we use an MPI_ALLREDUCE in a recursive subroutine call; that means, in FORTRAN90 speak, the subroutine calls itself again, and is specially marked in order to work properly. Apart from that nothing is special with this routine. Is it possible that the F77 interface in Open MPI is not able to cope with recursion?
>>>
>>> MAX
>>>
>>> Am 13.09.13 17:18, schrieb Rolf vandeVaart:
>>>> Yes, it appears the send_requests list is the one that is growing. This list holds the send request structures that are in use. After a send is completed, a send request is supposed to be returned to this list and then get re-used.
>>>>
>>>> With 7 processes, it had reached a size of 16,324 send requests in use. With 8 processes, it had reached a size of 16,708. Each send request is 720 bytes (in a debug build it is 872), and if we do the math we have consumed about 12 Mbytes.
>>>>
>>>> Setting some type of bound will not fix this issue. There is something else going on here that is causing this problem. I know you described the problem earlier on, but maybe you can explain again? How many processes? What type of cluster? One other thought is perhaps trying Open MPI 1.7.2 to see if you still see the problem. Maybe someone else has suggestions too.
>>>>
>>>> Rolf
>>>>
>>>> PS: For those who missed a private email, I had Max add some instrumentation so we could see which list was growing. We now know it is the mca_pml_base_send_requests list.
>>>>
>>>>> -----Original Message-----
>>>>> From: Max Staufer [mailto:max.stau...@gmx.net]
>>>>> Sent: Friday, September 13, 2013 7:06 AM
>>>>> To: Rolf vandeVaart; de...@open-mpi.org
>>>>> Subject: Re: [OMPI devel] Nearly unlimited growth of pml free list
>>>>>
>>>>> Hi Rolf,
>>>>>
>>>>> I applied your patch. The full output is rather big, even gzipped > 10 Mb, which is not good for the mailing list, but the head and tail are below for a 7 and 8 processor run.
>>>>> Seems that the send requests are growing
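For illustration, here is a minimal C sketch of suggestion 2 above: an MPI_Barrier dropped into the recursion every K levels so outstanding eager sends can drain. The routine name, payload, depth limit and value of K are hypothetical stand-ins, not Max's actual Fortran code. Suggestion 1 lives on the mpirun command line (for TCP, the "--mca btl_tcp_eager_limit 0" flag quoted above), not in the source.

#include <mpi.h>

#define DRAIN_INTERVAL 500   /* "K should be a large value, like a few hundred" */

/* Hypothetical stand-in for the recursive routine discussed in this thread. */
static void recursive_step(int depth, double local[3], MPI_Comm comm)
{
    double global[3];

    /* The 3-double reduction described above. */
    MPI_Allreduce(local, global, 3, MPI_DOUBLE, MPI_SUM, comm);

    /* Suggestion 2: every K recursions, let outstanding sends complete
     * before going any deeper. */
    if (depth > 0 && depth % DRAIN_INTERVAL == 0)
        MPI_Barrier(comm);

    if (depth < 10000)               /* stand-in for the real termination test */
        recursive_step(depth + 1, global, comm);
}

If the free-list growth disappears with the eager limit set to 0 or with the periodic barrier in place, that would point at the eager-mode pile-up George describes.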
Re: [OMPI devel] Nearly unlimited growth of pml free list
George,

well the code itself runs fine, it's just that the ompi send list keeps allocating memory, and I pinpointed it to this single call. Probably the root problem is elsewhere, but it appears to me that the entries in the send list are not released for reuse after the operation completed.

The size of the operation is 3 doubles.

Max

Am 01.10.2013 01:40, schrieb George Bosilca:
Max,
The recursive call should not be an issue; as the MPI_Allreduce is a blocking operation, you can't recurse before the previous call completes.
What is the size of the data exchanged in the MPI_Alltoall?
George.

On Sep 30, 2013, at 17:09, Max Staufer wrote:
Well, haven't tried 1.7.2 yet, but to elaborate on the problem a little bit more: the growth happens if we use an MPI_ALLREDUCE in a recursive subroutine call; that means, in FORTRAN90 speak, the subroutine calls itself again, and is specially marked in order to work properly. Apart from that nothing is special with this routine. Is it possible that the F77 interface in Open MPI is not able to cope with recursion?
MAX

Am 13.09.13 17:18, schrieb Rolf vandeVaart:
Yes, it appears the send_requests list is the one that is growing. This list holds the send request structures that are in use. After a send is completed, a send request is supposed to be returned to this list and then get re-used.
With 7 processes, it had reached a size of 16,324 send requests in use. With 8 processes, it had reached a size of 16,708. Each send request is 720 bytes (in a debug build it is 872), and if we do the math we have consumed about 12 Mbytes.
Setting some type of bound will not fix this issue. There is something else going on here that is causing this problem. I know you described the problem earlier on, but maybe you can explain again? How many processes? What type of cluster? One other thought is perhaps trying Open MPI 1.7.2 to see if you still see the problem. Maybe someone else has suggestions too.
Rolf
PS: For those who missed a private email, I had Max add some instrumentation so we could see which list was growing. We now know it is the mca_pml_base_send_requests list.

-----Original Message-----
From: Max Staufer [mailto:max.stau...@gmx.net]
Sent: Friday, September 13, 2013 7:06 AM
To: Rolf vandeVaart; de...@open-mpi.org
Subject: Re: [OMPI devel] Nearly unlimited growth of pml free list

Hi Rolf,

I applied your patch. The full output is rather big, even gzipped > 10 Mb, which is not good for the mailing list, but the head and tail are below for a 7 and 8 processor run.
Seems that the send requests are growing fast, 4000 times in just 10 min.

Do you know of a method to bound the list such that it is not growing excessively?
thanks Max 7 Processor run -- [gpu207.dev-env.lan:11236] Iteration = 0 sleeping [gpu207.dev-env.lan:11236] Freelist=rdma_frags, numAlloc=4, maxAlloc=-1 [gpu207.dev-env.lan:11236] Freelist=recv_frags, numAlloc=4, maxAlloc=-1 [gpu207.dev-env.lan:11236] Freelist=pending_pckts, numAlloc=4, maxAlloc=-1 [gpu207.dev- env.lan:11236] Freelist=send_ranges_pckts, numAlloc=4, maxAlloc=-1 [gpu207.dev-env.lan:11236] Freelist=send_requests, numAlloc=4, maxAlloc=- 1 [gpu207.dev-env.lan:11236] Freelist=recv_requests, numAlloc=4, maxAlloc=-1 [gpu207.dev-env.lan:11236] rdma_pending=0, pckt_pending=0, recv_pending=0, send_pending=0, comm_pending=0 [gpu207.dev- env.lan:11236] [gpu207.dev-env.lan:11236] Iteration = 0 sleeping [gpu207.dev-env.lan:11236] Freelist=rdma_frags, numAlloc=4, maxAlloc=-1 [gpu207.dev-env.lan:11236] Freelist=recv_frags, numAlloc=4, maxAlloc=-1 [gpu207.dev-env.lan:11236] Freelist=pending_pckts, numAlloc=4, maxAlloc=- 1 [gpu207.dev-env.lan:11236] Freelist=send_ranges_pckts, numAlloc=4, maxAlloc=-1 [gpu207.dev-env.lan:11236] Freelist=send_requests, numAlloc=4, maxAlloc=- 1 [gpu207.dev-env.lan:11236] Freelist=recv_requests, numAlloc=4, maxAlloc=-1 [gpu207.dev-env.lan:11236] rdma_pending=0, pckt_pending=0, recv_pending=0, send_pending=0, comm_pending=0 [gpu207.dev- env.lan:11236] [gpu207.dev-env.lan:11236] Iteration = 0 sleeping [gpu207.dev-env.lan:11236] Freelist=rdma_frags, numAlloc=4, maxAlloc=-1 [gpu207.dev-env.lan:11236] Freelist=recv_frags, numAlloc=4, maxAlloc=-1 [gpu207.dev-env.lan:11236] Freelist=pending_pckts, numAlloc=4, maxAlloc=- 1 [gpu207.dev-env.lan:11236] Freelist=send_ranges_pckts, numAlloc=4, maxAlloc=-1 [gpu207.dev-env.lan:11236] Freelist=send_requests, numAlloc=4, maxAlloc=- 1 [gpu207.dev-env.lan:11236] Freelist=recv_requests, numAlloc=4, maxAlloc=-1 [gpu207.dev-env.lan:11236] rdma_pending=0, pckt_pending=0, recv_pending=0, send_pending=0, comm_pending=0 [gpu207.dev- env.lan:11236] [gpu207.dev-env.lan:11236] Iteration = 0 sleeping [gpu207.dev-env.lan:11236] Freelist=rdma_frags, numAlloc=4, maxAlloc=-1 [gpu207.dev-env.lan:11236] Freelist=recv_frags, numAlloc=4, maxAlloc=-1 [gpu207.dev-env.lan:11236] Freelist=pending_pckts, numAlloc=4, maxAlloc=- 1 [gpu207.dev
Re: [OMPI devel] Nearly unlimited growth of pml free list
Max,

The recursive call should not be an issue; as the MPI_Allreduce is a blocking operation, you can't recurse before the previous call completes.

What is the size of the data exchanged in the MPI_Alltoall?

George.

On Sep 30, 2013, at 17:09, Max Staufer wrote:

> Well, haven't tried 1.7.2 yet, but to elaborate on the problem a little bit more,
>
> the growth happens if we use an MPI_ALLREDUCE in a recursive subroutine call; that means, in FORTRAN90 speak, the subroutine calls itself again, and is specially marked in order to work properly. Apart from that nothing is special with this routine. Is it possible that the F77 interface in Open MPI is not able to cope with recursion?
>
> MAX
>
> Am 13.09.13 17:18, schrieb Rolf vandeVaart:
>> Yes, it appears the send_requests list is the one that is growing. This list holds the send request structures that are in use. After a send is completed, a send request is supposed to be returned to this list and then get re-used.
>>
>> With 7 processes, it had reached a size of 16,324 send requests in use. With 8 processes, it had reached a size of 16,708. Each send request is 720 bytes (in a debug build it is 872), and if we do the math we have consumed about 12 Mbytes.
>>
>> Setting some type of bound will not fix this issue. There is something else going on here that is causing this problem. I know you described the problem earlier on, but maybe you can explain again? How many processes? What type of cluster? One other thought is perhaps trying Open MPI 1.7.2 to see if you still see the problem. Maybe someone else has suggestions too.
>>
>> Rolf
>>
>> PS: For those who missed a private email, I had Max add some instrumentation so we could see which list was growing. We now know it is the mca_pml_base_send_requests list.
>>
>>> -----Original Message-----
>>> From: Max Staufer [mailto:max.stau...@gmx.net]
>>> Sent: Friday, September 13, 2013 7:06 AM
>>> To: Rolf vandeVaart; de...@open-mpi.org
>>> Subject: Re: [OMPI devel] Nearly unlimited growth of pml free list
>>>
>>> Hi Rolf,
>>>
>>> I applied your patch. The full output is rather big, even gzipped > 10 Mb, which is not good for the mailing list, but the head and tail are below for a 7 and 8 processor run.
>>> Seems that the send requests are growing fast, 4000 times in just 10 min.
>>>
>>> Do you know of a method to bound the list such that it is not growing excessively?
>>> >>> thanks >>> >>> Max >>> >>> 7 Processor run >>> -- >>> [gpu207.dev-env.lan:11236] Iteration = 0 sleeping [gpu207.dev-env.lan:11236] >>> Freelist=rdma_frags, numAlloc=4, maxAlloc=-1 [gpu207.dev-env.lan:11236] >>> Freelist=recv_frags, numAlloc=4, maxAlloc=-1 [gpu207.dev-env.lan:11236] >>> Freelist=pending_pckts, numAlloc=4, maxAlloc=-1 [gpu207.dev- >>> env.lan:11236] Freelist=send_ranges_pckts, numAlloc=4, >>> maxAlloc=-1 >>> [gpu207.dev-env.lan:11236] Freelist=send_requests, numAlloc=4, maxAlloc=- >>> 1 [gpu207.dev-env.lan:11236] Freelist=recv_requests, numAlloc=4, >>> maxAlloc=-1 [gpu207.dev-env.lan:11236] rdma_pending=0, pckt_pending=0, >>> recv_pending=0, send_pending=0, comm_pending=0 [gpu207.dev- >>> env.lan:11236] [gpu207.dev-env.lan:11236] Iteration = 0 sleeping >>> [gpu207.dev-env.lan:11236] Freelist=rdma_frags, numAlloc=4, maxAlloc=-1 >>> [gpu207.dev-env.lan:11236] Freelist=recv_frags, numAlloc=4, maxAlloc=-1 >>> [gpu207.dev-env.lan:11236] Freelist=pending_pckts, numAlloc=4, maxAlloc=- >>> 1 [gpu207.dev-env.lan:11236] Freelist=send_ranges_pckts, numAlloc=4, >>> maxAlloc=-1 >>> [gpu207.dev-env.lan:11236] Freelist=send_requests, numAlloc=4, maxAlloc=- >>> 1 [gpu207.dev-env.lan:11236] Freelist=recv_requests, numAlloc=4, >>> maxAlloc=-1 [gpu207.dev-env.lan:11236] rdma_pending=0, pckt_pending=0, >>> recv_pending=0, send_pending=0, comm_pending=0 [gpu207.dev- >>> env.lan:11236] [gpu207.dev-env.lan:11236] Iteration = 0 sleeping >>> [gpu207.dev-env.lan:11236] Freelist=rdma_frags, numAlloc=4, maxAlloc=-1 >>> [gpu207.dev-env.lan:11236] Freelist=recv_frags, numAlloc=4, maxAlloc=-1 >>> [gpu207.dev-env.lan:11236] Freelist=pending_pckts, numAlloc=4, maxAlloc=- >>> 1 [gpu207.dev-env.lan:11236] Freelist=send_ranges_pckts, numAlloc=4, >>> maxAlloc=-1 >>> [gpu207.dev-env.lan:11
Re: [OMPI devel] Nearly unlimited growth of pml free list
Well, haven't tried 1.7.2 yet, but to elaborate on the problem a little bit more: the growth happens if we use an MPI_ALLREDUCE in a recursive subroutine call; that means, in FORTRAN90 speak, the subroutine calls itself again, and is specially marked in order to work properly. Apart from that nothing is special with this routine. Is it possible that the F77 interface in Open MPI is not able to cope with recursion?

MAX

Am 13.09.13 17:18, schrieb Rolf vandeVaart:
Yes, it appears the send_requests list is the one that is growing. This list holds the send request structures that are in use. After a send is completed, a send request is supposed to be returned to this list and then get re-used.
With 7 processes, it had reached a size of 16,324 send requests in use. With 8 processes, it had reached a size of 16,708. Each send request is 720 bytes (in a debug build it is 872), and if we do the math we have consumed about 12 Mbytes.
Setting some type of bound will not fix this issue. There is something else going on here that is causing this problem. I know you described the problem earlier on, but maybe you can explain again? How many processes? What type of cluster? One other thought is perhaps trying Open MPI 1.7.2 to see if you still see the problem. Maybe someone else has suggestions too.
Rolf
PS: For those who missed a private email, I had Max add some instrumentation so we could see which list was growing. We now know it is the mca_pml_base_send_requests list.

-----Original Message-----
From: Max Staufer [mailto:max.stau...@gmx.net]
Sent: Friday, September 13, 2013 7:06 AM
To: Rolf vandeVaart; de...@open-mpi.org
Subject: Re: [OMPI devel] Nearly unlimited growth of pml free list

Hi Rolf,

I applied your patch. The full output is rather big, even gzipped > 10 Mb, which is not good for the mailing list, but the head and tail are below for a 7 and 8 processor run.
Seems that the send requests are growing fast, 4000 times in just 10 min.

Do you know of a method to bound the list such that it is not growing excessively?
thanks Max 7 Processor run -- [gpu207.dev-env.lan:11236] Iteration = 0 sleeping [gpu207.dev-env.lan:11236] Freelist=rdma_frags, numAlloc=4, maxAlloc=-1 [gpu207.dev-env.lan:11236] Freelist=recv_frags, numAlloc=4, maxAlloc=-1 [gpu207.dev-env.lan:11236] Freelist=pending_pckts, numAlloc=4, maxAlloc=-1 [gpu207.dev- env.lan:11236] Freelist=send_ranges_pckts, numAlloc=4, maxAlloc=-1 [gpu207.dev-env.lan:11236] Freelist=send_requests, numAlloc=4, maxAlloc=- 1 [gpu207.dev-env.lan:11236] Freelist=recv_requests, numAlloc=4, maxAlloc=-1 [gpu207.dev-env.lan:11236] rdma_pending=0, pckt_pending=0, recv_pending=0, send_pending=0, comm_pending=0 [gpu207.dev- env.lan:11236] [gpu207.dev-env.lan:11236] Iteration = 0 sleeping [gpu207.dev-env.lan:11236] Freelist=rdma_frags, numAlloc=4, maxAlloc=-1 [gpu207.dev-env.lan:11236] Freelist=recv_frags, numAlloc=4, maxAlloc=-1 [gpu207.dev-env.lan:11236] Freelist=pending_pckts, numAlloc=4, maxAlloc=- 1 [gpu207.dev-env.lan:11236] Freelist=send_ranges_pckts, numAlloc=4, maxAlloc=-1 [gpu207.dev-env.lan:11236] Freelist=send_requests, numAlloc=4, maxAlloc=- 1 [gpu207.dev-env.lan:11236] Freelist=recv_requests, numAlloc=4, maxAlloc=-1 [gpu207.dev-env.lan:11236] rdma_pending=0, pckt_pending=0, recv_pending=0, send_pending=0, comm_pending=0 [gpu207.dev- env.lan:11236] [gpu207.dev-env.lan:11236] Iteration = 0 sleeping [gpu207.dev-env.lan:11236] Freelist=rdma_frags, numAlloc=4, maxAlloc=-1 [gpu207.dev-env.lan:11236] Freelist=recv_frags, numAlloc=4, maxAlloc=-1 [gpu207.dev-env.lan:11236] Freelist=pending_pckts, numAlloc=4, maxAlloc=- 1 [gpu207.dev-env.lan:11236] Freelist=send_ranges_pckts, numAlloc=4, maxAlloc=-1 [gpu207.dev-env.lan:11236] Freelist=send_requests, numAlloc=4, maxAlloc=- 1 [gpu207.dev-env.lan:11236] Freelist=recv_requests, numAlloc=4, maxAlloc=-1 [gpu207.dev-env.lan:11236] rdma_pending=0, pckt_pending=0, recv_pending=0, send_pending=0, comm_pending=0 [gpu207.dev- env.lan:11236] [gpu207.dev-env.lan:11236] Iteration = 0 sleeping [gpu207.dev-env.lan:11236] Freelist=rdma_frags, numAlloc=4, maxAlloc=-1 [gpu207.dev-env.lan:11236] Freelist=recv_frags, numAlloc=4, maxAlloc=-1 [gpu207.dev-env.lan:11236] Freelist=pending_pckts, numAlloc=4, maxAlloc=- 1 [gpu207.dev-env.lan:11236] Freelist=send_ranges_pckts, numAlloc=4, maxAlloc=-1 [gpu207.dev-env.lan:11236] Freelist=send_requests, numAlloc=4, maxAlloc=- 1 [gpu207.dev-env.lan:11236] Freelist=recv_requests, numAlloc=4, maxAlloc=-1 [gpu207.dev-env.lan:11236] rdma_pending=0, pckt_pending=0, recv_pending=0, send_pending=0, comm_pending=0 [gpu207.dev- env.lan:11236] [gpu207.dev-env.lan:11236] Iteration = 0 sleeping [gpu207.dev-env.lan:11236] Freelist=rdma_frags, numAlloc=4, maxAlloc=-1 [gpu207.dev-env.lan:11236] Freelist=recv_frags, numAlloc=4, maxAlloc=-1 [gpu207.dev-env.lan:11236] Freelist=pending_pckts, numAlloc=4, maxAlloc=- 1 [gpu207.dev-env.lan:11236] Freelist=send_ranges_pc
Re: [OMPI devel] Nearly unlimited growth of pml free list
Yes, it appears the send_requests list is the one that is growing. This list holds the send request structures that are in use. After a send is completed, a send request is supposed to be returned to this list and then get re-used. With 7 processes, it had reached a size of 16,324 send requests in use. With the 8 processes, it had reached a size of 16,708. Each send request is 720 bytes (in debug build it is 872) and if we do the math we have consumed about 12 Mbytes. Setting some type of bound will not fix this issue. There is something else going on here that is causing this problem. I know you described the problem earlier on, but maybe you can explain again? How many processes? What type of cluster?One other thought is perhaps trying Open MPI 1.7.2 to see if you still see the problem. Maybe someone else has suggestions too. Rolf PS: For those who missed a private email, I had Max add some instrumentation so we could see which list was growing. We now know it is the mca_pml_base_send_requests list. >-Original Message- >From: Max Staufer [mailto:max.stau...@gmx.net] >Sent: Friday, September 13, 2013 7:06 AM >To: Rolf vandeVaart; de...@open-mpi.org >Subject: Re: [OMPI devel] Nearly unlimited growth of pml free list > >Hi Rolf, > >I applied your patch, the full output is rather big, even gzip > 10Mb, > which is >not good for the mailinglist, but the head and tail are below for a 7 and 8 >processor run. >Seem that the send requests are growing fast 4000 times in just 10 min. > >Do you now of a method to bound the list such that it is not growing excessivly >? > >thanks > >Max > >7 Processor run >-- >[gpu207.dev-env.lan:11236] Iteration = 0 sleeping [gpu207.dev-env.lan:11236] >Freelist=rdma_frags, numAlloc=4, maxAlloc=-1 [gpu207.dev-env.lan:11236] >Freelist=recv_frags, numAlloc=4, maxAlloc=-1 [gpu207.dev-env.lan:11236] >Freelist=pending_pckts, numAlloc=4, maxAlloc=-1 [gpu207.dev- >env.lan:11236] Freelist=send_ranges_pckts, numAlloc=4, >maxAlloc=-1 >[gpu207.dev-env.lan:11236] Freelist=send_requests, numAlloc=4, maxAlloc=- >1 [gpu207.dev-env.lan:11236] Freelist=recv_requests, numAlloc=4, >maxAlloc=-1 [gpu207.dev-env.lan:11236] rdma_pending=0, pckt_pending=0, >recv_pending=0, send_pending=0, comm_pending=0 [gpu207.dev- >env.lan:11236] [gpu207.dev-env.lan:11236] Iteration = 0 sleeping >[gpu207.dev-env.lan:11236] Freelist=rdma_frags, numAlloc=4, maxAlloc=-1 >[gpu207.dev-env.lan:11236] Freelist=recv_frags, numAlloc=4, maxAlloc=-1 >[gpu207.dev-env.lan:11236] Freelist=pending_pckts, numAlloc=4, maxAlloc=- >1 [gpu207.dev-env.lan:11236] Freelist=send_ranges_pckts, numAlloc=4, >maxAlloc=-1 >[gpu207.dev-env.lan:11236] Freelist=send_requests, numAlloc=4, maxAlloc=- >1 [gpu207.dev-env.lan:11236] Freelist=recv_requests, numAlloc=4, >maxAlloc=-1 [gpu207.dev-env.lan:11236] rdma_pending=0, pckt_pending=0, >recv_pending=0, send_pending=0, comm_pending=0 [gpu207.dev- >env.lan:11236] [gpu207.dev-env.lan:11236] Iteration = 0 sleeping >[gpu207.dev-env.lan:11236] Freelist=rdma_frags, numAlloc=4, maxAlloc=-1 >[gpu207.dev-env.lan:11236] Freelist=recv_frags, numAlloc=4, maxAlloc=-1 >[gpu207.dev-env.lan:11236] Freelist=pending_pckts, numAlloc=4, maxAlloc=- >1 [gpu207.dev-env.lan:11236] Freelist=send_ranges_pckts, numAlloc=4, >maxAlloc=-1 >[gpu207.dev-env.lan:11236] Freelist=send_requests, numAlloc=4, maxAlloc=- >1 [gpu207.dev-env.lan:11236] Freelist=recv_requests, numAlloc=4, >maxAlloc=-1 [gpu207.dev-env.lan:11236] rdma_pending=0, pckt_pending=0, >recv_pending=0, send_pending=0, comm_pending=0 [gpu207.dev- 
>env.lan:11236] [gpu207.dev-env.lan:11236] Iteration = 0 sleeping >[gpu207.dev-env.lan:11236] Freelist=rdma_frags, numAlloc=4, maxAlloc=-1 >[gpu207.dev-env.lan:11236] Freelist=recv_frags, numAlloc=4, maxAlloc=-1 >[gpu207.dev-env.lan:11236] Freelist=pending_pckts, numAlloc=4, maxAlloc=- >1 [gpu207.dev-env.lan:11236] Freelist=send_ranges_pckts, numAlloc=4, >maxAlloc=-1 >[gpu207.dev-env.lan:11236] Freelist=send_requests, numAlloc=4, maxAlloc=- >1 [gpu207.dev-env.lan:11236] Freelist=recv_requests, numAlloc=4, >maxAlloc=-1 [gpu207.dev-env.lan:11236] rdma_pending=0, pckt_pending=0, >recv_pending=0, send_pending=0, comm_pending=0 [gpu207.dev- >env.lan:11236] [gpu207.dev-env.lan:11236] Iteration = 0 sleeping >[gpu207.dev-env.lan:11236] Freelist=rdma_frags, numAlloc=4, maxAlloc=-1 >[gpu207.dev-env.lan:11236] Freelist=recv_frags, numAlloc=4, maxAlloc=-1 >[gpu207.dev-env.lan:11236] Freelist=pending_pckts, numAlloc=4, maxAlloc=- >1 [gpu207.dev-env.lan:11236] Freelist=send_ranges_pckts, numAlloc=4, >maxAlloc=-1 >[gpu207.dev-env.lan:11236] Freelist=send_requests, numAlloc=4, maxAlloc=- >1 [gpu207.dev-env.lan:11236] Freelist=recv_requests, numAlloc=4, >maxAlloc=-1 [gp
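As a simplified illustration of the mechanism Rolf describes, here is a toy grow-on-demand free list in C. This is deliberately not Open MPI's actual free-list code, just a sketch of the idea: a request is taken from the list when a send starts and is supposed to be handed back on completion, and if requests are never returned, the allocation counter climbs the way send_requests does in the trace above.

#include <stdio.h>
#include <stdlib.h>

/* Illustrative only: a toy grow-on-demand free list, not Open MPI's code. */
struct item { struct item *next; char payload[720]; };  /* 720 bytes, as in the numbers above */

struct free_list {
    struct item *avail;    /* items ready for reuse */
    long num_allocated;    /* grows whenever a get finds the list empty */
};

static struct item *fl_get(struct free_list *fl)
{
    if (fl->avail == NULL) {           /* nothing to reuse: allocate a new item */
        fl->num_allocated++;
        return calloc(1, sizeof(struct item));
    }
    struct item *it = fl->avail;
    fl->avail = it->next;
    return it;
}

static void fl_return(struct free_list *fl, struct item *it)
{
    it->next = fl->avail;              /* a completed request becomes reusable */
    fl->avail = it;
}

int main(void)
{
    struct free_list fl = { NULL, 0 };
    int i;

    /* Healthy pattern: every get is matched by a return, so one item suffices. */
    for (i = 0; i < 100000; i++)
        fl_return(&fl, fl_get(&fl));
    printf("with returns:    numAlloc = %ld\n", fl.num_allocated);   /* prints 1 */

    /* Leaky pattern: items are never handed back, so the counter grows with
     * nearly every call, which is what send_requests appears to be doing. */
    for (i = 0; i < 100000; i++)
        (void)fl_get(&fl);
    printf("without returns: numAlloc = %ld\n", fl.num_allocated);

    return 0;
}

At 720 bytes per request, the 16,708 requests Rolf quotes come to roughly 12 MB, matching his estimate.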
Re: [OMPI devel] Nearly unlimited growth of pml free list
ist=send_ranges_pckts, numAlloc=4, maxAlloc=-1 [gpu207.dev-env.lan:11315] Freelist=send_requests, numAlloc=4, maxAlloc=-1 [gpu207.dev-env.lan:11315] Freelist=recv_requests, numAlloc=4, maxAlloc=-1 [gpu207.dev-env.lan:11315] rdma_pending=0, pckt_pending=0, recv_pending=0, send_pending=0, comm_pending=0 ... [gpu207.dev-env.lan:11322] Iteration = 0 sleeping [gpu207.dev-env.lan:11322] Freelist=rdma_frags, numAlloc=4, maxAlloc=-1 [gpu207.dev-env.lan:11322] Freelist=recv_frags, numAlloc=4, maxAlloc=-1 [gpu207.dev-env.lan:11322] Freelist=pending_pckts, numAlloc=4, maxAlloc=-1 [gpu207.dev-env.lan:11322] Freelist=send_ranges_pckts, numAlloc=4, maxAlloc=-1 [gpu207.dev-env.lan:11322] Freelist=send_requests, numAlloc=16708, maxAlloc=-1 [gpu207.dev-env.lan:11322] Freelist=recv_requests, numAlloc=68, maxAlloc=-1 [gpu207.dev-env.lan:11322] rdma_pending=0, pckt_pending=0, recv_pending=0, send_pending=0, comm_pending=0 [gpu207.dev-env.lan:11322] [gpu207.dev-env.lan:11322] Iteration = 0 sleeping [gpu207.dev-env.lan:11322] Freelist=rdma_frags, numAlloc=4, maxAlloc=-1 [gpu207.dev-env.lan:11322] Freelist=recv_frags, numAlloc=4, maxAlloc=-1 [gpu207.dev-env.lan:11322] Freelist=pending_pckts, numAlloc=4, maxAlloc=-1 [gpu207.dev-env.lan:11322] Freelist=send_ranges_pckts, numAlloc=4, maxAlloc=-1 [gpu207.dev-env.lan:11322] Freelist=send_requests, numAlloc=16708, maxAlloc=-1 [gpu207.dev-env.lan:11322] Freelist=recv_requests, numAlloc=68, maxAlloc=-1 [gpu207.dev-env.lan:11322] rdma_pending=0, pckt_pending=0, recv_pending=0, send_pending=0, comm_pending=0 [gpu207.dev-env.lan:11322] [gpu207.dev-env.lan:11322] Iteration = 0 sleeping [gpu207.dev-env.lan:11322] Freelist=rdma_frags, numAlloc=4, maxAlloc=-1 [gpu207.dev-env.lan:11322] Freelist=recv_frags, numAlloc=4, maxAlloc=-1 [gpu207.dev-env.lan:11322] Freelist=pending_pckts, numAlloc=4, maxAlloc=-1 [gpu207.dev-env.lan:11322] Freelist=send_ranges_pckts, numAlloc=4, maxAlloc=-1 [gpu207.dev-env.lan:11322] Freelist=send_requests, numAlloc=16708, maxAlloc=-1 [gpu207.dev-env.lan:11322] Freelist=recv_requests, numAlloc=68, maxAlloc=-1 [gpu207.dev-env.lan:11322] rdma_pending=0, pckt_pending=0, recv_pending=0, send_pending=0, comm_pending=0 [gpu207.dev-env.lan:11322] [gpu207.dev-env.lan:11322] Iteration = 0 sleeping [gpu207.dev-env.lan:11322] Freelist=rdma_frags, numAlloc=4, maxAlloc=-1 [gpu207.dev-env.lan:11322] Freelist=recv_frags, numAlloc=4, maxAlloc=-1 [gpu207.dev-env.lan:11322] Freelist=pending_pckts, numAlloc=4, maxAlloc=-1 [gpu207.dev-env.lan:11322] Freelist=send_ranges_pckts, numAlloc=4, maxAlloc=-1 [gpu207.dev-env.lan:11322] Freelist=send_requests, numAlloc=16708, maxAlloc=-1 [gpu207.dev-env.lan:11322] Freelist=recv_requests, numAlloc=68, maxAlloc=-1 [gpu207.dev-env.lan:11322] rdma_pending=0, pckt_pending=0, recv_pending=0, send_pending=0, comm_pending=0 [gpu207.dev-env.lan:11322] [gpu207.dev-env.lan:11322] Iteration = 0 sleeping [gpu207.dev-env.lan:11322] Freelist=rdma_frags, numAlloc=4, maxAlloc=-1 [gpu207.dev-env.lan:11322] Freelist=recv_frags, numAlloc=4, maxAlloc=-1 [gpu207.dev-env.lan:11322] Freelist=pending_pckts, numAlloc=4, maxAlloc=-1 [gpu207.dev-env.lan:11322] Freelist=send_ranges_pckts, numAlloc=4, maxAlloc=-1 [gpu207.dev-env.lan:11322] Freelist=send_requests, numAlloc=16708, maxAlloc=-1 [gpu207.dev-env.lan:11322] Freelist=recv_requests, numAlloc=68, maxAlloc=-1 [gpu207.dev-env.lan:11322] rdma_pending=0, pckt_pending=0, recv_pending=0, send_pending=0, comm_pending=0 Am 12.09.2013 17:04, schrieb Rolf vandeVaart: Can you apply this patch and try again? 
It will print out the sizes of the free lists after every 100 calls into mca_pml_ob1_send. It would be interesting to see which one is growing. This might give us some clues.

Rolf

-----Original Message-----
From: Max Staufer [mailto:max.stau...@gmx.net]
Sent: Thursday, September 12, 2013 3:53 AM
To: Rolf vandeVaart
Subject: Re: [OMPI devel] Nearly unlimited growth of pml free list

Hi Rolf,

the heap snapshots I take tell me where and when the memory has been allocated, and a simple source trace of them tells me that the calling routine was mca_pml_ob1_send and that all of the ~10 single allocations during the run were made because of an MPI_ALLREDUCE command called in exactly one place in the code. The tool I use for this is MemoryScape, but I think Valgrind can tell you the same thing.

However, I was not able to reproduce the problem in a simpler program yet, but I suspect it has something to do with the locking mechanism of the list elements. I don't know enough about OMPI to comment on that, but it looks like the list is growing because all elements are locked.

Really, any help is appreciated.

Max

PS: If I mimic ALLREDUCE with 2*Nproc SEND and RECV commands (aggregating on proc 0 and then sending out to all procs) I get the same kind of behaviour.

Am 11.09.2013 17:12, schrieb Rolf vandeVaart:
Hi Max: You say that the function keeps "allocating memory in the
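For reference, the hand-rolled reduction Max describes in the PS might look roughly like this in C, assuming the 3-double MPI_SUM payload from earlier in the thread; the function and buffer names are made up, and this is not his actual code. Rank 0 receives and accumulates from everyone, then sends the result back out, i.e. on the order of 2*Nproc point-to-point messages per call.

#include <mpi.h>

/* Hand-rolled stand-in for MPI_Allreduce(..., MPI_SUM) on 3 doubles,
 * following the aggregate-on-proc-0-then-redistribute pattern from the PS.
 * Illustrative only; not the code from this thread. */
void fake_allreduce_sum3(double val[3], MPI_Comm comm)
{
    int rank, size, src, dst, k;
    double incoming[3];

    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);

    if (rank == 0) {
        /* Aggregate on proc 0 ... */
        for (src = 1; src < size; src++) {
            MPI_Recv(incoming, 3, MPI_DOUBLE, src, 0, comm, MPI_STATUS_IGNORE);
            for (k = 0; k < 3; k++)
                val[k] += incoming[k];
        }
        /* ... then send the result out to all other procs. */
        for (dst = 1; dst < size; dst++)
            MPI_Send(val, 3, MPI_DOUBLE, dst, 1, comm);
    } else {
        MPI_Send(val, 3, MPI_DOUBLE, 0, 0, comm);
        MPI_Recv(val, 3, MPI_DOUBLE, 0, 1, comm, MPI_STATUS_IGNORE);
    }
}

Max reports the same kind of send_requests growth with this blocking send/recv pattern as with the collective.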
Re: [OMPI devel] Nearly unlimited growth of pml free list
Hi Max:

You say that the function keeps "allocating memory in the pml free list." How do you know that is happening? Do you know which free list it is happening on? There are something like 8 free lists associated with the pml ob1, so it would be interesting to know which one you observe growing.

Rolf

>-----Original Message-----
>From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Max Staufer
>Sent: Wednesday, September 11, 2013 10:23 AM
>To: de...@open-mpi.org
>Subject: [OMPI devel] Nearly unlimited growth of pml free list
>
>Hi All,
>
>as I already asked in the users list and was told that's not the right place to ask, I came across a "misbehaviour" of Open MPI versions 1.4.5 and 1.6.5 alike.
>
>The mca_pml_ob1_send function keeps allocating memory in the pml free list. It does that indefinitely. In my case the list grew to about 100 Gb.
>
>I can control the maximum using the pml_ob1_free_list_max parameter, but then the application just stops working when this number of entries in the list is reached.
>
>The interesting part is that the growth only happens in a single place in the code, which is a RECURSIVE SUBROUTINE.
>
>And the called function is an MPI_ALLREDUCE(... MPI_SUM).
>
>Apparently it's not easy to create a test program that shows the same behaviour; recursion alone is not enough.
>
>Is there an mca parameter that allows limiting the total list size without making the app stop?
>
>Or is there a way to enforce the lock on the free list entries?
>
>Thanks for all the help
>
>Max
>_______________________________________________
>devel mailing list
>de...@open-mpi.org
>http://www.open-mpi.org/mailman/listinfo.cgi/devel
[OMPI devel] Nearly unlimited growth of pml free list
Hi All,

as I already asked in the users list and was told that's not the right place to ask, I came across a "misbehaviour" of Open MPI versions 1.4.5 and 1.6.5 alike.

The mca_pml_ob1_send function keeps allocating memory in the pml free list. It does that indefinitely. In my case the list grew to about 100 Gb.

I can control the maximum using the pml_ob1_free_list_max parameter, but then the application just stops working when this number of entries in the list is reached.

The interesting part is that the growth only happens in a single place in the code, which is a RECURSIVE SUBROUTINE.

And the called function is an MPI_ALLREDUCE(... MPI_SUM).

Apparently it's not easy to create a test program that shows the same behaviour; recursion alone is not enough.

Is there an mca parameter that allows limiting the total list size without making the app stop?

Or is there a way to enforce the lock on the free list entries?

Thanks for all the help

Max
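To make the call pattern concrete, here is a minimal C sketch of the structure described above: a recursive routine whose only MPI call is an all-reduce with MPI_SUM on a small buffer. All names, the buffer size and the depth are illustrative guesses, and, as noted above, a simple test program along these lines does not necessarily reproduce the growth; it only shows the shape of the code being discussed.

#include <mpi.h>
#include <stdio.h>

/* Recursive routine whose only MPI call is MPI_Allreduce(MPI_SUM),
 * mirroring the structure described above (hypothetical names/depth). */
static void recurse(int depth, double local[3], MPI_Comm comm)
{
    double global[3];

    MPI_Allreduce(local, global, 3, MPI_DOUBLE, MPI_SUM, comm);

    if (depth < 10000)               /* stand-in termination condition */
        recurse(depth + 1, global, comm);
}

int main(int argc, char **argv)
{
    double v[3] = {1.0, 2.0, 3.0};
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    recurse(0, v, MPI_COMM_WORLD);

    if (rank == 0)
        printf("done\n");
    MPI_Finalize();
    return 0;
}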