Good point, Paul.

I love XRC :-)

You may try switching the default configuration to XRC:
--mca btl_openib_receive_queues X,128,256,192,128:X,2048,256,128,32:X,12288,256,128,32:X,65536,256,128,32
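
For example, an mpirun invocation along these lines would enable it (a minimal
sketch - the BTL list, rank count, and application name are placeholders, not
required values):

  mpirun -np 16 \
      --mca btl openib,sm,self \
      --mca btl_openib_receive_queues X,128,256,192,128:X,2048,256,128,32:X,12288,256,128,32:X,65536,256,128,32 \
      ./your_app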

If XRC is not supported on your platform, Open MPI should report a helpful error message.

BTW, on multi-core systems XRC should show better performance.

Pavel (Pasha) Shamis
---
Application Performance Tools Group
Computer Science and Math Division
Oak Ridge National Laboratory






On Jan 27, 2011, at 8:19 PM, Paul H. Hargrove wrote:

> Brian,
> 
> As Pasha said:
>> The maximum amount of supported qps you may see in ibv_devinfo.
> 
> However, you'll probably need "-v":
> 
> {hargrove@cvrsvc05 ~}$ ibv_devinfo | grep max_qp:
> {hargrove@cvrsvc05 ~}$ ibv_devinfo -v | grep max_qp:
>         max_qp:                         261056
> 
> If you really are running out of QPs due to the "fatness" of the node, 
> then you should definitely look at enabling XRC if your HCA and 
> libibverbs version support it.  ibv_devinfo can query the HCA capability:
> 
> {hargrove@cvrsvc05 ~}$ ibv_devinfo -v | grep port_cap_flags:
>                         port_cap_flags:         0x02510868
> 
> and look for bit 0x00100000  ( == 1<<20).
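> 
> For example (an illustrative shell check of the value above, not something 
> ibv_devinfo prints itself):
> 
>   $ printf '0x%08x\n' $(( 0x02510868 & 0x00100000 ))
>   0x00100000
> 
> A non-zero result means the XRC capability bit is set in those flags.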
> 
> -Paul
> 
> 
> 
> On 1/27/2011 5:09 PM, Barrett, Brian W wrote:
>> Pasha -
>> 
>> Is there a way to tell which of the two happened, or to check the number of 
>> QPs available per node?  The app likely does talk to a large number of peers 
>> from each process, and the nodes are fairly "fat" - quad-socket, quad-core, 
>> running 16 MPI ranks per node.
>> 
>> Brian
>> 
>> On Jan 27, 2011, at 6:17 PM, Shamis, Pavel wrote:
>> 
>>> Unfortunately, verbose error reports are not so friendly... Anyway, I can 
>>> think of two possible issues:
>>> 
>>> 1. You are trying to open too many QPs. By default, IB devices support a 
>>> fairly large number of QPs and it is quite hard to hit this limit, but if 
>>> your job is really huge it may be the case. It can also happen if you share 
>>> the compute nodes with other processes that create a lot of QPs. The 
>>> maximum number of supported QPs is shown in ibv_devinfo.
>>> 
>>> 2. The limit on registered memory is too low, and as a result the driver 
>>> fails to allocate and register memory for the QP. This scenario is the most 
>>> common - it just happened to me recently, when the system folks pushed 
>>> something bad into limits.conf.
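>>> 
>>> A quick way to check is the locked-memory limit (an illustration only; the 
>>> "unlimited" values below are a common recommendation, adjust to your site's 
>>> policy):
>>> 
>>>   $ ulimit -l        # max locked memory; "unlimited" avoids registration failures
>>>   unlimited
>>> 
>>>   # /etc/security/limits.conf
>>>   * soft memlock unlimited
>>>   * hard memlock unlimited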
>>> 
>>> Regards,
>>> 
>>> Pavel (Pasha) Shamis
>>> ---
>>> Application Performance Tools Group
>>> Computer Science and Math Division
>>> Oak Ridge National Laboratory
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> On Jan 27, 2011, at 5:56 PM, Barrett, Brian W wrote:
>>> 
>>>> All -
>>>> 
>>>> On one of our clusters, we're seeing the following on one of our 
>>>> applications, I believe using Open MPI 1.4.3:
>>>> 
>>>> [xxx:27545] *** An error occurred in MPI_Scatterv
>>>> [xxx:27545] *** on communicator MPI COMMUNICATOR 5 DUP FROM 4
>>>> [xxx:27545] *** MPI_ERR_OTHER: known error not in list
>>>> [xxx:27545] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
>>>> [xxx][[31806,1],0][connect/btl_openib_connect_oob.c:857:qp_create_one] 
>>>> error creating qp errno says Resource temporarily unavailable
>>>> --------------------------------------------------------------------------
>>>> mpirun has exited due to process rank 0 with PID 27545 on
>>>> node rs1891 exiting without calling "finalize". This may
>>>> have caused other processes in the application to be
>>>> terminated by signals sent by mpirun (as reported here).
>>>> --------------------------------------------------------------------------
>>>> 
>>>> 
>>>> The problem goes away if we modify the eager protocol message sizes so that 
>>>> only two QPs are necessary instead of the default four.  Is there a way 
>>>> to bump up the number of QPs that can be created on a node, assuming the 
>>>> issue is just running out of available QPs?  If not, any other thoughts on 
>>>> working around the problem?
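>>>> 
>>>> (For concreteness, a two-QP setup can be selected with something along these 
>>>> lines - illustrative values only, not necessarily the exact ones we used:)
>>>> 
>>>>   --mca btl_openib_receive_queues P,128,256,192,128:S,65536,256,128,32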
>>>> 
>>>> Thanks,
>>>> 
>>>> Brian
>>>> 
>>>> --
>>>> Brian W. Barrett
>>>> Dept. 1423: Scalable System Software
>>>> Sandia National Laboratories
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>> 
>>> 
>> --
>>   Brian W. Barrett
>>   Dept. 1423: Scalable System Software
>>   Sandia National Laboratories
>> 
>> 
>> 
>> 
>> 
> 
> -- 
> Paul H. Hargrove                          phhargr...@lbl.gov
> Future Technologies Group
> HPC Research Department                   Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
> 
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel

