Re: [OMPI devel] OMPI-MIGRATE error

2011-01-27 Thread Hugo Meyer
Hi Josh.

Thanks for your reply. Below is what I'm getting now from the executions.
When I run without taking a checkpoint, I get the following output and the
processes don't finish:

[hmeyer@clus9 whoami]$ /home/hmeyer/desarrollo/ompi-code/binarios/bin/mpirun -np 2 -am ft-enable-cr-recovery ./whoami 10 10
Antes de MPI_Init
Antes de MPI_Init
[clus9:04985] [[41167,0],0] ORTE_ERROR_LOG: Error in file ../../../../../orte/mca/errmgr/hnp/errmgr_hnp_crmig.c at line 287
[clus9:04985] [[41167,0],0] ORTE_ERROR_LOG: Error in file ../../../../../orte/mca/errmgr/hnp/errmgr_hnp_crmig.c at line 287
Soy el número 1 (1)
Terminando, una instrucción antes del finalize
Soy el número 0 (1)
Terminando, una instrucción antes del finalize
--
Error: The process below has failed. There is no checkpoint available for
   this job, so we are terminating the application since automatic
   recovery cannot occur.
Internal Name: [[41167,1],0]
MCW Rank: 0

--
[clus9:04985] 1 more process has sent help message help-orte-errmgr-hnp.txt / autor_failed_to_recover_proc
[clus9:04985] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

If I take a checkpoint of the mpirun process from another terminal during the
execution, I get this output:

[clus9:06105] [[42095,0],0] ORTE_ERROR_LOG: Error in file ../../../../../orte/mca/errmgr/hnp/errmgr_hnp_crmig.c at line 287
[clus9:06105] [[42095,0],0] ORTE_ERROR_LOG: Error in file ../../../../../orte/mca/errmgr/hnp/errmgr_hnp_crmig.c at line 287
[clus9:06107] [[42095,1],1] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file ../../../../orte/mca/grpcomm/base/grpcomm_base_modex.c at line 350
[clus9:06107] [[42095,1],1] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file ../../../../../orte/mca/grpcomm/bad/grpcomm_bad_module.c at line 323
[clus9:06107] pml:ob1: ft_event(Restart): Failed orte_grpcomm.modex() = -26
[clus9:06106] [[42095,1],0] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file ../../../../orte/mca/grpcomm/base/grpcomm_base_modex.c at line 350
[clus9:06106] [[42095,1],0] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file ../../../../../orte/mca/grpcomm/bad/grpcomm_bad_module.c at line 323
[clus9:06106] pml:ob1: ft_event(Restart): Failed orte_grpcomm.modex() = -26
--
Notice: The job has been successfully recovered from the last checkpoint.
--
Soy el número 1 (1)
Terminando, una instrucción antes del finalize
Soy el número 0 (1)
Terminando, una instrucción antes del finalize
[clus9:06105] 1 more process has sent help message help-orte-errmgr-hnp.txt / autor_recovering_job
[clus9:06105] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[clus9:06105] [[42095,0],0] ORTE_ERROR_LOG: Error in file ../../../../../orte/mca/errmgr/hnp/errmgr_hnp_crmig.c at line 287
[clus9:06105] [[42095,0],0] ORTE_ERROR_LOG: Error in file ../../../../../orte/mca/errmgr/hnp/errmgr_hnp_crmig.c at line 287
[clus9:06107] [[42095,1],1] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file ../../../../orte/mca/grpcomm/base/grpcomm_base_modex.c at line 350
[clus9:06107] [[42095,1],1] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file ../../../../../orte/mca/grpcomm/bad/grpcomm_bad_module.c at line 323
[clus9:06107] pml:ob1: ft_event(Restart): Failed orte_grpcomm.modex() = -26
[clus9:06106] [[42095,1],0] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file ../../../../orte/mca/grpcomm/base/grpcomm_base_modex.c at line 350
[clus9:06106] [[42095,1],0] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file ../../../../../orte/mca/grpcomm/bad/grpcomm_bad_module.c at line 323
[clus9:06106] pml:ob1: ft_event(Restart): Failed orte_grpcomm.modex() = -26
[clus9:06105] 1 more process has sent help message help-orte-errmgr-hnp.txt / autor_recovery_complete
Soy el número 0 (1)
Terminando, una instrucción antes del finalize
Soy el número 1 (1)
Terminando, una instrucción antes del finalize
[clus9:06105] 1 more process has sent help message help-orte-errmgr-hnp.txt / autor_recovering_job
[clus9:06105] [[42095,0],0] ORTE_ERROR_LOG: Error in file ../../../../../orte/mca/errmgr/hnp/errmgr_hnp_crmig.c at line 287
[clus9:06105] [[42095,0],0] ORTE_ERROR_LOG: Error in file ../../../../../orte/mca/errmgr/hnp/errmgr_hnp_crmig.c at line 287
[clus9:06107] [[42095,1],1] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file ../../../../orte/mca/grpcomm/base/grpcomm_base_modex.c at line 350
[clus9:06107]

Re: [OMPI devel] OMPI-MIGRATE error

2011-01-27 Thread Joshua Hursey
I believe that this is now fixed on the trunk. All the details are in the 
commit message:
  https://svn.open-mpi.org/trac/ompi/changeset/24317

In my testing yesterday I did not cover the scenario where the node running mpirun also 
hosts application processes (the test cluster I was using does not run that way by 
default). Once I ran on a single node, I was able to reproduce the problem. A couple of 
bugs emerged and are fixed in the commit. The two that were hurting you were the TCP 
socket cleanup (which caused the automatic recovery to loop) and the incorrect accounting 
of local process termination (which caused the modex errors).

Let me know if that fixes the problems that you were seeing.

Thanks for the bug report and your patience while I pursued a fix.

-- Josh

On Jan 27, 2011, at 11:28 AM, Hugo Meyer wrote:

> Hi Josh.
> 
> Thanks for your reply. I'll tell you what i'm getting now from the executions 
> in the next lines.
> When i run without doing a checkpoint i get this output, and the process don' 
> finish:
> 
> [hmeyer@clus9 whoami]$ /home/hmeyer/desarrollo/ompi-code/binarios/bin/mpirun 
> -np 2 -am ft-enable-cr-recovery ./whoami 10 10
> Antes de MPI_Init
> Antes de MPI_Init
> [clus9:04985] [[41167,0],0] ORTE_ERROR_LOG: Error in file 
> ../../../../../orte/mca/errmgr/hnp/errmgr_hnp_crmig.c at line 287
> [clus9:04985] [[41167,0],0] ORTE_ERROR_LOG: Error in file 
> ../../../../../orte/mca/errmgr/hnp/errmgr_hnp_crmig.c at line 287
> Soy el número 1 (1)
> Terminando, una instrucción antes del finalize
> Soy el número 0 (1)
> Terminando, una instrucción antes del finalize
> --
> Error: The process below has failed. There is no checkpoint available for
>this job, so we are terminating the application since automatic
>recovery cannot occur.
> Internal Name: [[41167,1],0]
> MCW Rank: 0
> 
> --
> [clus9:04985] 1 more process has sent help message help-orte-errmgr-hnp.txt / 
> autor_failed_to_recover_proc
> [clus9:04985] Set MCA parameter "orte_base_help_aggregate" to 0 to see all 
> help / error messages
> 
> If i make a checkpoint in another terminal of the mpirun process, during the 
> execution, i get this output:
> 
> [clus9:06105] [[42095,0],0] ORTE_ERROR_LOG: Error in file 
> ../../../../../orte/mca/errmgr/hnp/errmgr_hnp_crmig.c at line 287
> [clus9:06105] [[42095,0],0] ORTE_ERROR_LOG: Error in file 
> ../../../../../orte/mca/errmgr/hnp/errmgr_hnp_crmig.c at line 287
> [clus9:06107] [[42095,1],1] ORTE_ERROR_LOG: Data unpack would read past end 
> of buffer in file ../../../../orte/mca/grpcomm/base/grpcomm_base_modex.c at 
> line 350
> [clus9:06107] [[42095,1],1] ORTE_ERROR_LOG: Data unpack would read past end 
> of buffer in file ../../../../../orte/mca/grpcomm/bad/grpcomm_bad_module.c at 
> line 323
> [clus9:06107] pml:ob1: ft_event(Restart): Failed orte_grpcomm.modex() = -26
> [clus9:06106] [[42095,1],0] ORTE_ERROR_LOG: Data unpack would read past end 
> of buffer in file ../../../../orte/mca/grpcomm/base/grpcomm_base_modex.c at 
> line 350
> [clus9:06106] [[42095,1],0] ORTE_ERROR_LOG: Data unpack would read past end 
> of buffer in file ../../../../../orte/mca/grpcomm/bad/grpcomm_bad_module.c at 
> line 323
> [clus9:06106] pml:ob1: ft_event(Restart): Failed orte_grpcomm.modex() = -26
> --
> Notice: The job has been successfully recovered from the 
> last checkpoint.
> --
> Soy el número 1 (1)
> Terminando, una instrucción antes del finalize
> Soy el número 0 (1)
> Terminando, una instrucción antes del finalize
> [clus9:06105] 1 more process has sent help message help-orte-errmgr-hnp.txt / 
> autor_recovering_job
> [clus9:06105] Set MCA parameter "orte_base_help_aggregate" to 0 to see all 
> help / error messages
> [clus9:06105] [[42095,0],0] ORTE_ERROR_LOG: Error in file 
> ../../../../../orte/mca/errmgr/hnp/errmgr_hnp_crmig.c at line 287
> [clus9:06105] [[42095,0],0] ORTE_ERROR_LOG: Error in file 
> ../../../../../orte/mca/errmgr/hnp/errmgr_hnp_crmig.c at line 287
> [clus9:06107] [[42095,1],1] ORTE_ERROR_LOG: Data unpack would read past end 
> of buffer in file ../../../../orte/mca/grpcomm/base/grpcomm_base_modex.c at 
> line 350
> [clus9:06107] [[42095,1],1] ORTE_ERROR_LOG: Data unpack would read past end 
> of buffer in file ../../../../../orte/mca/grpcomm/bad/grpcomm_bad_module.c at 
> line 323
> [clus9:06107] pml:ob1: ft_event(Restart): Failed orte_grpcomm.modex() = -26
> [clus9:06106] [[42095,1],0] ORTE_ERROR_LOG: Data unpack would read past end 
> of buffer in file ../../../../orte/mca/grpcomm/base/grpcomm_base_modex.c at 
> line 350
> [clus9:06106] [[42095,1],0] ORTE_ERROR_LOG: Data unpack would read past end 
> of buffer in file ../..

[OMPI devel] OFED question

2011-01-27 Thread Barrett, Brian W
All -

On one of our clusters, we're seeing the following on one of our applications, 
I believe using Open MPI 1.4.3:

[xxx:27545] *** An error occurred in MPI_Scatterv
[xxx:27545] *** on communicator MPI COMMUNICATOR 5 DUP FROM 4
[xxx:27545] *** MPI_ERR_OTHER: known error not in list
[xxx:27545] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
[xxx][[31806,1],0][connect/btl_openib_connect_oob.c:857:qp_create_one] error 
creating qp errno says Resource temporarily unavailable
--
mpirun has exited due to process rank 0 with PID 27545 on
node rs1891 exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--


The problem goes away if we modify the eager protocol message sizes so that only two QPs 
are necessary instead of the default four.  Is there a way to bump up the number of QPs 
that can be created on a node, assuming the issue is just running out of available QPs?  
If not, any other thoughts on working around the problem?

Thanks,

Brian

--
  Brian W. Barrett
  Dept. 1423: Scalable System Software
  Sandia National Laboratories







Re: [OMPI devel] OFED question

2011-01-27 Thread Paul H. Hargrove

Brian,

  The ability to control the number of available QPs varies by 
vendor.  Unless things have changed in recent years, Mellanox's firmware 
tools allow one to modify the limit, but at the inconvenience of 
reburning the firmware.  I know of no other way and know nothing about 
other vendors.


-Paul

On 1/27/2011 2:56 PM, Barrett, Brian W wrote:

All -

On one of our clusters, we're seeing the following on one of our applications, 
I believe using Open MPI 1.4.3:

[xxx:27545] *** An error occurred in MPI_Scatterv
[xxx:27545] *** on communicator MPI COMMUNICATOR 5 DUP FROM 4
[xxx:27545] *** MPI_ERR_OTHER: known error not in list
[xxx:27545] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
[xxx][[31806,1],0][connect/btl_openib_connect_oob.c:857:qp_create_one] error 
creating qp errno says Resource temporarily unavailable
--
mpirun has exited due to process rank 0 with PID 27545 on
node rs1891 exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--


The problem goes away if we modify the eager protocol msg sizes so that there 
are only two QPs necessary instead of the default 4.  Is there a way to bump up 
the number of QPs that can be created on a node, assuming the issue is just 
running out of available QPs?  If not, any other thoughts on working around the 
problem?

Thanks,

Brian

--
   Brian W. Barrett
   Dept. 1423: Scalable System Software
   Sandia National Laboratories





___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel


--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900



Re: [OMPI devel] OFED question

2011-01-27 Thread Shamis, Pavel
Unfortunately, the verbose error reports are not very friendly... Anyway, I can think of 
two possible issues:

1. You are trying to open too many QPs. By default, IB devices support a fairly large 
number of QPs and it is quite hard to hit this limit, but it may be the case if your job 
is really huge, or if you are sharing the compute nodes with other processes that create 
a lot of QPs. The maximum number of supported QPs is reported by ibv_devinfo.

2. The limit on registered memory is too low, so the driver fails to allocate and 
register memory for the QP. This scenario is the most common. It just happened to me 
recently: the system folks pushed a bad setting into limits.conf.
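
As a quick sanity check of that limit, a minimal standalone snippet (plain POSIX 
getrlimit, nothing Open MPI specific) along these lines can be run in the same 
environment the MPI processes are launched from:

/* Sketch: print the locked-memory limit ("memlock" in limits.conf), which
 * bounds how much memory an unprivileged process can register with the HCA. */
#include <stdio.h>
#include <sys/resource.h>

int main(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0) {
        perror("getrlimit(RLIMIT_MEMLOCK)");
        return 1;
    }

    if (rl.rlim_cur == RLIM_INFINITY)
        printf("memlock limit: unlimited\n");
    else
        printf("memlock limit: %llu bytes\n", (unsigned long long) rl.rlim_cur);

    return 0;
}

Launch it the same way the MPI processes are launched (ssh, the batch system, etc.), 
since the effective limit depends on how the processes are started.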

Regards,

Pavel (Pasha) Shamis
---
Application Performance Tools Group
Computer Science and Math Division
Oak Ridge National Laboratory






On Jan 27, 2011, at 5:56 PM, Barrett, Brian W wrote:

> All -
> 
> On one of our clusters, we're seeing the following on one of our 
> applications, I believe using Open MPI 1.4.3:
> 
> [xxx:27545] *** An error occurred in MPI_Scatterv
> [xxx:27545] *** on communicator MPI COMMUNICATOR 5 DUP FROM 4
> [xxx:27545] *** MPI_ERR_OTHER: known error not in list
> [xxx:27545] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
> [xxx][[31806,1],0][connect/btl_openib_connect_oob.c:857:qp_create_one] error 
> creating qp errno says Resource temporarily unavailable
> --
> mpirun has exited due to process rank 0 with PID 27545 on
> node rs1891 exiting without calling "finalize". This may
> have caused other processes in the application to be
> terminated by signals sent by mpirun (as reported here).
> --
> 
> 
> The problem goes away if we modify the eager protocol msg sizes so that there 
> are only two QPs necessary instead of the default 4.  Is there a way to bump 
> up the number of QPs that can be created on a node, assuming the issue is 
> just running out of available QPs?  If not, any other thoughts on working 
> around the problem?
> 
> Thanks,
> 
> Brian
> 
> --
>  Brian W. Barrett
>  Dept. 1423: Scalable System Software
>  Sandia National Laboratories
> 
> 
> 
> 
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] OFED question

2011-01-27 Thread Barrett, Brian W
Pasha -

Is there a way to tell which of the two happened, or to check the number of QPs available 
per node?  The app likely does talk to a large number of peers from each process, and the 
nodes are fairly "fat": each is quad-socket, quad-core, running 16 MPI ranks per node.

Brian

On Jan 27, 2011, at 6:17 PM, Shamis, Pavel wrote:

> Unfortunately verbose error reports are not so friendly...anyway , I may 
> think about 2 issues:
> 
> 1. You trying to open open too much QPs. By default ib devices support fairly 
> large amount of QPs and it is quite hard to push it to this corner. But If 
> your job is really huge it may be the case. Or for example, if you share the 
> compute nodes with some other processes that create a lot of qps. The maximum 
> amount of supported qps you may see in ibv_devinfo.
> 
> 2. The memory limit for registered memory is too low, as result driver fails 
> allocate and register memory for QP. This scenario is most common. Just 
> happened to me recently, system folks pushed some crap into limits.conf.
> 
> Regards,
> 
> Pavel (Pasha) Shamis
> ---
> Application Performance Tools Group
> Computer Science and Math Division
> Oak Ridge National Laboratory
> 
> 
> 
> 
> 
> 
> On Jan 27, 2011, at 5:56 PM, Barrett, Brian W wrote:
> 
>> All -
>> 
>> On one of our clusters, we're seeing the following on one of our 
>> applications, I believe using Open MPI 1.4.3:
>> 
>> [xxx:27545] *** An error occurred in MPI_Scatterv
>> [xxx:27545] *** on communicator MPI COMMUNICATOR 5 DUP FROM 4
>> [xxx:27545] *** MPI_ERR_OTHER: known error not in list
>> [xxx:27545] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
>> [xxx][[31806,1],0][connect/btl_openib_connect_oob.c:857:qp_create_one] error 
>> creating qp errno says Resource temporarily unavailable
>> --
>> mpirun has exited due to process rank 0 with PID 27545 on
>> node rs1891 exiting without calling "finalize". This may
>> have caused other processes in the application to be
>> terminated by signals sent by mpirun (as reported here).
>> --
>> 
>> 
>> The problem goes away if we modify the eager protocol msg sizes so that 
>> there are only two QPs necessary instead of the default 4.  Is there a way 
>> to bump up the number of QPs that can be created on a node, assuming the 
>> issue is just running out of available QPs?  If not, any other thoughts on 
>> working around the problem?
>> 
>> Thanks,
>> 
>> Brian
>> 
>> --
>> Brian W. Barrett
>> Dept. 1423: Scalable System Software
>> Sandia National Laboratories
>> 
>> 
>> 
>> 
>> 
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 

--
  Brian W. Barrett
  Dept. 1423: Scalable System Software
  Sandia National Laboratories







Re: [OMPI devel] OFED question

2011-01-27 Thread Paul H. Hargrove

Brian,

As Pasha said:

The maximum amount of supported qps you may see in ibv_devinfo.


However you'll probably need "-v":

{hargrove@cvrsvc05 ~}$ ibv_devinfo | grep max_qp:
{hargrove@cvrsvc05 ~}$ ibv_devinfo -v | grep max_qp:
max_qp: 261056

If you really are running out of QPs due to the "fatness" of the node, 
then you should definitely look at enabling XRC if your HCA and 
libibverbs version support it.  ibv_devinfo can query the HCA capability:


{hargrove@cvrsvc05 ~}$ ibv_devinfo -v | grep port_cap_flags:
port_cap_flags: 0x02510868

and look for bit 0x00100000 (== 1<<20).
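
If you'd rather get those two values programmatically than grep the tool output, they are 
also available through the libibverbs API. A rough sketch (it assumes the first device and 
port 1, checks the same bit discussed above, is built with -libverbs, and is not an 
Open MPI utility):

/* Sketch only: print max_qp and test bit 20 of port_cap_flags on the
 * first HCA, port 1.  Error handling is minimal. */
#include <stdio.h>
#include <infiniband/verbs.h>

int main(void)
{
    int n = 0;
    struct ibv_device **devs = ibv_get_device_list(&n);
    if (!devs || n == 0) {
        fprintf(stderr, "no IB devices found\n");
        return 1;
    }

    struct ibv_context *ctx = ibv_open_device(devs[0]);
    if (!ctx) {
        fprintf(stderr, "ibv_open_device failed\n");
        return 1;
    }

    struct ibv_device_attr dev_attr;
    struct ibv_port_attr port_attr;
    if (ibv_query_device(ctx, &dev_attr) == 0 &&
        ibv_query_port(ctx, 1, &port_attr) == 0) {
        printf("max_qp:         %d\n", dev_attr.max_qp);
        printf("port_cap_flags: 0x%08x\n", (unsigned) port_attr.port_cap_flags);
        printf("bit 1<<20:      %s\n",
               (port_attr.port_cap_flags & (1u << 20)) ? "set" : "not set");
    }

    ibv_close_device(ctx);
    ibv_free_device_list(devs);
    return 0;
}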

-Paul



On 1/27/2011 5:09 PM, Barrett, Brian W wrote:

Pasha -

Is there a way to tell which of the two happened or to check the number of QPs available 
per node?  The app likely does talk to a large number of peers from each process, and the 
nodes are fairly "fat" - it's quad socket, quad core and they are running 16 
MPI ranks for each node.

Brian

On Jan 27, 2011, at 6:17 PM, Shamis, Pavel wrote:


Unfortunately verbose error reports are not so friendly...anyway , I may think 
about 2 issues:

1. You trying to open open too much QPs. By default ib devices support fairly 
large amount of QPs and it is quite hard to push it to this corner. But If your 
job is really huge it may be the case. Or for example, if you share the compute 
nodes with some other processes that create a lot of qps. The maximum amount of 
supported qps you may see in ibv_devinfo.

2. The memory limit for registered memory is too low, as result driver fails 
allocate and register memory for QP. This scenario is most common. Just 
happened to me recently, system folks pushed some crap into limits.conf.

Regards,

Pavel (Pasha) Shamis
---
Application Performance Tools Group
Computer Science and Math Division
Oak Ridge National Laboratory






On Jan 27, 2011, at 5:56 PM, Barrett, Brian W wrote:


All -

On one of our clusters, we're seeing the following on one of our applications, 
I believe using Open MPI 1.4.3:

[xxx:27545] *** An error occurred in MPI_Scatterv
[xxx:27545] *** on communicator MPI COMMUNICATOR 5 DUP FROM 4
[xxx:27545] *** MPI_ERR_OTHER: known error not in list
[xxx:27545] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
[xxx][[31806,1],0][connect/btl_openib_connect_oob.c:857:qp_create_one] error 
creating qp errno says Resource temporarily unavailable
--
mpirun has exited due to process rank 0 with PID 27545 on
node rs1891 exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--


The problem goes away if we modify the eager protocol msg sizes so that there 
are only two QPs necessary instead of the default 4.  Is there a way to bump up 
the number of QPs that can be created on a node, assuming the issue is just 
running out of available QPs?  If not, any other thoughts on working around the 
problem?

Thanks,

Brian

--
Brian W. Barrett
Dept. 1423: Scalable System Software
Sandia National Laboratories





___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel


___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel


--
   Brian W. Barrett
   Dept. 1423: Scalable System Software
   Sandia National Laboratories





___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel


--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900



Re: [OMPI devel] OFED question

2011-01-27 Thread Shamis, Pavel
Brian,
I would calculate the maximum number of QPs needed for an all-to-all connection pattern:
4*num_nodes*num_cores^2
and then compare it to the number reported by: ibv_devinfo -v | grep max_qp
If your theoretical maximum is close to the ibv_devinfo number, then I would suspect the 
QP limitation. The driver manages some internal QPs, so you cannot reach the absolute maximum.
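
As a rough illustration only (16 ranks per node is from Brian's description; the node 
count and max_qp values below are placeholders, not numbers from his cluster):

/* Back-of-the-envelope check of the formula above.  num_nodes and max_qp are
 * placeholders; substitute the real job size and the ibv_devinfo value. */
#include <stdio.h>

int main(void)
{
    long num_nodes      = 64;      /* placeholder: nodes in the job */
    long ranks_per_node = 16;      /* from Brian's node description */
    long qps_per_peer   = 4;       /* default number of QPs per connection */
    long max_qp         = 261056;  /* placeholder: ibv_devinfo -v | grep max_qp */

    /* per-node estimate: 4 * num_nodes * ranks_per_node^2 */
    long needed = qps_per_peer * num_nodes * ranks_per_node * ranks_per_node;

    printf("estimated QPs needed per node: %ld\n", needed);
    printf("device max_qp:                 %ld\n", max_qp);
    printf("%s\n", needed >= max_qp
                 ? "close to or over the limit: suspect QP exhaustion"
                 : "well under the limit: look at the memory limit instead");
    return 0;
}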

For the memory limit, I do not have a good way to check. If the failure happens in the 
early stages of the app, then the limit is probably really small and I would verify it with IT.

Regards,
Pavel (Pasha) Shamis
---
Application Performance Tools Group
Computer Science and Math Division
Oak Ridge National Laboratory






On Jan 27, 2011, at 8:09 PM, Barrett, Brian W wrote:

> Pasha -
> 
> Is there a way to tell which of the two happened or to check the number of 
> QPs available per node?  The app likely does talk to a large number of peers 
> from each process, and the nodes are fairly "fat" - it's quad socket, quad 
> core and they are running 16 MPI ranks for each node.  
> 
> Brian
> 
> On Jan 27, 2011, at 6:17 PM, Shamis, Pavel wrote:
> 
>> Unfortunately verbose error reports are not so friendly...anyway , I may 
>> think about 2 issues:
>> 
>> 1. You trying to open open too much QPs. By default ib devices support 
>> fairly large amount of QPs and it is quite hard to push it to this corner. 
>> But If your job is really huge it may be the case. Or for example, if you 
>> share the compute nodes with some other processes that create a lot of qps. 
>> The maximum amount of supported qps you may see in ibv_devinfo.
>> 
>> 2. The memory limit for registered memory is too low, as result driver fails 
>> allocate and register memory for QP. This scenario is most common. Just 
>> happened to me recently, system folks pushed some crap into limits.conf.
>> 
>> Regards,
>> 
>> Pavel (Pasha) Shamis
>> ---
>> Application Performance Tools Group
>> Computer Science and Math Division
>> Oak Ridge National Laboratory
>> 
>> 
>> 
>> 
>> 
>> 
>> On Jan 27, 2011, at 5:56 PM, Barrett, Brian W wrote:
>> 
>>> All -
>>> 
>>> On one of our clusters, we're seeing the following on one of our 
>>> applications, I believe using Open MPI 1.4.3:
>>> 
>>> [xxx:27545] *** An error occurred in MPI_Scatterv
>>> [xxx:27545] *** on communicator MPI COMMUNICATOR 5 DUP FROM 4
>>> [xxx:27545] *** MPI_ERR_OTHER: known error not in list
>>> [xxx:27545] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
>>> [xxx][[31806,1],0][connect/btl_openib_connect_oob.c:857:qp_create_one] 
>>> error creating qp errno says Resource temporarily unavailable
>>> --
>>> mpirun has exited due to process rank 0 with PID 27545 on
>>> node rs1891 exiting without calling "finalize". This may
>>> have caused other processes in the application to be
>>> terminated by signals sent by mpirun (as reported here).
>>> --
>>> 
>>> 
>>> The problem goes away if we modify the eager protocol msg sizes so that 
>>> there are only two QPs necessary instead of the default 4.  Is there a way 
>>> to bump up the number of QPs that can be created on a node, assuming the 
>>> issue is just running out of available QPs?  If not, any other thoughts on 
>>> working around the problem?
>>> 
>>> Thanks,
>>> 
>>> Brian
>>> 
>>> --
>>> Brian W. Barrett
>>> Dept. 1423: Scalable System Software
>>> Sandia National Laboratories
>>> 
>>> 
>>> 
>>> 
>>> 
>>> ___
>>> devel mailing list
>>> de...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> 
>> 
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> 
> 
> --
>  Brian W. Barrett
>  Dept. 1423: Scalable System Software
>  Sandia National Laboratories
> 
> 
> 
> 
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] OFED question

2011-01-27 Thread Shamis, Pavel
Good point Paul.

I love XRC :-)

You may try switching the default configuration to XRC:
--mca btl_openib_receive_queues X,128,256,192,128:X,2048,256,128,32:X,12288,256,128,32:X,65536,256,128,32

If XRC is not supported on your platform, Open MPI should report a helpful message.

BTW, on a multi-core system XRC should show better performance.

Pavel (Pasha) Shamis
---
Application Performance Tools Group
Computer Science and Math Division
Oak Ridge National Laboratory






On Jan 27, 2011, at 8:19 PM, Paul H. Hargrove wrote:

> Brian,
> 
> As Pasha said:
>> The maximum amount of supported qps you may see in ibv_devinfo.
> 
> However you'll probably need "-v":
> 
> {hargrove@cvrsvc05 ~}$ ibv_devinfo | grep max_qp:
> {hargrove@cvrsvc05 ~}$ ibv_devinfo -v | grep max_qp:
> max_qp: 261056
> 
> If you really are running out of QPs due to the "fattness" of the node, 
> then you should definitely look at enabling XRC if your HCA and 
> libibverbs version supports it.  ibv_devinfo can query the HCA capability:
> 
> {hargrove@cvrsvc05 ~}$ ibv_devinfo -v | grep port_cap_flags:
> port_cap_flags: 0x02510868
> 
> and look for bit 0x0010  ( == 1<<20).
> 
> -Paul
> 
> 
> 
> On 1/27/2011 5:09 PM, Barrett, Brian W wrote:
>> Pasha -
>> 
>> Is there a way to tell which of the two happened or to check the number of 
>> QPs available per node?  The app likely does talk to a large number of peers 
>> from each process, and the nodes are fairly "fat" - it's quad socket, quad 
>> core and they are running 16 MPI ranks for each node.
>> 
>> Brian
>> 
>> On Jan 27, 2011, at 6:17 PM, Shamis, Pavel wrote:
>> 
>>> Unfortunately verbose error reports are not so friendly...anyway , I may 
>>> think about 2 issues:
>>> 
>>> 1. You trying to open open too much QPs. By default ib devices support 
>>> fairly large amount of QPs and it is quite hard to push it to this corner. 
>>> But If your job is really huge it may be the case. Or for example, if you 
>>> share the compute nodes with some other processes that create a lot of qps. 
>>> The maximum amount of supported qps you may see in ibv_devinfo.
>>> 
>>> 2. The memory limit for registered memory is too low, as result driver 
>>> fails allocate and register memory for QP. This scenario is most common. 
>>> Just happened to me recently, system folks pushed some crap into 
>>> limits.conf.
>>> 
>>> Regards,
>>> 
>>> Pavel (Pasha) Shamis
>>> ---
>>> Application Performance Tools Group
>>> Computer Science and Math Division
>>> Oak Ridge National Laboratory
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> On Jan 27, 2011, at 5:56 PM, Barrett, Brian W wrote:
>>> 
 All -
 
 On one of our clusters, we're seeing the following on one of our 
 applications, I believe using Open MPI 1.4.3:
 
 [xxx:27545] *** An error occurred in MPI_Scatterv
 [xxx:27545] *** on communicator MPI COMMUNICATOR 5 DUP FROM 4
 [xxx:27545] *** MPI_ERR_OTHER: known error not in list
 [xxx:27545] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
 [xxx][[31806,1],0][connect/btl_openib_connect_oob.c:857:qp_create_one] 
 error creating qp errno says Resource temporarily unavailable
 --
 mpirun has exited due to process rank 0 with PID 27545 on
 node rs1891 exiting without calling "finalize". This may
 have caused other processes in the application to be
 terminated by signals sent by mpirun (as reported here).
 --
 
 
 The problem goes away if we modify the eager protocol msg sizes so that 
 there are only two QPs necessary instead of the default 4.  Is there a way 
 to bump up the number of QPs that can be created on a node, assuming the 
 issue is just running out of available QPs?  If not, any other thoughts on 
 working around the problem?
 
 Thanks,
 
 Brian
 
 --
 Brian W. Barrett
 Dept. 1423: Scalable System Software
 Sandia National Laboratories
 
 
 
 
 
 ___
 devel mailing list
 de...@open-mpi.org
 http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> 
>>> ___
>>> devel mailing list
>>> de...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> 
>> --
>>   Brian W. Barrett
>>   Dept. 1423: Scalable System Software
>>   Sandia National Laboratories
>> 
>> 
>> 
>> 
>> 
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 
> -- 
> Paul H. Hargrove  phhargr...@lbl.gov
> Future Technologies Group
> HPC Research Department   Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
> 
> _

Re: [OMPI devel] OFED question

2011-01-27 Thread Paul H. Hargrove


RFE:  Could OMPI implement a shorthand for Pasha's magical incantation below?


On 1/27/2011 5:34 PM, Shamis, Pavel wrote:

--mca btl_openib_receive_queues 
X,128,256,192,128:X,2048,256,128,32:X,12288,256,128,32:X,65536,256,128,32
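
(Until such a shorthand exists, one workaround, assuming the standard MCA parameter file 
mechanism, is to park the string in an MCA params file so nobody has to retype it; a 
sketch, reusing Pasha's value verbatim:

# $HOME/.openmpi/mca-params.conf  (or <prefix>/etc/openmpi-mca-params.conf)
# Site/user shorthand for the XRC receive-queue incantation above.
btl_openib_receive_queues = X,128,256,192,128:X,2048,256,128,32:X,12288,256,128,32:X,65536,256,128,32

After that, a plain mpirun should pick up the XRC queues without any extra flags.)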


--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900