Re: [OMPI devel] OMPI-MIGRATE error
Hi Josh,

Thanks for your reply. Here is what I am getting now from the runs. When I run without taking a checkpoint I get the following output, and the processes don't finish:

[hmeyer@clus9 whoami]$ /home/hmeyer/desarrollo/ompi-code/binarios/bin/mpirun -np 2 -am ft-enable-cr-recovery ./whoami 10 10
Antes de MPI_Init
Antes de MPI_Init
[clus9:04985] [[41167,0],0] ORTE_ERROR_LOG: Error in file ../../../../../orte/mca/errmgr/hnp/errmgr_hnp_crmig.c at line 287
[clus9:04985] [[41167,0],0] ORTE_ERROR_LOG: Error in file ../../../../../orte/mca/errmgr/hnp/errmgr_hnp_crmig.c at line 287
Soy el número 1 (1)
Terminando, una instrucción antes del finalize
Soy el número 0 (1)
Terminando, una instrucción antes del finalize
--
Error: The process below has failed. There is no checkpoint available for
       this job, so we are terminating the application since automatic
       recovery cannot occur.
Internal Name: [[41167,1],0]
MCW Rank: 0

--
[clus9:04985] 1 more process has sent help message help-orte-errmgr-hnp.txt / autor_failed_to_recover_proc
[clus9:04985] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

If I take a checkpoint of the mpirun process from another terminal while the job is running, I get this output:

[clus9:06105] [[42095,0],0] ORTE_ERROR_LOG: Error in file ../../../../../orte/mca/errmgr/hnp/errmgr_hnp_crmig.c at line 287
[clus9:06105] [[42095,0],0] ORTE_ERROR_LOG: Error in file ../../../../../orte/mca/errmgr/hnp/errmgr_hnp_crmig.c at line 287
[clus9:06107] [[42095,1],1] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file ../../../../orte/mca/grpcomm/base/grpcomm_base_modex.c at line 350
[clus9:06107] [[42095,1],1] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file ../../../../../orte/mca/grpcomm/bad/grpcomm_bad_module.c at line 323
[clus9:06107] pml:ob1: ft_event(Restart): Failed orte_grpcomm.modex() = -26
[clus9:06106] [[42095,1],0] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file ../../../../orte/mca/grpcomm/base/grpcomm_base_modex.c at line 350
[clus9:06106] [[42095,1],0] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file ../../../../../orte/mca/grpcomm/bad/grpcomm_bad_module.c at line 323
[clus9:06106] pml:ob1: ft_event(Restart): Failed orte_grpcomm.modex() = -26
--
Notice: The job has been successfully recovered from the
        last checkpoint.
--
Soy el número 1 (1)
Terminando, una instrucción antes del finalize
Soy el número 0 (1)
Terminando, una instrucción antes del finalize
[clus9:06105] 1 more process has sent help message help-orte-errmgr-hnp.txt / autor_recovering_job
[clus9:06105] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[clus9:06105] [[42095,0],0] ORTE_ERROR_LOG: Error in file ../../../../../orte/mca/errmgr/hnp/errmgr_hnp_crmig.c at line 287
[clus9:06105] [[42095,0],0] ORTE_ERROR_LOG: Error in file ../../../../../orte/mca/errmgr/hnp/errmgr_hnp_crmig.c at line 287
[clus9:06107] [[42095,1],1] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file ../../../../orte/mca/grpcomm/base/grpcomm_base_modex.c at line 350
[clus9:06107] [[42095,1],1] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file ../../../../../orte/mca/grpcomm/bad/grpcomm_bad_module.c at line 323
[clus9:06107] pml:ob1: ft_event(Restart): Failed orte_grpcomm.modex() = -26
[clus9:06106] [[42095,1],0] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file ../../../../orte/mca/grpcomm/base/grpcomm_base_modex.c at line 350
[clus9:06106] [[42095,1],0] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file ../../../../../orte/mca/grpcomm/bad/grpcomm_bad_module.c at line 323
[clus9:06106] pml:ob1: ft_event(Restart): Failed orte_grpcomm.modex() = -26
[clus9:06105] 1 more process has sent help message help-orte-errmgr-hnp.txt / autor_recovery_complete
Soy el número 0 (1)
Terminando, una instrucción antes del finalize
Soy el número 1 (1)
Terminando, una instrucción antes del finalize
[clus9:06105] 1 more process has sent help message help-orte-errmgr-hnp.txt / autor_recovering_job
[clus9:06105] [[42095,0],0] ORTE_ERROR_LOG: Error in file ../../../../../orte/mca/errmgr/hnp/errmgr_hnp_crmig.c at line 287
[clus9:06105] [[42095,0],0] ORTE_ERROR_LOG: Error in file ../../../../../orte/mca/errmgr/hnp/errmgr_hnp_crmig.c at line 287
[clus9:06107] [[42095,1],1] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file ../../../../orte/mca/grpcomm/base/grpcomm_base_modex.c at line 350
[clus9:06107]
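For reference, a minimal sketch of the two-terminal procedure described above. It assumes the ompi-checkpoint tool from the same Open MPI install is on the PATH; the PID is hypothetical and stands for mpirun's process ID.

# Terminal 1: start the job with checkpoint/restart recovery enabled (command from the report above)
/home/hmeyer/desarrollo/ompi-code/binarios/bin/mpirun -np 2 -am ft-enable-cr-recovery ./whoami 10 10

# Terminal 2: take a checkpoint of the running job by pointing ompi-checkpoint at mpirun's PID
ompi-checkpoint 4985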
Re: [OMPI devel] OMPI-MIGRATE error
I believe that this is now fixed on the trunk. All the details are in the commit message: https://svn.open-mpi.org/trac/ompi/changeset/24317

In my testing yesterday I did not cover the scenario where the node running mpirun also hosts application processes (the test cluster I was using does not run that way by default), so I was able to reproduce the problem by running on a single node. A couple of bugs emerged that are fixed in the commit. The two bugs that were hurting you were the TCP socket cleanup (which caused the looping of the automatic recovery) and the incorrect accounting of local process termination (which caused the modex errors).

Let me know if that fixes the problems that you were seeing. Thanks for the bug report and your patience while I pursued a fix.

--
Josh

On Jan 27, 2011, at 11:28 AM, Hugo Meyer wrote:

> [...]
[OMPI devel] OFED question
All -

On one of our clusters, we're seeing the following on one of our applications, I believe using Open MPI 1.4.3:

[xxx:27545] *** An error occurred in MPI_Scatterv
[xxx:27545] *** on communicator MPI COMMUNICATOR 5 DUP FROM 4
[xxx:27545] *** MPI_ERR_OTHER: known error not in list
[xxx:27545] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
[xxx][[31806,1],0][connect/btl_openib_connect_oob.c:857:qp_create_one] error creating qp errno says Resource temporarily unavailable
--
mpirun has exited due to process rank 0 with PID 27545 on
node rs1891 exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--

The problem goes away if we modify the eager protocol msg sizes so that there are only two QPs necessary instead of the default 4. Is there a way to bump up the number of QPs that can be created on a node, assuming the issue is just running out of available QPs? If not, any other thoughts on working around the problem?

Thanks,

Brian

--
Brian W. Barrett
Dept. 1423: Scalable System Software
Sandia National Laboratories
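The workaround Brian describes (two QPs per peer instead of the default four) is normally expressed through the openib BTL's btl_openib_receive_queues MCA parameter, since each entry in that list corresponds to one QP per peer connection. A sketch with purely illustrative queue sizes and a placeholder application, not the values actually used on this cluster:

# Two receive queues -> two QPs per peer connection instead of the default four
mpirun --mca btl_openib_receive_queues P,128,256,192,128:S,65536,256,128,32 -np 256 ./app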
Re: [OMPI devel] OFED question
Brian,

The ability to control the number of available QPs will vary by vendor. Unless things have changed in recent years, Mellanox's firmware tools allow one to modify the limit, but at the inconvenience of reburning the firmware. I know of no other way and know nothing about other vendors.

-Paul

On 1/27/2011 2:56 PM, Barrett, Brian W wrote:
> [...]

--
Paul H. Hargrove                          phhargr...@lbl.gov
Future Technologies Group
HPC Research Department                   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
Re: [OMPI devel] OFED question
Unfortunately, verbose error reports are not so friendly. Anyway, I can think of two possible issues:

1. You are trying to open too many QPs. By default, IB devices support a fairly large number of QPs and it is quite hard to push them into this corner, but if your job is really huge it may be the case, or, for example, if you share the compute nodes with other processes that create a lot of QPs. You can see the maximum number of supported QPs in ibv_devinfo.

2. The limit on registered memory is too low, so the driver fails to allocate and register memory for the QP. This scenario is the most common; it just happened to me recently when the system folks pushed a bad setting into limits.conf.

Regards,

Pavel (Pasha) Shamis
---
Application Performance Tools Group
Computer Science and Math Division
Oak Ridge National Laboratory

On Jan 27, 2011, at 5:56 PM, Barrett, Brian W wrote:

> [...]
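A quick sketch of how one might check both suspects from a compute node. The grep patterns assume typical ibv_devinfo output, and the locked-memory (memlock) limit is the usual registered-memory limit set via limits.conf:

# 1. Device QP limit (the extended attributes need -v)
ibv_devinfo -v | grep max_qp:

# 2. Locked/registered memory limit seen by the shell (ideally "unlimited" on IB nodes)
ulimit -l
grep memlock /etc/security/limits.conf /etc/security/limits.d/* 2>/dev/null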
Re: [OMPI devel] OFED question
Pasha -

Is there a way to tell which of the two happened, or to check the number of QPs available per node? The app likely does talk to a large number of peers from each process, and the nodes are fairly "fat": quad socket, quad core, running 16 MPI ranks per node.

Brian

On Jan 27, 2011, at 6:17 PM, Shamis, Pavel wrote:

> [...]

--
Brian W. Barrett
Dept. 1423: Scalable System Software
Sandia National Laboratories
Re: [OMPI devel] OFED question
Brian,

As Pasha said:
> The maximum amount of supported qps you may see in ibv_devinfo.

However, you'll probably need "-v":

{hargrove@cvrsvc05 ~}$ ibv_devinfo | grep max_qp:
{hargrove@cvrsvc05 ~}$ ibv_devinfo -v | grep max_qp:
        max_qp: 261056

If you really are running out of QPs due to the "fatness" of the node, then you should definitely look at enabling XRC, if your HCA and libibverbs version support it. ibv_devinfo can query the HCA capability:

{hargrove@cvrsvc05 ~}$ ibv_devinfo -v | grep port_cap_flags:
        port_cap_flags: 0x02510868

and look for bit 0x00100000 (== 1<<20).

-Paul

On 1/27/2011 5:09 PM, Barrett, Brian W wrote:
> [...]

--
Paul H. Hargrove                          phhargr...@lbl.gov
Future Technologies Group
HPC Research Department                   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
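A small sketch of the bit test Paul describes, using the example flags value above; substitute the port_cap_flags reported on your own HCA:

# Non-zero output means the XRC capability bit (1 << 20) is set in port_cap_flags
printf '0x%08x\n' $(( 0x02510868 & (1 << 20) ))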
Re: [OMPI devel] OFED question
Brian,

I would calculate the maximum number of QPs for an all-to-all connection pattern:

4 * num_nodes * num_cores^2

and then compare it to the number reported by:

ibv_devinfo -v | grep max_qp

If your theoretical maximum is close to the ibv_devinfo number, then I would suspect the QP limitation. The driver manages some internal QPs, so you cannot reach the maximum. For the memory limit I do not have a good way to tell; if it happens in the early stages of the app, then probably the limit is really small, and I would verify it with IT.

Regards,

Pavel (Pasha) Shamis
---
Application Performance Tools Group
Computer Science and Math Division
Oak Ridge National Laboratory

On Jan 27, 2011, at 8:09 PM, Barrett, Brian W wrote:

> [...]
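A sketch of that estimate for the node size Brian mentioned (16 ranks per node), with a made-up node count; compare the result against the max_qp value reported by ibv_devinfo -v:

# 4 QPs per peer * num_nodes * ranks_per_node^2, per Pasha's formula
# (assumes the default 4-QP openib configuration; 128 nodes is a hypothetical job size)
nodes=128 ranks_per_node=16
echo $(( 4 * nodes * ranks_per_node * ranks_per_node ))   # 131072 for these numbers
ibv_devinfo -v | grep max_qp: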
Re: [OMPI devel] OFED question
Good point, Paul. I love XRC :-)

You may try to switch the default configuration to XRC:

--mca btl_openib_receive_queues X,128,256,192,128:X,2048,256,128,32:X,12288,256,128,32:X,65536,256,128,32

If XRC is not supported on your platform, ompi should report a nice error message. BTW, on a multi-core system XRC should show better performance.

Pavel (Pasha) Shamis
---
Application Performance Tools Group
Computer Science and Math Division
Oak Ridge National Laboratory

On Jan 27, 2011, at 8:19 PM, Paul H. Hargrove wrote:

> [...]
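A usage sketch of that XRC setting on an mpirun command line; the process count and application name are placeholders:

mpirun -np 256 \
  --mca btl_openib_receive_queues X,128,256,192,128:X,2048,256,128,32:X,12288,256,128,32:X,65536,256,128,32 \
  ./app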
Re: [OMPI devel] OFED question
RFE: Could OMPI implement a short-hand for Pasha's following magical incantation?

On 1/27/2011 5:34 PM, Shamis, Pavel wrote:
> --mca btl_openib_receive_queues X,128,256,192,128:X,2048,256,128,32:X,12288,256,128,32:X,65536,256,128,32

--
Paul H. Hargrove                          phhargr...@lbl.gov
Future Technologies Group
HPC Research Department                   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
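Until such a short-hand exists, one way to hide the incantation from users is to park it in an MCA parameter file; a sketch assuming the standard per-user location (a system-wide openmpi-mca-params.conf under $prefix/etc works the same way):

# $HOME/.openmpi/mca-params.conf
btl_openib_receive_queues = X,128,256,192,128:X,2048,256,128,32:X,12288,256,128,32:X,65536,256,128,32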