Re: [OMPI devel] Help needed to run OMPI jobs under internal resource manager
On 04/03/11 12:35, Tony Lam wrote:

> This is really helpful. I'm looking into the examples,
> will ask again on the list if I have more questions later.

Open-MPI also supports the Task Manager (TM) API provided by Torque (and OpenPBS I suspect, from which it is derived):

  [root@bruce-m openmpi-1.4.2]# find . -name tm
  ./orte/mca/plm/tm
  ./orte/mca/ras/tm

cheers!
Chris

--
Christopher Samuel - Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
http://www.vlsci.unimelb.edu.au/
Re: [OMPI devel] BTL preferred_protocol , large message
Hi George,

This certainly looks like our motivations are close. However, I don't see in the presentation how you implement it (maybe I misread it), especially how you manage to not modify the BTL interface.

Do you have any code / SVN commit references for us to better understand what it's about?

Thanks,
Sylvain

On Tue, 8 Mar 2011, George Bosilca wrote:

> On Mar 8, 2011, at 12:12 , Damien Guinier wrote:
>
>> Hi Jeff
>
> Sorry, your email went on the devel mailing list of Open MPI.
>
>> I'm working on large message exchange optimization. My optimization consists in "choosing the best protocol for each large message".
>> In fact,
>> - for each device, the way to choose the best protocol is different.
>> - the fastest protocol for a given device depends on that device's hardware and on the message specifications.
>>
>> So the device/BTL itself is the best place to dynamically select the fastest protocol.
>>
>> Presently, for large messages, the protocol selection is only based on device capabilities.
>> My optimization consists in asking the device/BTL for a "preferred protocol" and then making a choice based on:
>> - the device capabilities and the BTL's recommendation.
>
> As a BTL will not randomly change its preferred protocol, one can assume it will depend on the peer. Here is a similar approach to the one you describe in your email, but without modification of the BTL interface.
>
> https://fs.hlrs.de/projects/eurompi2010/TALKS/WEDNESDAY_AFTERNOON/george_bosilca_locality_and_topology_aware.pdf
>
> george.
>
>> Technical view:
>> The optimization is located in mca_pml_ob1_send_request_start_btl(), after the device/btl selection.
>> In the large message section, I call a new function:
>> mca_pml_ob1_preferred_protocol() => mca_bml_base_preferred_protocol()
>> This one will try to launch
>> btl->btl_preferred_protocol()
>> So, selecting a protocol before a large message is not in the critical path.
>> It is the BTL's responsibility to define this function to select a preferred protocol.
>>
>> If this function is not defined, nothing changes in the code path.
>> To do this optimization, I had to add an interface to the btl module structure in "btl.h"; this is the drawback.
>>
>> I have already used this feature to optimize the "shared memory" device/BTL. I use the "preferred_protocol" feature to enable/disable KNEM according to intra/inter socket communication. This optimization increases a "IMB pingping benchmark" bandwidth by ~36%.
>>
>> The next step is now to use the "preferred protocol" feature with openib (with many IB cards).
>>
>> Attached 2 patches:
>> 1) BTL_preferred.patch: introduces the new preferred protocol interface
>> 2) SM_KNEM_intra_socket.patch: defines the preferred protocol for the sm btl
>> Note: Since the "ess" framework can't give us the "socket locality information", I used hitopo, which was proposed in an RFC some time ago:
>> http://www.open-mpi.org/community/lists/devel/2010/11/8677.php
>
> "I disapprove of what you say, but I will defend to the death your right to say it"
> -- Evelyn Beatrice Hall

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel
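The optional-hook design quoted in this message (ask the BTL for a preferred protocol if it defines the function, otherwise leave the code path unchanged) can be sketched roughly as follows. This is only an illustration under stated assumptions: `proto_t`, `btl_module_t`, `default_protocol()`, `choose_protocol()` and `example_hook()` are hypothetical names, not the contents of the attached patches or of Open MPI's btl.h, and the fallback order and the locality rule in the hook are arbitrary.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical protocol flags; NOT Open MPI's real constants. */
typedef enum { PROTO_SEND = 1, PROTO_PUT = 2, PROTO_GET = 4 } proto_t;

/* Hypothetical stand-in for a BTL module: a capability mask plus an
 * optional hook that a module may leave NULL. */
typedef struct btl_module {
    unsigned capabilities;  /* OR of proto_t flags */
    proto_t (*preferred_protocol)(const struct btl_module *btl,
                                  int peer, size_t msg_len);
} btl_module_t;

/* Capability-only choice, used when no recommendation is available.
 * The GET > PUT > SEND order is an assumption made for this sketch. */
proto_t default_protocol(const btl_module_t *btl)
{
    if (btl->capabilities & PROTO_GET) return PROTO_GET;
    if (btl->capabilities & PROTO_PUT) return PROTO_PUT;
    return PROTO_SEND;
}

/* Ask the module for a recommendation when the hook is defined and
 * accept it only if the module advertises that protocol; otherwise
 * fall back to the capability-only choice. */
proto_t choose_protocol(const btl_module_t *btl, int peer, size_t msg_len)
{
    assert(btl != NULL);
    if (btl->preferred_protocol != NULL) {
        proto_t wanted = btl->preferred_protocol(btl, peer, msg_len);
        if (btl->capabilities & (unsigned)wanted) {
            return wanted;
        }
    }
    return default_protocol(btl);
}

/* Example hook with a made-up rule: peers 0..3 are treated as "near"
 * and get SEND, all others get GET. The direction is arbitrary. */
proto_t example_hook(const btl_module_t *btl, int peer, size_t msg_len)
{
    (void)btl;
    (void)msg_len;
    return (peer < 4) ? PROTO_SEND : PROTO_GET;
}
```

A module that leaves `preferred_protocol` as NULL gets exactly the capability-based choice, which mirrors the "nothing changes in the code path" property described in the quoted message.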
Re: [OMPI devel] Communication Failure with orted_comm.c
Your suggestion worked, Ralph. I only added:

    OBJ_RELEASE(buffer);
    buffer = OBJ_NEW(opal_buffer_t);

Thank you both for your help.

Hugo

2011/3/8 George Bosilca

> The stack trace indicates that your orted segfaulted in
> orte_odls_base_notify_iof_complete, which means it received a message that
> was interpreted as an ORTE_DAEMON_IOF_COMPLETE (21). Nothing more to get out
> of your output unfortunately.
>
> george.
>
> On Mar 8, 2011, at 08:15 , Hugo Meyer wrote:
>
> > Hello @ll.
> >
> > I've got a problem in a communication between the v_protocol_receiver_component.c and the orted_comm.c.
> >
> > In the mca_vprotocol_receiver_component_init I've added a request that is received correctly by the orte_daemon_process_commands, but when I try to reply to the sender I get the following error:
> >
> > [clus1:15593] [ 0] /lib64/libpthread.so.0 [0x2bb03d40]
> > [clus1:15593] [ 1] /home/hmeyer/desarrollo/radic-ompi/binarios/lib/libopen-rte.so.0 [0x2ad760db]
> > [clus1:15593] [ 2] /home/hmeyer/desarrollo/radic-ompi/binarios/lib/libopen-rte.so.0 [0x2ad75aa4]
> > [clus1:15593] [ 3] /home/hmeyer/desarrollo/radic-ompi/binarios/lib/openmpi/mca_errmgr_orted.so [0x2e2d2fdd]
> > [clus1:15593] [ 4] /home/hmeyer/desarrollo/radic-ompi/binarios/lib/libopen-rte.so.0(orte_odls_base_notify_iof_complete+0x1da) [0x2ad42cb0]
> > [clus1:15593] [ 5] /home/hmeyer/desarrollo/radic-ompi/binarios/lib/libopen-rte.so.0(orte_daemon_process_commands+0x1068) [0x2ad19ca6]
> > [clus1:15593] [ 6] /home/hmeyer/desarrollo/radic-ompi/binarios/lib/libopen-rte.so.0(orte_daemon_cmd_processor+0x81b) [0x2ad18a55]
> > [clus1:15593] [ 7] /home/hmeyer/desarrollo/radic-ompi/binarios/lib/libopen-rte.so.0 [0x2ad9710e]
> > [clus1:15593] [ 8] /home/hmeyer/desarrollo/radic-ompi/binarios/lib/libopen-rte.so.0 [0x2ad974bb]
> > [clus1:15593] [ 9] /home/hmeyer/desarrollo/radic-ompi/binarios/lib/libopen-rte.so.0(opal_event_loop+0x1a) [0x2ad972ad]
> > [clus1:15593] [10] /home/hmeyer/desarrollo/radic-ompi/binarios/lib/libopen-rte.so.0(opal_event_dispatch+0xe) [0x2ad97166]
> > [clus1:15593] [11] /home/hmeyer/desarrollo/radic-ompi/binarios/lib/libopen-rte.so.0(orte_daemon+0x2322) [0x2ad17556]
> > [clus1:15593] [12] /home/hmeyer/desarrollo/radic-ompi/binarios/bin/orted [0x4008a3]
> > [clus1:15593] [13] /lib64/libc.so.6(__libc_start_main+0xf4) [0x2bd2d8a4]
> > [clus1:15593] [14] /home/hmeyer/desarrollo/radic-ompi/binarios/bin/orted [0x400799]
> > [clus1:15593] *** End of error message ***
> >
> > The code that I've added at the v_protocol_receiver_component.c is (in bold the recv command that fails):
> >
> > int mca_vprotocol_receiver_request_protector(void) {
> >     orte_daemon_cmd_flag_t command;
> >     opal_buffer_t *buffer = NULL;
> >     int n = 1;
> >
> >     command = ORTE_DAEMON_REQUEST_PROTECTOR_CMD;
> >
> >     buffer = OBJ_NEW(opal_buffer_t);
> >     opal_dss.pack(buffer, &command, 1, ORTE_DAEMON_CMD);
> >
> >     orte_rml.send_buffer(ORTE_PROC_MY_DAEMON, buffer, ORTE_RML_TAG_DAEMON, 0);
> >
> >     orte_rml.recv_buffer(ORTE_PROC_MY_DAEMON, buffer, ORTE_DAEMON_REQUEST_PROTECTOR_CMD, 0);
> >     opal_dss.unpack(buffer, &mca_vprotocol_receiver.protector.jobid, &n, OPAL_UINT32);
> >     opal_dss.unpack(buffer, &mca_vprotocol_receiver.protector.vpid, &n, OPAL_UINT32);
> >
> >     orte_process_info.protector.jobid = mca_vprotocol_receiver.protector.jobid;
> >     orte_process_info.protector.vpid = mca_vprotocol_receiver.protector.vpid;
> >
> >     OBJ_RELEASE(buffer);
> >
> >     return OMPI_SUCCESS;
> >
> > The code that I've added at the orted_comm.c is (in bold the send command that fails):
> >
> > case ORTE_DAEMON_REQUEST_PROTECTOR_CMD:
> >     if (orte_debug_daemons_flag) {
> >         opal_output(0, "%s orted_recv: received request protector from local proc %s",
> >                     ORTE_NAME_PRINT(ORTE_PROC_MY_NAME), ORTE_NAME_PRINT(sender));
> >     }
> >     /* Define the protector */
> >     protector = (uint32_t)ORTE_PROC_MY_NAME->vpid + 1;
> >     if (protector >= (uint32_t)orte_process_info.num_procs) {
> >         protector = 0;
> >     }
> >
> >     /* Pack the protector data */
> >     answer = OBJ_NEW(opal_buffer_t);
> >
> >     if (ORTE_SUCCESS != (ret = opal_dss.pack(answer, &ORTE_PROC_MY_NAME->jobid, 1, OPAL_UINT32))) {
> >         ORTE_ERROR_LOG(ret);
> >         OBJ_RELEASE(answer);
> >         goto CLEANUP;
> >     }
> >     if (ORTE_SUCCESS != (ret = opal_dss.pack(answer, &protector, 1, OPAL_UINT32))) {
> >         ORTE_ERROR_LOG(ret);
> >         OBJ_RELEASE(answer);
> >         goto CLEANUP;
> >     }
> >     if (orte_debug_daemons_flag) {
> >         opal_output(0, "EL PROTECTOR ASIGNADO para %s ES: %d\n",
> >                     ORTE_NAME_PRINT(sender), protector);
> >     }
> >
> >     /* Send the prot
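The fix reported at the top of this message (release the send buffer and allocate a new one before receiving) can be illustrated with a toy model. Everything below is hypothetical: `toy_buffer_t` and the `toy_*` functions are not `opal_buffer_t` or the OPAL/ORTE API, and the assumption that a receive appends to whatever the buffer already holds is made up for the sketch; the thread does not establish how ORTE itself treats a reused buffer.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Toy buffer with a fixed capacity and a read cursor.
 * This is NOT opal_buffer_t; it only models pack/unpack ordering. */
typedef struct {
    uint32_t data[8];
    size_t   count;   /* number of packed values */
    size_t   cursor;  /* index of the next value to unpack */
} toy_buffer_t;

toy_buffer_t *toy_new(void) { return calloc(1, sizeof(toy_buffer_t)); }
void toy_release(toy_buffer_t *b) { free(b); }

int toy_pack(toy_buffer_t *b, uint32_t v)
{
    if (b->count >= 8) return -1;
    b->data[b->count++] = v;
    return 0;
}

int toy_unpack(toy_buffer_t *b, uint32_t *out)
{
    if (b->cursor >= b->count) return -1;
    *out = b->data[b->cursor++];
    return 0;
}

/* Assumed behaviour of this toy only: a receive appends the reply to
 * whatever the buffer already contains. */
int toy_recv(toy_buffer_t *b, const uint32_t *reply, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (toy_pack(b, reply[i]) != 0) return -1;
    }
    return 0;
}

/* Reusing the request buffer: the first unpacked value is the stale
 * command that was packed for sending, not the reply. */
uint32_t request_reusing_buffer(uint32_t command, uint32_t reply)
{
    uint32_t first = 0;
    toy_buffer_t *buf = toy_new();
    assert(buf != NULL);
    toy_pack(buf, command);     /* request payload; the send would go here */
    toy_recv(buf, &reply, 1);   /* same buffer reused for the answer */
    toy_unpack(buf, &first);
    toy_release(buf);
    return first;
}

/* Releasing and re-creating the buffer between send and receive. */
uint32_t request_with_fresh_buffer(uint32_t command, uint32_t reply)
{
    uint32_t first = 0;
    toy_buffer_t *buf = toy_new();
    assert(buf != NULL);
    toy_pack(buf, command);     /* request payload; the send would go here */
    toy_release(buf);           /* plays the role of OBJ_RELEASE(buffer) */
    buf = toy_new();            /* plays the role of OBJ_NEW(opal_buffer_t) */
    assert(buf != NULL);
    toy_recv(buf, &reply, 1);
    toy_unpack(buf, &first);
    toy_release(buf);
    return first;
}
```

In this toy, only the second function hands the caller the reply as the first unpacked value, which is the general hazard a fresh receive buffer avoids.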
Re: [OMPI devel] BTL preferred_protocol , large message
On Mar 9, 2011, at 03:00 , Sylvain Jeaugey wrote:

> Hi George,
>
> This certainly looks like our motivations are close. However, I don't see in the presentation how you implement it (maybe I misread it), especially how you manage to not modify the BTL interface.
>
> Do you have any code / SVN commit references for us to better understand what it's about?

One gets multiple non-overlapping BTLs (in terms of peers), each with its own set of parameters and eventually accepted protocols. Mainly there will be one BTL per memory hierarchy.

I'll clean up the code and send you a patch.

george.

> Thanks,
> Sylvain

"To preserve the freedom of the human mind then and freedom of the press, every spirit should be ready to devote itself to martyrdom; for as long as we may think as we will, and speak as we think, the condition of man will proceed in improvement."
-- Thomas Jefferson, 1799
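The idea in this reply (several modules whose peer sets do not overlap, each carrying its own parameters) might look roughly like the sketch below. It is not George's code, which had not been posted at this point in the thread: `locality_t`, `level_module_t`, `level_modules`, `module_for_peer()` and all parameter values are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical locality levels, one per memory-hierarchy level. */
typedef enum { LOC_SAME_SOCKET = 0, LOC_SAME_NODE = 1, LOC_COUNT = 2 } locality_t;

/* Hypothetical per-level module: each level has its own parameters. */
typedef struct {
    const char *name;
    size_t      eager_limit;      /* made-up tuning value, in bytes */
    int         use_kernel_copy;  /* made-up flag for a single-copy path */
} level_module_t;

/* One module per level; the numbers carry no meaning beyond the example. */
const level_module_t level_modules[LOC_COUNT] = {
    { "same-socket", 4096, 0 },
    { "same-node",   8192, 1 },
};

/* Each peer has exactly one locality, so it belongs to exactly one
 * module: the peer sets of the modules cannot overlap. */
const level_module_t *module_for_peer(const locality_t *peer_locality,
                                      size_t npeers, size_t peer)
{
    assert(peer_locality != NULL);
    assert(peer < npeers);
    return &level_modules[peer_locality[peer]];
}
```

Because the choice is fixed per peer when the table is built, nothing has to be asked of the module on each large message, which is one way to read the claim that the BTL interface need not change.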
[OMPI devel] affinity MPI extension not included in OMPI 1.5.2
Crud. It's specifically listed in the NEWS, but somehow it didn't get included in the tarball. I'll investigate.

Should we do a 1.5.3 in the immediate future with the affinity extension?

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
Re: [OMPI devel] affinity MPI extension not included in OMPI 1.5.2
Please, do.

On Wed, 2011-03-09 at 15:58 -0500, Jeff Squyres wrote:

> Crud. It's specifically listed in the NEWS, but somehow it didn't get
> included in the tarball. I'll investigate.
>
> Should we do a 1.5.3 in the immediate future with the affinity extension?

--
Kenneth A. Lloyd
Director of Systems Science
Watt Systems Technologies Inc.
Albuquerque, NM USA
kenneth.ll...@wattsys.com