Re: [OMPI devel] Help needed to run OMPI jobs under internal resource manager

2011-03-09 Thread Christopher Samuel

On 04/03/11 12:35, Tony Lam wrote:

> This is really helpful. I'm looking into the examples,
> will ask again on the list if I have more questions later.

Open MPI also supports the Task Manager (TM) API provided
by Torque (and, I suspect, by OpenPBS, from which Torque is derived):

[root@bruce-m openmpi-1.4.2]# find . -name tm
./orte/mca/plm/tm
./orte/mca/ras/tm

cheers!
Chris
-- 
Christopher Samuel - Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.unimelb.edu.au/



Re: [OMPI devel] BTL preferred_protocol , large message

2011-03-09 Thread Sylvain Jeaugey

Hi George,

This certainly looks like our motivations are close. However, I don't see 
in the presentation how you implement it (maybe I misread it), especially 
how you manage to not modify the BTL interface.


Do you have any code / SVN commit references for us to better understand 
what it's about ?


Thanks,
Sylvain

On Tue, 8 Mar 2011, George Bosilca wrote:



On Mar 8, 2011, at 12:12 , Damien Guinier wrote:


Hi Jeff


Sorry, your email went to the devel mailing list of Open MPI.


I'm working on optimizing large message exchange. My optimization consists of
"choosing the best protocol for each large message".
In fact,
- for each device, the way to choose the best protocol is different.
- the fastest protocol for a given device depends on that device's hardware and
  on the message specifications.

So the device/BTL itself is the best place to dynamically select the fastest
protocol.

Presently, for large messages, the protocol selection is based only on device
capabilities.
My optimization consists of asking the device/BTL for a "preferred protocol" and
then making a choice based on the device capabilities and the BTL's
recommendation.


As a BTL will not randomly change its preferred protocol, one can assume 
it will depend on the peer. Here is an approach similar to the one you 
describe in your email, but without modification of the BTL interface.


https://fs.hlrs.de/projects/eurompi2010/TALKS/WEDNESDAY_AFTERNOON/george_bosilca_locality_and_topology_aware.pdf

 george.





Technical view:
The optimization is located in mca_pml_ob1_send_request_start_btl(), after the
device/BTL selection.
In the large message section, I call a new function:
  mca_pml_ob1_preferred_protocol() => mca_bml_base_preferred_protocol()
This one will try to call
  btl->btl_preferred_protocol()
So, selecting a protocol before a large message is not in the critical path.
It is the BTL's responsibility to define this function to select a preferred
protocol.

If this function is not defined, nothing changes in the code path.
To do this optimization, I had to add an interface to the BTL module structure in
"btl.h"; this is the drawback.

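A minimal sketch of the optional-hook idea described above, assuming a toy module structure rather than the real one in "btl.h". The types, the capability bits, the fallback order, and the example hook are all hypothetical; only the name btl_preferred_protocol comes from the message. When the hook is absent, or its hint is not backed by a capability, the capability-only choice is used unchanged.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical protocol identifiers; not the real ob1 protocol set. */
typedef enum { PROTO_NONE = 0, PROTO_SEND, PROTO_PUT, PROTO_GET } proto_t;

/* Hypothetical capability bits standing in for the BTL flags. */
#define CAP_SEND 0x1u
#define CAP_PUT  0x2u
#define CAP_GET  0x4u

struct toy_btl;
typedef proto_t (*preferred_protocol_fn)(const struct toy_btl *btl, size_t size);

/* Toy stand-in for the BTL module structure. */
typedef struct toy_btl {
    unsigned caps;                                /* what the device can do */
    preferred_protocol_fn btl_preferred_protocol; /* optional hook, may be NULL */
} toy_btl_t;

/* Map a protocol to the capability bit it requires. */
static unsigned proto_cap(proto_t p)
{
    switch (p) {
    case PROTO_SEND: return CAP_SEND;
    case PROTO_PUT:  return CAP_PUT;
    case PROTO_GET:  return CAP_GET;
    default:         return 0u;
    }
}

/* Capability-only choice; the priority order here is arbitrary. */
static proto_t default_protocol(const toy_btl_t *btl)
{
    if (btl->caps & CAP_GET) return PROTO_GET;
    if (btl->caps & CAP_PUT) return PROTO_PUT;
    return PROTO_SEND;
}

/* Ask the BTL for a hint, honour it only if the device supports it. */
proto_t select_large_message_protocol(const toy_btl_t *btl, size_t size)
{
    assert(btl != NULL);
    if (btl->btl_preferred_protocol != NULL) {
        proto_t hint = btl->btl_preferred_protocol(btl, size);
        if (hint != PROTO_NONE && (btl->caps & proto_cap(hint)) != 0u) {
            return hint;
        }
    }
    return default_protocol(btl); /* hook missing or hint unusable */
}

/* Hypothetical hook: recommend PUT for messages of 64 KiB and above. */
proto_t example_prefer_put(const toy_btl_t *btl, size_t size)
{
    (void)btl;
    return size >= 64u * 1024u ? PROTO_PUT : PROTO_NONE;
}
```

A module that leaves the hook set to NULL behaves exactly as before, which mirrors the "nothing changes in the code path" remark.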


I have already used this feature to optimize the "shared memory" device/BTL. I use the
"preferred_protocol" feature to enable/disable KNEM according to intra/inter socket
communication. This optimization increases the bandwidth of the IMB PingPing
benchmark by ~36%.

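A minimal sketch of a locality-based decision like the one described for the sm BTL. The message does not say which case (intra or inter socket) enables KNEM, so the sketch takes that as a tunable policy instead of guessing. All type and function names are hypothetical, and the socket id is assumed to come from a topology source such as hitopo.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Two ways to move a large message between local processes. */
typedef enum {
    SM_COPY_IN_OUT,      /* two copies through the shared segment */
    SM_KNEM_SINGLE_COPY  /* kernel-assisted single copy */
} sm_proto_t;

/* Hypothetical locality record for one process. */
typedef struct {
    int socket_id;
} peer_locality_t;

/* Which case enables KNEM is not stated in the thread, so it is a policy. */
typedef struct {
    bool knem_intra_socket; /* use KNEM when both peers share a socket */
    bool knem_inter_socket; /* use KNEM when peers sit on different sockets */
} sm_knem_policy_t;

/* Return the protocol the sm module would recommend for this peer. */
sm_proto_t sm_preferred_protocol(const sm_knem_policy_t *policy,
                                 const peer_locality_t *me,
                                 const peer_locality_t *peer)
{
    assert(policy != NULL && me != NULL && peer != NULL);
    bool same_socket = (me->socket_id == peer->socket_id);
    bool use_knem = same_socket ? policy->knem_intra_socket
                                : policy->knem_inter_socket;
    return use_knem ? SM_KNEM_SINGLE_COPY : SM_COPY_IN_OUT;
}
```

Keeping the policy separate from the locality test makes it easy to benchmark both settings on a given machine.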


The next step is to use the "preferred protocol" feature with openib (with 
many IB cards).



Attached 2 patches:
1) BTL_preferred.patch:
  introduces the new preferred protocol interface
2) SM_KNEM_intra_socket.patch:
  defines the preferred protocol for the sm btl
  Note: Since the "ess" framework can't give us the "socket locality
  information", I used hitopo, which was proposed in an RFC some time ago:
  http://www.open-mpi.org/community/lists/devel/2010/11/8677.php



___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel


"I disapprove of what you say, but I will defend to the death your right to say 
it"
 -- Evelyn Beatrice Hall





Re: [OMPI devel] Communication Failure with orted_comm.c

2011-03-09 Thread Hugo Meyer
Your suggestion worked, Ralph.

I only added:

OBJ_RELEASE(buffer);
buffer = OBJ_NEW(opal_buffer_t);
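
A toy illustration of the general pattern, releasing the request buffer and allocating a fresh one before receiving the reply. This excerpt does not explain why the change fixed the crash, and the sketch makes no claim about how opal_buffer_t or the RML behave internally. Every name below is hypothetical, and the "receive" simply appends to whatever the buffer already holds, to show how a reused buffer can hand stale request data to the first unpack.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define TOY_CAPACITY 8u
#define TOY_CMD      99u /* hypothetical command code */

/* Toy buffer with separate write and read cursors. */
typedef struct {
    uint32_t words[TOY_CAPACITY];
    size_t packed;   /* words written so far */
    size_t unpacked; /* words already read */
} toy_buffer_t;

toy_buffer_t *toy_buffer_new(void) { return calloc(1, sizeof(toy_buffer_t)); }
void toy_buffer_release(toy_buffer_t *b) { free(b); }

int toy_pack(toy_buffer_t *b, uint32_t value)
{
    if (b->packed >= TOY_CAPACITY) return -1;
    b->words[b->packed++] = value;
    return 0;
}

int toy_unpack(toy_buffer_t *b, uint32_t *out)
{
    if (b->unpacked >= b->packed) return -1;
    *out = b->words[b->unpacked++];
    return 0;
}

/* Toy request/reply: the reply (jobid 42, vpid 3) is appended to the buffer. */
int toy_request_protector(int fresh_buffer, uint32_t *jobid, uint32_t *vpid)
{
    int rc = -1;
    toy_buffer_t *buffer = toy_buffer_new();
    if (buffer == NULL) return -1;
    toy_pack(buffer, TOY_CMD);          /* the request that gets "sent" */

    if (fresh_buffer) {                 /* the two added lines, in toy form */
        toy_buffer_release(buffer);
        buffer = toy_buffer_new();
        if (buffer == NULL) return -1;
    }

    toy_pack(buffer, 42u);              /* "received" jobid */
    toy_pack(buffer, 3u);               /* "received" vpid */
    if (toy_unpack(buffer, jobid) == 0 && toy_unpack(buffer, vpid) == 0) {
        rc = 0;
    }
    toy_buffer_release(buffer);
    return rc;
}
```

With a fresh buffer the caller reads the reply values; with the reused one, the first value unpacked is the old command code.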

Thank you both for your help.

Hugo

2011/3/8 George Bosilca 

> The stack trace indicates that your orted segfaulted in
> orte_odls_base_notify_iof_complete, which means it received a message that
> was interpreted as an ORTE_DAEMON_IOF_COMPLETE (21). Unfortunately, there is
> nothing more to get out of your output.
>
>  george.
>
> On Mar 8, 2011, at 08:15 , Hugo Meyer wrote:
>
> > Hello @ll.
> >
> > I've got a problem in the communication between
> > v_protocol_receiver_component.c and orted_comm.c.
> >
> > In mca_vprotocol_receiver_component_init I've added a request that
> > is received correctly by orte_daemon_process_commands, but when I try to
> > reply to the sender I get the following error:
> >
> > [clus1:15593] [ 0] /lib64/libpthread.so.0 [0x2bb03d40]
> > [clus1:15593] [ 1] /home/hmeyer/desarrollo/radic-ompi/binarios/lib/libopen-rte.so.0 [0x2ad760db]
> > [clus1:15593] [ 2] /home/hmeyer/desarrollo/radic-ompi/binarios/lib/libopen-rte.so.0 [0x2ad75aa4]
> > [clus1:15593] [ 3] /home/hmeyer/desarrollo/radic-ompi/binarios/lib/openmpi/mca_errmgr_orted.so [0x2e2d2fdd]
> > [clus1:15593] [ 4] /home/hmeyer/desarrollo/radic-ompi/binarios/lib/libopen-rte.so.0(orte_odls_base_notify_iof_complete+0x1da) [0x2ad42cb0]
> > [clus1:15593] [ 5] /home/hmeyer/desarrollo/radic-ompi/binarios/lib/libopen-rte.so.0(orte_daemon_process_commands+0x1068) [0x2ad19ca6]
> > [clus1:15593] [ 6] /home/hmeyer/desarrollo/radic-ompi/binarios/lib/libopen-rte.so.0(orte_daemon_cmd_processor+0x81b) [0x2ad18a55]
> > [clus1:15593] [ 7] /home/hmeyer/desarrollo/radic-ompi/binarios/lib/libopen-rte.so.0 [0x2ad9710e]
> > [clus1:15593] [ 8] /home/hmeyer/desarrollo/radic-ompi/binarios/lib/libopen-rte.so.0 [0x2ad974bb]
> > [clus1:15593] [ 9] /home/hmeyer/desarrollo/radic-ompi/binarios/lib/libopen-rte.so.0(opal_event_loop+0x1a) [0x2ad972ad]
> > [clus1:15593] [10] /home/hmeyer/desarrollo/radic-ompi/binarios/lib/libopen-rte.so.0(opal_event_dispatch+0xe) [0x2ad97166]
> > [clus1:15593] [11] /home/hmeyer/desarrollo/radic-ompi/binarios/lib/libopen-rte.so.0(orte_daemon+0x2322) [0x2ad17556]
> > [clus1:15593] [12] /home/hmeyer/desarrollo/radic-ompi/binarios/bin/orted [0x4008a3]
> > [clus1:15593] [13] /lib64/libc.so.6(__libc_start_main+0xf4) [0x2bd2d8a4]
> > [clus1:15593] [14] /home/hmeyer/desarrollo/radic-ompi/binarios/bin/orted [0x400799]
> > [clus1:15593] *** End of error message ***
> >
> > The code that I've added to v_protocol_receiver_component.c is (in
> > bold the recv command that fails):
> >
> > int mca_vprotocol_receiver_request_protector(void) {
> >     orte_daemon_cmd_flag_t command;
> >     opal_buffer_t *buffer = NULL;
> >     int n = 1;
> >
> >     command = ORTE_DAEMON_REQUEST_PROTECTOR_CMD;
> >
> >     buffer = OBJ_NEW(opal_buffer_t);
> >     opal_dss.pack(buffer, &command, 1, ORTE_DAEMON_CMD);
> >
> >     orte_rml.send_buffer(ORTE_PROC_MY_DAEMON, buffer, ORTE_RML_TAG_DAEMON, 0);
> >
> >     orte_rml.recv_buffer(ORTE_PROC_MY_DAEMON, buffer, ORTE_DAEMON_REQUEST_PROTECTOR_CMD, 0);
> >     opal_dss.unpack(buffer, &mca_vprotocol_receiver.protector.jobid, &n, OPAL_UINT32);
> >     opal_dss.unpack(buffer, &mca_vprotocol_receiver.protector.vpid, &n, OPAL_UINT32);
> >
> >     orte_process_info.protector.jobid = mca_vprotocol_receiver.protector.jobid;
> >     orte_process_info.protector.vpid  = mca_vprotocol_receiver.protector.vpid;
> >
> >     OBJ_RELEASE(buffer);
> >
> >     return OMPI_SUCCESS;
> > }
> >
> > The code that I've added to orted_comm.c is (in bold the send command
> > that fails):
> >
> > case ORTE_DAEMON_REQUEST_PROTECTOR_CMD:
> >     if (orte_debug_daemons_flag) {
> >         opal_output(0, "%s orted_recv: received request protector from local proc %s",
> >                     ORTE_NAME_PRINT(ORTE_PROC_MY_NAME), ORTE_NAME_PRINT(sender));
> >     }
> >     /* Define the protector */
> >     protector = (uint32_t)ORTE_PROC_MY_NAME->vpid + 1;
> >     if (protector >= (uint32_t)orte_process_info.num_procs) {
> >         protector = 0;
> >     }
> >
> >     /* Pack the protector data */
> >     answer = OBJ_NEW(opal_buffer_t);
> >
> >     if (ORTE_SUCCESS != (ret = opal_dss.pack(answer, &ORTE_PROC_MY_NAME->jobid, 1, OPAL_UINT32))) {
> >         ORTE_ERROR_LOG(ret);
> >         OBJ_RELEASE(answer);
> >         goto CLEANUP;
> >     }
> >     if (ORTE_SUCCESS != (ret = opal_dss.pack(answer, &protector, 1, OPAL_UINT32))) {
> >         ORTE_ERROR_LOG(ret);
> >         OBJ_RELEASE(answer);
> >         goto CLEANUP;
> >     }
> >     if (orte_debug_daemons_flag) {
> >         opal_output(0, "EL PROTECTOR ASIGNADO para %s ES: %d\n",
> >                     ORTE_NAME_PRINT(sender), protector);
> >     }
> >
> > /* Send the prot

Re: [OMPI devel] BTL preferred_protocol , large message

2011-03-09 Thread George Bosilca

On Mar 9, 2011, at 03:00 , Sylvain Jeaugey wrote:

> Hi George,
> 
> This certainly looks like our motivations are close. However, I don't see in 
> the presentation how you implement it (maybe I misread it), especially how 
> you manage to not modify the BTL interface.
> 
> Do you have any code / SVN commit references for us to better understand what 
> it's about ?

One gets multiple non-overlapping BTLs (in terms of peers), each with its own 
set of parameters and, possibly, accepted protocols. Mainly, there will be one 
BTL per level of the memory hierarchy.
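
One way to read this, as a rough sketch under stated assumptions: each peer is owned by exactly one module, chosen by where the peer sits in the memory hierarchy, so the per-peer choice of parameters and protocols falls out of which module owns the peer and no extra interface function is needed. The structure, the level names, and the parameter values below are hypothetical and are not taken from the promised patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical memory-hierarchy levels a local peer can fall into. */
typedef enum { LEVEL_SAME_SOCKET, LEVEL_SAME_NODE } level_t;

/* Hypothetical protocol bits a module may accept. */
#define PROTO_COPY   0x1u
#define PROTO_KERNEL 0x2u

/* One module instance per level, each with its own settings. */
typedef struct {
    const char *name;
    level_t level;               /* the set of peers this module owns */
    size_t eager_limit;          /* example per-module parameter */
    unsigned accepted_protocols; /* bitmask of PROTO_* values */
} toy_module_t;

/*
 * Pick the module that owns a peer. Modules do not overlap, so at most
 * one entry matches; NULL means no module reaches that peer.
 */
const toy_module_t *module_for_peer(const toy_module_t *mods, size_t count,
                                    level_t peer_level)
{
    assert(mods != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (mods[i].level == peer_level) {
            return &mods[i];
        }
    }
    return NULL;
}
```

Tuning a level then means changing that module's parameters, with no protocol query on the send path.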

I'll clean up the code and send you a patch.

  george.


> 
> Thanks,
> Sylvain
> 
> On Tue, 8 Mar 2011, George Bosilca wrote:
> 
>> 
>> On Mar 8, 2011, at 12:12 , Damien Guinier wrote:
>> 
>>> Hi Jeff
>> 
>> Sorry, your email went to the devel mailing list of Open MPI.
>> 
>>> I'm working on optimizing large message exchange. My optimization 
>>> consists of "choosing the best protocol for each large message".
>>> In fact,
>>> - for each device, the way to choose the best protocol is different.
>>> - the fastest protocol for a given device depends on that device's hardware 
>>>   and on the message specifications.
>>> 
>>> So the device/BTL itself is the best place to dynamically select the 
>>> fastest protocol.
>>> 
>>> Presently, for large messages, the protocol selection is based only on 
>>> device capabilities.
>>> My optimization consists of asking the device/BTL for a "preferred 
>>> protocol" and then making a choice based on the device capabilities and 
>>> the BTL's recommendation.
>> 
>> As a BTL will not randomly change its preferred protocol, one can assume it 
>> will depend on the peer. Here is an approach similar to the one you describe 
>> in your email, but without modification of the BTL interface.
>> 
>> https://fs.hlrs.de/projects/eurompi2010/TALKS/WEDNESDAY_AFTERNOON/george_bosilca_locality_and_topology_aware.pdf
>> 
>> george.
>> 
>> 
>> 
>>> 
>>> Technical view:
>>> The optimization is located in mca_pml_ob1_send_request_start_btl(), after 
>>> the device/BTL selection.
>>> In the large message section, I call a new function:
>>>  mca_pml_ob1_preferred_protocol() => mca_bml_base_preferred_protocol()
>>> This one will try to call
>>>  btl->btl_preferred_protocol()
>>> So, selecting a protocol before a large message is not in the critical path.
>>> It is the BTL's responsibility to define this function to select a 
>>> preferred protocol.
>>> 
>>> If this function is not defined, nothing changes in the code path.
>>> To do this optimization, I had to add an interface to the BTL module 
>>> structure in "btl.h"; this is the drawback.
>>> 
>>> 
>>> 
>>> I have already used this feature to optimize the "shared memory" 
>>> device/BTL. I use the "preferred_protocol" feature to enable/disable
>>> KNEM according to intra/inter socket communication. This optimization 
>>> increases the bandwidth of the IMB PingPing benchmark by ~36%.
>>> 
>>> 
>>> 
>>> The next step is to use the "preferred protocol" feature with openib 
>>> (with many IB cards).
>>> 
>>> 
>>> 
>>> Attached 2 patches:
>>> 1) BTL_preferred.patch:
>>>  introduces the new preferred protocol interface
>>> 2) SM_KNEM_intra_socket.patch:
>>>  defines the preferred protocol for the sm btl
>>>  Note: Since the "ess" framework can't give us the "socket locality
>>>  information", I used hitopo, which was proposed in an RFC some time ago:
>>>  http://www.open-mpi.org/community/lists/devel/2010/11/8677.php
>>> 
>>> 
>>> 
>> 
>> "I disapprove of what you say, but I will defend to the death your right to 
>> say it"
>> -- Evelyn Beatrice Hall
>> 
>> 
>> 

"To preserve the freedom of the human mind then and freedom of the press, every 
spirit should be ready to devote itself to martyrdom; for as long as we may 
think as we will, and speak as we think, the condition of man will proceed in 
improvement."
  -- Thomas Jefferson, 1799




[OMPI devel] affinity MPI extension not included in OMPI 1.5.2

2011-03-09 Thread Jeff Squyres
Crud.  It's specifically listed in the NEWS, but somehow it didn't get included 
in the tarball.  I'll investigate.

Should we do a 1.5.3 in the immediate future with the affinity extension?

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI devel] affinity MPI extension not included in OMPI 1.5.2

2011-03-09 Thread Ken Lloyd
Please, do.

On Wed, 2011-03-09 at 15:58 -0500, Jeff Squyres wrote:
> Crud.  It's specifically listed in the NEWS, but somehow it didn't get 
> included in the tarball.  I'll investigate.
> 
> Should we do a 1.5.3 in the immediate future with the affinity extension?
> 
-- 

Kenneth A. Lloyd
Director of Systems Science
Watt Systems Technologies Inc.
Albuquerque, NM USA
kenneth.ll...@wattsys.com