[OMPI devel] Problems obtaining jdata->map in the HNP.

2012-03-18 Thread Hugo Daniel Meyer
Hello.

I've added a new list to orte_node_t (because I need a copy of my data
structure per daemon). It is an array of my own data structure, which I
will fill with data about the processes in the job and other data that
interests me.

For test purposes, I'm trying to trigger the table fill from the process
with rank 0. This process sends a message (send_buffer) to its HNP (using
"process_command" of orted_comm.c). The HNP receives this command and tries
to obtain the jdata of the daemons using the jobid sent by rank 0.

    if (NULL == (jdata_orte = orte_get_job_data_object(jobid_orted))) {
        // problem
    }

I obtain jdata_orte without problems, but jdata_orte->map is NULL, so of
course I cannot do something like:

    node_from_map =
        (orte_node_t*)opal_pointer_array_get_item(jdata->map->nodes, i);

I need to obtain every node, and access my table to fill it.

My question is: do the daemons not fill in this information, and is that
why jdata->map is NULL? If so, how can I obtain all the orte_node_t
objects to fill them with the information I need? As I understand it,
each daemon has a copy of the orte_node_t structures. Is this so?

Thanks for the help.

Hugo


Re: [OMPI devel] Problems obtaining jdata->map in the HNP.

2012-03-18 Thread Ralph Castain
Is this in the trunk? Or in some release branch?

The daemon job has a map and nodes defined in it on the trunk, but not in 
earlier releases. If you want the HNP to find that info in an earlier release, 
you could instead cycle across the entries in orte_node_pool, looking for those 
that have a daemon assigned to them.

As for the daemons - no, they don't have a copy of the orte_node_t structures. 
Once the new state machine gets committed, then they will - but in the 
meantime, the only node-type information they have is in the orte_nidmap list. 
See the definition of orte_nid_t in orte/runtime/orte_globals.h.
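
A minimal sketch of the "cycle across the pool and keep the nodes that have
a daemon" idea follows. It is self-contained and uses stand-in types: every
demo_* name is hypothetical and invented for illustration. In an ORTE tree
the equivalent loop would presumably call opal_pointer_array_get_item() on
orte_node_pool and test the node's daemon pointer, but check that against
the headers of the version you are building.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in types for illustration only; the real orte_node_t,
 * orte_proc_t and opal_pointer_array_t are defined by ORTE/OPAL. */
typedef struct { int vpid; } demo_proc_t;

typedef struct {
    const char  *name;
    demo_proc_t *daemon;   /* NULL when no daemon is assigned to the node */
} demo_node_t;

typedef struct {
    demo_node_t **items;   /* slots may be NULL, as in a sparse pointer array */
    size_t        size;
} demo_node_pool_t;

typedef void (*demo_node_fn)(demo_node_t *node, void *ctx);

/* Visit every node in the pool that has a daemon assigned.
 * Returns the number of nodes visited. */
size_t demo_foreach_daemon_node(const demo_node_pool_t *pool,
                                demo_node_fn fn, void *ctx)
{
    size_t visited = 0;

    assert(pool != NULL);
    for (size_t i = 0; i < pool->size; i++) {
        demo_node_t *node = pool->items[i];
        if (node == NULL || node->daemon == NULL) {
            continue;   /* empty slot, or a node without a daemon */
        }
        if (fn != NULL) {
            fn(node, ctx);   /* e.g. fill the per-node table here */
        }
        visited++;
    }
    return visited;
}
```

The callback is where a per-node table could be filled; passing NULL simply
counts the nodes that have a daemon.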


> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel



Re: [OMPI devel] New odls component fails

2012-03-18 Thread Ralph Castain

On Mar 17, 2012, at 4:18 PM, Alex Margolin wrote:

> On 03/17/2012 08:16 PM, Ralph Castain wrote:
>> I don't think you need to .ompi_ignore all those components. First, you need 
>> to use the --without-hwloc option (you misspelled it below as 
>> --disable-hwloc).
> I missed it, thank you.
>> Assuming you removed the relevant code from your clone of the default odls 
>> module, I suspect the calls are being made in ompi/runtime/ompi_mpi_init.c. 
>> If the process detects it isn't bound, it looks to see if it should bind 
>> itself. I thought that code was also turned "off" if we configured 
>> without-hwloc, so you might have to check it.
> I didn't remove any code from the default module. Should I have? (All I added 
> was inserting "mosrun -w" before the app name in the argv)

No, using --without-hwloc will turn off all the memory and cpu binding calls.

> Could you please explain what you mean by "bound" and how I can bind 
> processes?

Binding means telling the OS to restrict execution of this process to the 
specified cpus. You can also ask that it restrict all malloc'd memory to a 
region local to those cpus, which is where some of your prior error 
messages came from.
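
To make "binding" concrete, here is a minimal Linux-only sketch that
restricts the calling process to a single cpu using the raw
sched_getaffinity()/sched_setaffinity() calls from glibc. This is not how
Open MPI does it (the discussion above refers to hwloc), and it covers cpu
binding only, not memory binding. The function name is hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>

/* Restrict the calling process to the first cpu it is currently allowed
 * to run on. Returns that cpu index, or -1 on failure. */
int demo_bind_to_first_allowed_cpu(void)
{
    cpu_set_t mask;

    /* pid 0 means "the caller" for both affinity calls. */
    if (sched_getaffinity(0, sizeof(mask), &mask) != 0) {
        return -1;
    }
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
        if (!CPU_ISSET(cpu, &mask)) {
            continue;
        }
        cpu_set_t target;
        CPU_ZERO(&target);
        CPU_SET(cpu, &target);
        if (sched_setaffinity(0, sizeof(target), &target) != 0) {
            return -1;
        }
        /* Read the mask back to confirm the kernel applied it. */
        if (sched_getaffinity(0, sizeof(mask), &mask) != 0) {
            return -1;
        }
        assert(CPU_ISSET(cpu, &mask) && CPU_COUNT(&mask) == 1);
        return cpu;
    }
    return -1;
}
```

After a successful call the scheduler will only run the process on the
returned cpu; an "unbound" process is one whose mask still covers every cpu
it was given at launch.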

> Also, I'm now getting a similar error, but a quick check shows 
> ess_base_nidmap.c doesn't exist in the trunk:
> 
> ...
> [singularity:01899] OPAL dss:unpack: got type 22 when expecting type 16
> [singularity:01899] [[46635,1],0] ORTE_ERROR_LOG: Pack data mismatch in file 
> ../../../../../orte/mca/ess/base/ess_base_nidmap.c at line 57
> [singularity:01899] [[46635,1],0] ORTE_ERROR_LOG: Pack data mismatch in file 
> ../../../../../../orte/mca/ess/env/ess_env_module.c at line 173
> [singularity:01899] [[46635,1],0] ORTE_ERROR_LOG: Pack data mismatch in file 
> ../../../orte/runtime/orte_init.c at line 132

This is typically caused by stale libraries in your install area. Did you rm 
-rf your prior installation before rebuilding? Did you recompile your 
application after you rebuilt?

These files no longer exist in the trunk, as you noted - so if something is 
looking for them, that means you either didn't clean out the old installation 
or you forgot to recompile the application after rebuilding OMPI.
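
For readers wondering how a "got type X when expecting type Y" error can
arise at all, the sketch below assumes a buffer that stores a one-byte type
tag ahead of each value, so the reader can detect that the writer packed
something other than what it expects. That is what tends to happen when the
two sides were built from different versions of the code. All demo_* names
and tag values are hypothetical; this is not the OPAL DSS API.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical type tags and return codes, for illustration only. */
enum { DEMO_INT32 = 1, DEMO_STRING = 2 };
enum { DEMO_OK = 0, DEMO_ERR_MISMATCH = -1, DEMO_ERR_SPACE = -2 };

typedef struct {
    uint8_t data[64];
    size_t  used;   /* bytes written so far */
    size_t  pos;    /* next byte to read */
} demo_buffer_t;

/* Write a one-byte type tag followed by the raw payload bytes. */
int demo_pack(demo_buffer_t *b, uint8_t tag, const void *payload, size_t len)
{
    assert(b != NULL && payload != NULL);
    if (b->used + 1 + len > sizeof b->data) {
        return DEMO_ERR_SPACE;
    }
    b->data[b->used++] = tag;
    memcpy(b->data + b->used, payload, len);
    b->used += len;
    return DEMO_OK;
}

/* Check the stored tag against the type the caller expects, and only
 * then copy the payload out. On mismatch the read position is unchanged. */
int demo_unpack(demo_buffer_t *b, uint8_t expected, void *out, size_t len)
{
    assert(b != NULL && out != NULL);
    if (b->pos + 1 + len > b->used) {
        return DEMO_ERR_SPACE;
    }
    if (b->data[b->pos] != expected) {
        return DEMO_ERR_MISMATCH;   /* writer and reader disagree */
    }
    memcpy(out, b->data + b->pos + 1, len);
    b->pos += 1 + len;
    return DEMO_OK;
}
```

If a stale library packs the fields in one order and a freshly built
application unpacks them in another, the first unpack that meets the wrong
tag fails in exactly this way.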

> --
> ...
>> Shared memory is a separate issue. If you want/need to avoid it, then run 
>> with -mca btl ^sm and this will turn off all shared memory calls.
> After my last post I tried to rebuild and then even the simplest app wouldn't 
> start. Turns out I disabled all the shmem (mmap, posix, sysv) and orte 
> wouldn't start without any (so I had to turn it back on). Could you tell me 
> if there is a way to run the application without making any mmap() calls with 
> MAP_SHARED? Currently, mosrun is run with -w asking it to fail (return -1) on 
> any such system-call.

ORTE doesn't use shared memory, but I suspect that the opal shmem framework may 
object to not finding any usable component. We shouldn't error out for that 
reason, but the problem is present in the code. Edit the file 
opal/mca/shmem/base/shmem_base_select.c and change line 174 to return 
OPAL_SUCCESS. You may encounter other problems down the line as the system may 
not react well to not having anything there, but give it a try.

Worst case, you may have to add a "null" component to the opal/mca/shmem 
framework that does nothing, just so the framework has a defined module instead 
of a bunch of NULL function pointers.
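
The "null component" idea can be sketched as a module whose function
pointers are all populated with stubs, so callers never dereference NULL.
The struct layout, function names, and return codes below are hypothetical
and are not the real opal/mca/shmem interface. Whether a stub should report
success or "not supported" is a design choice that depends on what the
callers tolerate.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical return codes and module interface, for illustration only. */
enum { DEMO_SUCCESS = 0, DEMO_ERR_NOT_SUPPORTED = -1 };

typedef struct {
    int (*segment_create)(size_t size);
    int (*segment_attach)(void);
    int (*segment_detach)(void);
} demo_shmem_module_t;

/* Stubs: never touch shared memory, just report what happened. */
static int null_create(size_t size) { (void)size; return DEMO_ERR_NOT_SUPPORTED; }
static int null_attach(void)        { return DEMO_ERR_NOT_SUPPORTED; }
static int null_detach(void)        { return DEMO_SUCCESS; }

const demo_shmem_module_t demo_shmem_null_module = {
    null_create,
    null_attach,
    null_detach
};

/* Fall back to the null module when no real component was selected, so
 * the framework always has a fully defined module. */
const demo_shmem_module_t *demo_shmem_select(const demo_shmem_module_t *candidate)
{
    const demo_shmem_module_t *chosen =
        (candidate != NULL) ? candidate : &demo_shmem_null_module;

    assert(chosen->segment_create != NULL &&
           chosen->segment_attach != NULL &&
           chosen->segment_detach != NULL);
    return chosen;
}
```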

> 
> Thanks for your help,
> Alex
> 
> 
>> 
>> 
>> On Mar 17, 2012, at 11:51 AM, Alex Margolin wrote:
>> 
>>> [singularity:15041] [[35712,0],0] orted_recv_cmd: received message from 
>>> [[35712,1],0]
>>> [singularity:15041] defining message event: orted/orted_comm.c 172
>>> [singularity:15041] [[35712,0],0] orted_recv_cmd: reissued recv
>>> [singularity:15041] [[35712,0],0] orte:daemon:cmd:processor called by 
>>> [[35712,1],0] for tag 1
>>> [singularity:15041] [[35712,0],0] orted:comm:process_commands() Processing 
>>> Command: ORTE_DAEMON_SYNC_WANT_NIDMAP
>>> [singularity:15041] [[35712,0],0] orte:daemon:cmd:processor: processing 
>>> commands completed
>>> [singularity:15042] OPAL dss:unpack: got type 33 when expecting type 12
>>> [singularity:15042] [[35712,1],0] ORTE_ERROR_LOG: Pack data mismatch in 
>>> file ../../../orte/util/nidmap.c at line 429
>>> [singularity:15042] [[35712,1],0] ORTE_ERROR_LOG: Pack data mismatch in 
>>> file ../../../../../orte/mca/ess/base/ess_base_nidmap.c at line 62
>>> [singularity:15042] [[35712,1],0] ORTE_ERROR_LOG: Pack data mismatch in 
>>> file ../../../../../../orte/mca/ess/env/ess_env_module.c at line 173
>>> [singularity:15042] [[35712,1],0] ORTE_ERROR_LOG: Pack data mismatch in 
>>> file ../../../orte/runtime/orte_init.c at line 132
>>> --
>>> It looks like MPI_INIT failed for some reason; your parallel process is
>>> likely to abort.  There are many reasons that

Re: [OMPI devel] Problems obtaining jdata->map in the HNP.

2012-03-18 Thread Hugo Daniel Meyer
Hello Ralph.

I'm not using the trunk; I'm using a version from a year ago.

So, if I understand correctly, the best way to do what I want is to
include my data structure in the orte_nidmap, then cycle across all the
processes in my application to fill my structure.

Thanks for your reply Ralph.

Hugo

On March 18, 2012 at 19:46, Ralph Castain wrote:

> Is this in the trunk? Or in some release branch?
>
> The daemon job has a map and nodes defined in it on the trunk, but not in
> earlier releases. If you want the HNP to find that info in an earlier
> release, you could instead cycle across the entries in orte_node_pool,
> looking for those that have a daemon assigned to them.
>
> As for the daemons - no, they don't have a copy of the orte_node_t
> structures. Once the new state machine gets committed, then they will - but
> in the meantime, the only node-type information they have is in the
> orte_nidmap list. See the definition of orte_nid_t in
> orte/runtime/orte_globals.h.
>
>


Re: [OMPI devel] RFC: ob1: fallback on put/send on rget failure

2012-03-18 Thread Christopher Samuel
On 16/03/12 08:14, Shamis, Pavel wrote:

> I did not get any patch.

It arrived OK here, you can get it from the archive:

http://www.open-mpi.org/community/lists/devel/2012/03/10717.php

-- 
Christopher Samuel - Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.unimelb.edu.au/



[OMPI devel] RFC: ORTE state machine

2012-03-18 Thread Ralph Castain
WHY: Enable async progress

WHAT:  Restructure ORTE to operate as a completely event-driven state machine

WHEN:  ~April 1 (seems appropriate)

SIGNIFICANT CHANGES:
* grpcomm API has changed
* routed API has changed
* state framework has been added to ORTE
* OPAL SOS has been removed (per IU)
* --enable-resilient-orte and all epoch code has been removed (per UTK)

KNOWN BREAKAGE:
* checkpoint/restart is almost certainly broken

This has been discussed several times over the last 6-8 months. Going forward, 
we need to enable async progress at both the OMPI and ORTE level. This change 
deals solely with the latter area. All interactions with the ORTE level have 
been made non-blocking to allow the MPI layer to continue making separate 
progress. This is reflected in changes made to ompi_mpi_init, 
ompi_mpi_finalize, and dpm_orte.

The largest change is the introduction of the ORTE "state" framework, which 
moves the launch of a job through a series of events, each processing one step 
of the launch procedure. So allocation becomes an event, as does mapping. The 
state machine is implemented as a linked list, so variations of the procedures 
can be easily implemented by those wanting to try something different from the 
base implementation.
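
As a rough illustration of why a linked list makes such variations easy,
the sketch below chains states that each carry a handler, and shows a custom
state being spliced in between two existing ones. It runs the handlers
synchronously in a loop, whereas the framework described here is
event-driven, and every name in it is hypothetical rather than taken from
the ORTE state framework.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical job and state types, for illustration only. */
typedef struct { int steps_run; } demo_job_t;

typedef struct demo_state {
    const char        *name;
    int              (*handler)(demo_job_t *job);   /* 0 on success */
    struct demo_state *next;
} demo_state_t;

/* Example handler: records that one step of the launch ran. */
int demo_count_step(demo_job_t *job)
{
    job->steps_run++;
    return 0;
}

/* Splice new_state into the list immediately after 'after'. */
void demo_state_insert_after(demo_state_t *after, demo_state_t *new_state)
{
    assert(after != NULL && new_state != NULL);
    new_state->next = after->next;
    after->next = new_state;
}

/* Walk the list in order, stopping at the first handler that fails.
 * Returns the number of states that completed. */
int demo_run(demo_state_t *first, demo_job_t *job)
{
    int done = 0;

    for (demo_state_t *s = first; s != NULL; s = s->next) {
        if (s->handler(job) != 0) {
            break;
        }
        done++;
    }
    return done;
}
```

With an "allocate" state followed by a "map" state, inserting a custom state
after "allocate" changes the launch sequence without touching either
existing handler.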

The daemon collectives have also been reworked to remove their "tree" 
dependency. Non-tree collectives can now be performed, and a few are in the 
works and should be committed shortly after the state machine is in the trunk.

The ability to run an ORTE progress thread has been included in the configure 
code (--enable-orte-progress-thread), but is off by default. As Brian noted, 
the MPI layer is not ready for this feature at this time. However, the ORTE 
code is fully prepared, so those interested in working on completing the async 
progress work in the MPI layer can do so.

The state machine branch is at https://bitbucket.org/rhc/ompi-term. I'm still 
doing some cleanup there, so don't be surprised if debug messages appear and/or 
things aren't completely right just yet.