On 03/17/2012 08:16 PM, Ralph Castain wrote:
I don't think you need to .ompi_ignore all those components. First, you need to 
use the --without-hwloc option (you misspelled it below as --disable-hwloc).
I missed it, thank you.
Assuming you removed the relevant code from your clone of the default odls module, I 
suspect the calls are being made in ompi/runtime/ompi_mpi_init.c. If the process detects 
it isn't bound, it looks to see if it should bind itself. I thought that code was also 
turned "off" if we configured without-hwloc, so you might have to check it.
I didn't remove any code from the default module. Should I have? (All I added was inserting "mosrun -w" before the app name in the argv; a simplified sketch of that change follows the log below.) Could you please explain what you mean by "bound" and how I can bind processes?

Also, I'm now getting a similar error, but a quick check shows ess_base_nidmap.c doesn't exist in the trunk:

...
[singularity:01899] OPAL dss:unpack: got type 22 when expecting type 16
[singularity:01899] [[46635,1],0] ORTE_ERROR_LOG: Pack data mismatch in file ../../../../../orte/mca/ess/base/ess_base_nidmap.c at line 57
[singularity:01899] [[46635,1],0] ORTE_ERROR_LOG: Pack data mismatch in file ../../../../../../orte/mca/ess/env/ess_env_module.c at line 173
[singularity:01899] [[46635,1],0] ORTE_ERROR_LOG: Pack data mismatch in file ../../../orte/runtime/orte_init.c at line 132
--------------------------------------------------------------------------
...
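For reference, the change I made amounts to prepending the launcher in front of the application's argv before it is exec'd. A stand-alone sketch of the idea (the prepend_mosrun() helper here is hypothetical, not the actual odls code):

    /* Sketch only: build a new argv with "mosrun -w" prepended to the
     * application's argv, leaving the original array untouched. */
    #include <stdlib.h>
    #include <string.h>

    static char **prepend_mosrun(char **app_argv, int app_argc)
    {
        /* Two extra slots for "mosrun" and "-w", plus the NULL terminator. */
        char **new_argv = calloc(app_argc + 3, sizeof(char *));
        if (NULL == new_argv) {
            return NULL;
        }
        new_argv[0] = strdup("mosrun");
        new_argv[1] = strdup("-w");
        for (int i = 0; i < app_argc; ++i) {
            new_argv[i + 2] = strdup(app_argv[i]);
        }
        /* calloc already NULL-terminated the array at index app_argc + 2. */
        return new_argv;
    }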
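As for "bound": I take it to mean the process's CPU affinity is restricted to a subset of the machine's cores. A minimal stand-alone check for that condition, using only the public hwloc API (an illustrative sketch assuming hwloc headers are available; not the Open MPI code path you mention):

    /* Sketch: report whether this process is bound to a strict subset of
     * the machine's cores. Compile with -lhwloc. */
    #include <stdio.h>
    #include <hwloc.h>

    int main(void)
    {
        hwloc_topology_t topo;
        hwloc_bitmap_t binding = hwloc_bitmap_alloc();

        hwloc_topology_init(&topo);
        hwloc_topology_load(topo);

        if (0 == hwloc_get_cpubind(topo, binding, HWLOC_CPUBIND_PROCESS)) {
            /* "Bound" == the binding is narrower than the full machine cpuset. */
            hwloc_const_cpuset_t all = hwloc_topology_get_topology_cpuset(topo);
            printf("bound: %s\n",
                   hwloc_bitmap_isequal(binding, all) ? "no" : "yes");
        }

        hwloc_bitmap_free(binding);
        hwloc_topology_destroy(topo);
        return 0;
    }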
Shared memory is a separate issue. If you want/need to avoid it, then run with 
-mca btl ^sm and this will turn off all shared memory calls.
After my last post I tried to rebuild, and then even the simplest app wouldn't start. It turns out I had disabled all of the shmem components (mmap, posix, sysv), and ORTE wouldn't start without at least one of them, so I had to turn them back on. Could you tell me if there is a way to run the application without making any mmap() calls with MAP_SHARED? Currently mosrun is run with -w, which asks it to fail (return -1) on any such system call.
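To make the question concrete, these are the two mmap() flavors I mean (a plain illustration, not Open MPI code; the /tmp/seg path and the example() name are made up). The first is the kind of call that -w makes fail; the second is, as I understand it, unaffected:

    #define _GNU_SOURCE            /* for MAP_ANONYMOUS */
    #include <sys/mman.h>
    #include <fcntl.h>
    #include <unistd.h>

    void example(void)
    {
        /* Shared, file-backed mapping: visible to any other process that
         * maps the same file. This is what fails under "mosrun -w". */
        int fd = open("/tmp/seg", O_RDWR | O_CREAT, 0600);
        ftruncate(fd, 4096);
        void *shared = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                            MAP_SHARED, fd, 0);

        /* Private, anonymous mapping: local to this process only. */
        void *priv = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

        munmap(priv, 4096);
        munmap(shared, 4096);
        close(fd);
    }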

Thanks for your help,
Alex




On Mar 17, 2012, at 11:51 AM, Alex Margolin wrote:

[singularity:15041] [[35712,0],0] orted_recv_cmd: received message from [[35712,1],0]
[singularity:15041] defining message event: orted/orted_comm.c 172
[singularity:15041] [[35712,0],0] orted_recv_cmd: reissued recv
[singularity:15041] [[35712,0],0] orte:daemon:cmd:processor called by [[35712,1],0] for tag 1
[singularity:15041] [[35712,0],0] orted:comm:process_commands() Processing Command: ORTE_DAEMON_SYNC_WANT_NIDMAP
[singularity:15041] [[35712,0],0] orte:daemon:cmd:processor: processing commands completed
[singularity:15042] OPAL dss:unpack: got type 33 when expecting type 12
[singularity:15042] [[35712,1],0] ORTE_ERROR_LOG: Pack data mismatch in file ../../../orte/util/nidmap.c at line 429
[singularity:15042] [[35712,1],0] ORTE_ERROR_LOG: Pack data mismatch in file ../../../../../orte/mca/ess/base/ess_base_nidmap.c at line 62
[singularity:15042] [[35712,1],0] ORTE_ERROR_LOG: Pack data mismatch in file ../../../../../../orte/mca/ess/env/ess_env_module.c at line 173
[singularity:15042] [[35712,1],0] ORTE_ERROR_LOG: Pack data mismatch in file ../../../orte/runtime/orte_init.c at line 132
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  ompi_mpi_init: orte_init failed
  -->  Returned "Pack data mismatch" (-22) instead of "Success" (0)
--------------------------------------------------------------------------
*** The MPI_Init() function was called before MPI_INIT was invoked.
*** This is disallowed by the MPI standard.
*** Your MPI job will now abort.

