I don't think you need to .ompi_ignore all those components. First, you need to 
use the --without-hwloc option (you misspelled it below as --disable-hwloc).
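For example, your configure line from below would then look something like this 
(untested, paths are obviously yours to adjust):

  ./configure CFLAGS=-m64 CXXFLAGS=-m64 --prefix=/home/alex/huji/ompit \
      --without-hwloc --disable-mmap-shmem --disable-posix-shmem \
      --disable-sysv-shmem --enable-mca-no-build=maffinity,paffinity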

Assuming you removed the relevant code from your clone of the default odls 
component, I suspect the remaining calls are being made in 
ompi/runtime/ompi_mpi_init.c: if a process detects that it isn't bound, it 
checks whether it should bind itself. I thought that code was also compiled out 
when we configure --without-hwloc, but you may have to verify that.
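
For reference, here is the kind of check I mean, as a standalone illustration 
(this is not the actual Open MPI code, just a sketch of the call that MOSIX is 
rejecting):

  #define _GNU_SOURCE
  #include <sched.h>
  #include <stdio.h>
  #include <unistd.h>

  /* Illustration only: an "am I bound?" test based on sched_getaffinity,
   * the syscall MOSRUN reports as unsupported. */
  int main(void)
  {
      cpu_set_t mask;
      long ncpus = sysconf(_SC_NPROCESSORS_ONLN);

      if (sched_getaffinity(0, sizeof(mask), &mask) != 0) {
          perror("sched_getaffinity");
          return 1;
      }
      /* bound iff the process is restricted to fewer CPUs than are online */
      printf("process is %sbound\n", CPU_COUNT(&mask) < ncpus ? "" : "not ");
      return 0;
  }

If that check (or its hwloc equivalent) is still being compiled in, it would 
explain the first MOSRUN complaint in your output below.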

Shared memory is a separate issue. If you want/need to avoid it, run with 
"-mca btl ^sm"; that turns off the shared memory BTL and, with it, the 
MAP_SHARED mmap calls.
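In your case that would be something like:

  ~/huji/ompit/bin/mpirun -mca btl ^sm -mca orte_debug 100 -n 1 hello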


On Mar 17, 2012, at 11:51 AM, Alex Margolin wrote:

> Hi,
> 
> I want to launch Open MPI processes through a wrapper process: instead of 
> running "hello" x 4, I want to run "mosrun -w hello" x 4 when I start the 
> job with "mpirun -n 4 hello". I've cloned the "default" component in 
> orte/mca/odls (from trunk) - see the attached patch.
> 
> I'm getting an error related to mosrun, but I want to configure Open MPI to 
> avoid it. I'm running on my laptop ("singularity"), which is the only node.
> I suspect my error (full output at the bottom) is caused by the following 
> lines, which indicate system calls that mosrun does not support:
> 
> MOSRUN: system-call 'sched_getaffinity' not supported under MOSIX
> MOSRUN: Shared memory (MAP_SHARED) not supported under MOSIX
> 
> As the lines state, both the sched_getaffinity() syscall (and related calls) 
> and mmap with MAP_SHARED are unsupported. I've tried to find all the relevant 
> call sites in the Open MPI code and disable them, but to no avail:
> 
> alex@singularity:~/huji/openmpi-trunk$ find . -name .ompi_ignore
> ./opal/mca/shmem/mmap/.ompi_ignore
> ./opal/mca/shmem/posix/.ompi_ignore
> ./opal/mca/hwloc/hwloc132/.ompi_ignore
> ./opal/mca/timer/altix/.ompi_ignore
> ./opal/mca/memory/linux/.ompi_ignore
> ./orte/mca/plm/xgrid/.ompi_ignore
> ./orte/mca/plm/submit/.ompi_ignore
> ./orte/mca/sensor/heartbeat/.ompi_ignore
> ./ompi/mca/fs/lustre/.ompi_ignore
> ./ompi/mca/rcache/rb/.ompi_ignore
> ./ompi/mca/coll/sm/.ompi_ignore
> ./ompi/mca/coll/demo/.ompi_ignore
> ./ompi/mca/pml/example/.ompi_ignore
> ./ompi/mca/op/x86/.ompi_ignore
> ./ompi/mca/op/example/.ompi_ignore
> ./ompi/mca/btl/sm/.ompi_ignore
> ./ompi/mca/btl/template/.ompi_ignore
> ./ompi/mca/mpool/sm/.ompi_ignore
> ./ompi/mca/common/sm/.ompi_ignore
> ./ompi/mca/vprotocol/example/.ompi_ignore
> alex@singularity:~/huji/openmpi-trunk$ cat command
> ./autogen.sh ; ./configure CFLAGS=-m64 CXXFLAGS=-m64 
> --prefix=/home/alex/huji/ompit --disable-hwloc --disable-mmap-shmem 
> --disable-posix-shmem --disable-sysv-shmem 
> --enable-mca-no-build=maffinity,paffinity ; make ; make install
> alex@singularity:~/huji/openmpi-trunk$
> 
> Can anyone help me find the code that makes these system calls and disable 
> it? Or is this perhaps another, unrelated problem?
> The attached module is part of a system I'm building (along with the BTL 
> module I've mentioned in the past - still working on it...) in the hope of 
> contributing it to the Open MPI community upon completion.
> 
> Thanks a lot,
> Alex
> 
> P.S. Here is the full output of the error:
> 
> alex@singularity:~/huji/benchmarks/simple$ ~/huji/ompit/bin/mpirun -mca 
> orte_debug 100 -n 1 hello
> [singularity:15041] mca: base: component_find: unable to open 
> /home/alex/huji/ompit/lib/openmpi/mca_paffinity_hwloc: 
> /home/alex/huji/ompit/lib/openmpi/mca_paffinity_hwloc.so: undefined symbol: 
> opal_hwloc_topology (ignored)
> [singularity:15041] mca: base: component_find: unable to open 
> /home/alex/huji/ompit/lib/openmpi/mca_rmaps_rank_file: 
> /home/alex/huji/ompit/lib/openmpi/mca_rmaps_rank_file.so: undefined symbol: 
> opal_hwloc_binding_policy (ignored)
> [singularity:15041] procdir: 
> /tmp/openmpi-sessions-alex@singularity_0/35712/0/0
> [singularity:15041] jobdir: /tmp/openmpi-sessions-alex@singularity_0/35712/0
> [singularity:15041] top: openmpi-sessions-alex@singularity_0
> [singularity:15041] tmp: /tmp
> [singularity:15041] mpirun: reset PATH: 
> /home/alex/huji/ompit/bin:/usr/lib/lightdm/lightdm:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/lib/jvm/default-java/bin::/usr/local/apache-maven-3.0.3/bin
> [singularity:15041] mpirun: reset LD_LIBRARY_PATH: /home/alex/huji/ompit/lib
> [singularity:15041] [[35712,0],0] hostfile: checking hostfile 
> /home/alex/huji/ompit/etc/openmpi-default-hostfile for nodes
> [singularity:15041] [[35712,0],0] hostfile: filtering nodes through hostfile 
> /home/alex/huji/ompit/etc/openmpi-default-hostfile
> [singularity:15041] defining message event: grpcomm_bad_module.c 165
> [singularity:15041] progressed_wait: base/plm_base_launch_support.c 297
> [singularity:15041] [[35712,0],0] orte:daemon:cmd:processor called by 
> [[35712,0],0] for tag 1
> [singularity:15041] [[35712,0],0] orte:daemon:send_relay
> [singularity:15041] [[35712,0],0] orte:daemon:send_relay - recipient list is 
> empty!
> [singularity:15041] [[35712,0],0] orted:comm:process_commands() Processing 
> Command: ORTE_DAEMON_ADD_LOCAL_PROCS
>  MPIR_being_debugged = 0
>  MPIR_debug_state = 1
>  MPIR_partial_attach_ok = 1
>  MPIR_i_am_starter = 0
>  MPIR_forward_output = 0
>  MPIR_proctable_size = 1
>  MPIR_proctable:
>    (i, host, exe, pid) = (0, singularity, 
> /home/alex/huji/benchmarks/simple/hello, 15042)
> MPIR_executable_path: NULL
> MPIR_server_arguments: NULL
> 
> MOSRUN: system-call 'sched_getaffinity' not supported under MOSIX
> MOSRUN: Shared memory (MAP_SHARED) not supported under MOSIX
> 
> [singularity:15042] procdir: 
> /tmp/openmpi-sessions-alex@singularity_0/35712/1/0
> [singularity:15042] jobdir: /tmp/openmpi-sessions-alex@singularity_0/35712/1
> [singularity:15042] top: openmpi-sessions-alex@singularity_0
> [singularity:15042] tmp: /tmp
> [singularity:15041] [[35712,0],0] orted_recv_cmd: received message from 
> [[35712,1],0]
> [singularity:15041] defining message event: orted/orted_comm.c 172
> [singularity:15041] [[35712,0],0] orted_recv_cmd: reissued recv
> [singularity:15041] [[35712,0],0] orte:daemon:cmd:processor called by 
> [[35712,1],0] for tag 1
> [singularity:15041] [[35712,0],0] orted:comm:process_commands() Processing 
> Command: ORTE_DAEMON_SYNC_WANT_NIDMAP
> [singularity:15041] [[35712,0],0] orte:daemon:cmd:processor: processing 
> commands completed
> [singularity:15042] OPAL dss:unpack: got type 33 when expecting type 12
> [singularity:15042] [[35712,1],0] ORTE_ERROR_LOG: Pack data mismatch in file 
> ../../../orte/util/nidmap.c at line 429
> [singularity:15042] [[35712,1],0] ORTE_ERROR_LOG: Pack data mismatch in file 
> ../../../../../orte/mca/ess/base/ess_base_nidmap.c at line 62
> [singularity:15042] [[35712,1],0] ORTE_ERROR_LOG: Pack data mismatch in file 
> ../../../../../../orte/mca/ess/env/ess_env_module.c at line 173
> [singularity:15042] [[35712,1],0] ORTE_ERROR_LOG: Pack data mismatch in file 
> ../../../orte/runtime/orte_init.c at line 132
> --------------------------------------------------------------------------
> It looks like MPI_INIT failed for some reason; your parallel process is
> likely to abort.  There are many reasons that a parallel process can
> fail during MPI_INIT; some of which are due to configuration or environment
> problems.  This failure appears to be an internal failure; here's some
> additional information (which may only be relevant to an Open MPI
> developer):
> 
>  ompi_mpi_init: orte_init failed
>  --> Returned "Pack data mismatch" (-22) instead of "Success" (0)
> --------------------------------------------------------------------------
> *** The MPI_Init() function was called before MPI_INIT was invoked.
> *** This is disallowed by the MPI standard.
> *** Your MPI job will now abort.
> [singularity:15042] Abort before MPI_INIT completed successfully; not able to 
> guarantee that all other processes were killed!
> [singularity:15041] defining message event: iof_hnp_read.c 293
> [singularity:15041] [[35712,0],0] orte:daemon:cmd:processor called by 
> [[35712,0],0] for tag 1
> [singularity:15041] [[35712,0],0] orted:comm:process_commands() Processing 
> Command: ORTE_DAEMON_IOF_COMPLETE
> [singularity:15041] [[35712,0],0] orte:daemon:cmd:processor: processing 
> commands completed
> [singularity:15041] defining message event: base/odls_base_default_fns.c 2532
> [singularity:15041] [[35712,0],0] orte:daemon:cmd:processor called by 
> [[35712,0],0] for tag 1
> [singularity:15041] [[35712,0],0] orted:comm:process_commands() Processing 
> Command: ORTE_DAEMON_WAITPID_FIRED
> [singularity:15041] sess_dir_finalize: proc session dir not empty - leaving
> [singularity:15041] [[35712,0],0]:errmgr_default_hnp.c(948) updating exit 
> status to 1
> -------------------------------------------------------
> While the primary job  terminated normally, 1 process returned
> a non-zero exit code.. Further examination may be required.
> -------------------------------------------------------
> [singularity:15041] sess_dir_finalize: job session dir not empty - leaving
> [singularity:15041] [[35712,0],0] Releasing job data for [35712,0]
> [singularity:15041] [[35712,0],0] Releasing job data for [35712,1]
> [singularity:15041] sess_dir_finalize: proc session dir not empty - leaving
> orterun: exiting with status 1
> alex@singularity:~/huji/benchmarks/simple$
> <odls_mosix.diff>

