I don't think you need to .ompi_ignore all of those components. First, you need to use the --without-hwloc configure option (you misspelled it below as --disable-hwloc).

Assuming you removed the relevant code from your clone of the default odls module, I suspect the remaining calls are being made in ompi/runtime/ompi_mpi_init.c: if the process detects it isn't bound, it looks to see whether it should bind itself. I thought that code was also turned "off" when we configure --without-hwloc, so you might have to check it.

Shared memory is a separate issue. If you want or need to avoid it, run with -mca btl ^sm and that will turn off all of the shared-memory calls.
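In other words, your build line would become something like this - just a sketch with the hwloc flag swapped and everything else copied from your "cat command" output below, untested on my end:

  ./autogen.sh ; ./configure CFLAGS=-m64 CXXFLAGS=-m64 --prefix=/home/alex/huji/ompit --without-hwloc --disable-mmap-shmem --disable-posix-shmem --disable-sysv-shmem --enable-mca-no-build=maffinity,paffinity ; make ; make install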
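If you want to hunt down any remaining callers yourself, a plain grep over your checkout is the quickest way; the directories below are just where I would start looking, not an exhaustive list:

  grep -rn sched_getaffinity ompi/runtime opal
  grep -rn MAP_SHARED opal orte ompi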
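So, for the example in your message, the run would look something like this (again just a sketch; the ^ tells the MCA framework to exclude the sm component rather than select it):

  mpirun -mca btl ^sm -n 4 hello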
On Mar 17, 2012, at 11:51 AM, Alex Margolin wrote:

> Hi,
>
> I want to launch Open-MPI processes using another process: instead of using "hello" x 4 I want to run "mosrun -w hello" x 4 when I start it with "mpirun -n 4 hello". I've cloned the "default" component in orte/mca/odls (from trunk) - see patch attached.
>
> I'm getting an error which is related to mosrun, but I want to configure OpenMPI to avoid it. I'm running on my laptop ("singularity"), which is the only node.
> I suspect my error (full output at the bottom) is caused by the following lines, indicating system-calls invoked which are not supported by mosrun:
>
> MOSRUN: system-call 'sched_getaffinity' not supported under MOSIX
> MOSRUN: Shared memory (MAP_SHARED) not supported under MOSIX
>
> As the lines state, both the sched_getaffinity() syscall (and the likes) and mmap with MAP_SHARED are not supported. I've tried to find all the relevant instances in the Open-MPI code and disable them, but to no avail:
>
> alex@singularity:~/huji/openmpi-trunk$ find . -name .ompi_ignore
> ./opal/mca/shmem/mmap/.ompi_ignore
> ./opal/mca/shmem/posix/.ompi_ignore
> ./opal/mca/hwloc/hwloc132/.ompi_ignore
> ./opal/mca/timer/altix/.ompi_ignore
> ./opal/mca/memory/linux/.ompi_ignore
> ./orte/mca/plm/xgrid/.ompi_ignore
> ./orte/mca/plm/submit/.ompi_ignore
> ./orte/mca/sensor/heartbeat/.ompi_ignore
> ./ompi/mca/fs/lustre/.ompi_ignore
> ./ompi/mca/rcache/rb/.ompi_ignore
> ./ompi/mca/coll/sm/.ompi_ignore
> ./ompi/mca/coll/demo/.ompi_ignore
> ./ompi/mca/pml/example/.ompi_ignore
> ./ompi/mca/op/x86/.ompi_ignore
> ./ompi/mca/op/example/.ompi_ignore
> ./ompi/mca/btl/sm/.ompi_ignore
> ./ompi/mca/btl/template/.ompi_ignore
> ./ompi/mca/mpool/sm/.ompi_ignore
> ./ompi/mca/common/sm/.ompi_ignore
> ./ompi/mca/vprotocol/example/.ompi_ignore
> alex@singularity:~/huji/openmpi-trunk$ cat command
> ./autogen.sh ; ./configure CFLAGS=-m64 CXXFLAGS=-m64 --prefix=/home/alex/huji/ompit --disable-hwloc --disable-mmap-shmem --disable-posix-shmem --disable-sysv-shmem --enable-mca-no-build=maffinity,paffinity ; make ; make install
> alex@singularity:~/huji/openmpi-trunk$
>
> Can anyone help me determine where is the code calling these system calls and disable it? Or maybe it is another, unrelated problem?
> The attached module is part of a system I'm building (along with the BTL module I've mentioned in the past - still working on it...) in hope of contributing to the Open-MPI community upon completion.
>
> Thanks a lot,
> Alex
>
> P.S. Here is the full output of the error:
>
> alex@singularity:~/huji/benchmarks/simple$ ~/huji/ompit/bin/mpirun -mca orte_debug 100 -n 1 hello
> [singularity:15041] mca: base: component_find: unable to open /home/alex/huji/ompit/lib/openmpi/mca_paffinity_hwloc: /home/alex/huji/ompit/lib/openmpi/mca_paffinity_hwloc.so: undefined symbol: opal_hwloc_topology (ignored)
> [singularity:15041] mca: base: component_find: unable to open /home/alex/huji/ompit/lib/openmpi/mca_rmaps_rank_file: /home/alex/huji/ompit/lib/openmpi/mca_rmaps_rank_file.so: undefined symbol: opal_hwloc_binding_policy (ignored)
> [singularity:15041] procdir: /tmp/openmpi-sessions-alex@singularity_0/35712/0/0
> [singularity:15041] jobdir: /tmp/openmpi-sessions-alex@singularity_0/35712/0
> [singularity:15041] top: openmpi-sessions-alex@singularity_0
> [singularity:15041] tmp: /tmp
> [singularity:15041] mpirun: reset PATH: /home/alex/huji/ompit/bin:/usr/lib/lightdm/lightdm:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/lib/jvm/default-java/bin::/usr/local/apache-maven-3.0.3/bin
> [singularity:15041] mpirun: reset LD_LIBRARY_PATH: /home/alex/huji/ompit/lib
> [singularity:15041] [[35712,0],0] hostfile: checking hostfile /home/alex/huji/ompit/etc/openmpi-default-hostfile for nodes
> [singularity:15041] [[35712,0],0] hostfile: filtering nodes through hostfile /home/alex/huji/ompit/etc/openmpi-default-hostfile
> [singularity:15041] defining message event: grpcomm_bad_module.c 165
> [singularity:15041] progressed_wait: base/plm_base_launch_support.c 297
> [singularity:15041] [[35712,0],0] orte:daemon:cmd:processor called by [[35712,0],0] for tag 1
> [singularity:15041] [[35712,0],0] orte:daemon:send_relay
> [singularity:15041] [[35712,0],0] orte:daemon:send_relay - recipient list is empty!
> [singularity:15041] [[35712,0],0] orted:comm:process_commands() Processing Command: ORTE_DAEMON_ADD_LOCAL_PROCS
> MPIR_being_debugged = 0
> MPIR_debug_state = 1
> MPIR_partial_attach_ok = 1
> MPIR_i_am_starter = 0
> MPIR_forward_output = 0
> MPIR_proctable_size = 1
> MPIR_proctable:
> (i, host, exe, pid) = (0, singularity, /home/alex/huji/benchmarks/simple/hello, 15042)
> MPIR_executable_path: NULL
> MPIR_server_arguments: NULL
>
> MOSRUN: system-call 'sched_getaffinity' not supported under MOSIX
> MOSRUN: Shared memory (MAP_SHARED) not supported under MOSIX
>
> [singularity:15042] procdir: /tmp/openmpi-sessions-alex@singularity_0/35712/1/0
> [singularity:15042] jobdir: /tmp/openmpi-sessions-alex@singularity_0/35712/1
> [singularity:15042] top: openmpi-sessions-alex@singularity_0
> [singularity:15042] tmp: /tmp
> [singularity:15041] [[35712,0],0] orted_recv_cmd: received message from [[35712,1],0]
> [singularity:15041] defining message event: orted/orted_comm.c 172
> [singularity:15041] [[35712,0],0] orted_recv_cmd: reissued recv
> [singularity:15041] [[35712,0],0] orte:daemon:cmd:processor called by [[35712,1],0] for tag 1
> [singularity:15041] [[35712,0],0] orted:comm:process_commands() Processing Command: ORTE_DAEMON_SYNC_WANT_NIDMAP
> [singularity:15041] [[35712,0],0] orte:daemon:cmd:processor: processing commands completed
> [singularity:15042] OPAL dss:unpack: got type 33 when expecting type 12
> [singularity:15042] [[35712,1],0] ORTE_ERROR_LOG: Pack data mismatch in file ../../../orte/util/nidmap.c at line 429
> [singularity:15042] [[35712,1],0] ORTE_ERROR_LOG: Pack data mismatch in file ../../../../../orte/mca/ess/base/ess_base_nidmap.c at line 62
> [singularity:15042] [[35712,1],0] ORTE_ERROR_LOG: Pack data mismatch in file ../../../../../../orte/mca/ess/env/ess_env_module.c at line 173
> [singularity:15042] [[35712,1],0] ORTE_ERROR_LOG: Pack data mismatch in file ../../../orte/runtime/orte_init.c at line 132
> --------------------------------------------------------------------------
> It looks like MPI_INIT failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during MPI_INIT; some of which are due to configuration or environment
> problems. This failure appears to be an internal failure; here's some
> additional information (which may only be relevant to an Open MPI
> developer):
>
> ompi_mpi_init: orte_init failed
> --> Returned "Pack data mismatch" (-22) instead of "Success" (0)
> --------------------------------------------------------------------------
> *** The MPI_Init() function was called before MPI_INIT was invoked.
> *** This is disallowed by the MPI standard.
> *** Your MPI job will now abort.
> [singularity:15042] Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed!
> [singularity:15041] defining message event: iof_hnp_read.c 293
> [singularity:15041] [[35712,0],0] orte:daemon:cmd:processor called by [[35712,0],0] for tag 1
> [singularity:15041] [[35712,0],0] orted:comm:process_commands() Processing Command: ORTE_DAEMON_IOF_COMPLETE
> [singularity:15041] [[35712,0],0] orte:daemon:cmd:processor: processing commands completed
> [singularity:15041] defining message event: base/odls_base_default_fns.c 2532
> [singularity:15041] [[35712,0],0] orte:daemon:cmd:processor called by [[35712,0],0] for tag 1
> [singularity:15041] [[35712,0],0] orted:comm:process_commands() Processing Command: ORTE_DAEMON_WAITPID_FIRED
> [singularity:15041] sess_dir_finalize: proc session dir not empty - leaving
> [singularity:15041] [[35712,0],0]:errmgr_default_hnp.c(948) updating exit status to 1
> -------------------------------------------------------
> While the primary job terminated normally, 1 process returned
> a non-zero exit code.. Further examination may be required.
> -------------------------------------------------------
> [singularity:15041] sess_dir_finalize: job session dir not empty - leaving
> [singularity:15041] [[35712,0],0] Releasing job data for [35712,0]
> [singularity:15041] [[35712,0],0] Releasing job data for [35712,1]
> [singularity:15041] sess_dir_finalize: proc session dir not empty - leaving
> orterun: exiting with status 1
> alex@singularity:~/huji/benchmarks/simple$
> <odls_mosix.diff>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel