On Oct 30, 2012, at 11:55 AM, Sandra Guija <sgu...@hotmail.com> wrote:

> I am able to change the memory size parameters, so if I increase the memory 
> size (currently 2 GB) or add caches, could that be a solution?

Could be.

> or is the program that is using too much memory?

Hard to tell. In the case you show, we are aborting because we don't see enough 
memory to support the shared memory system. You can adjust that size by setting 
the MCA params for shared memory - see "ompi_info --param btl sm".

On the other hand, your program is clearly huge. 10k x 10k = 100M entries, so 
you are using close to a Gbyte (assuming doubles) just to store the array in 
one process.
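For scale, here is a quick back-of-the-envelope check of that claim (a sketch only; the actual shared-memory limits and parameter names come from your own installation's "ompi_info --param btl sm" output, not from this example):

```shell
# Rough memory math for one 10,000 x 10,000 matrix of doubles.
N=10000
BYTES=$((N * N * 8))          # 8 bytes per double
echo "one matrix: $((BYTES / 1024 / 1024)) MB"   # ~762 MB per matrix, per copy

# List the shared-memory knobs your build actually exposes:
#   ompi_info --param btl sm
# Or sidestep the sm transport entirely (slower, but avoids the mmap):
#   mpirun -np 4 --hostfile nodes --bynode -mca btl ^sm magic10000
```

And that is for a single matrix; a multiplication needs at least three of them, which on a 2 GB node leaves very little headroom for the shared memory backing file.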


> 
> Thanks really for your input, I appreciate it.
> 
> Sandra Guija
> 
> From: r...@open-mpi.org
> Date: Tue, 30 Oct 2012 11:50:28 -0700
> To: de...@open-mpi.org
> Subject: Re: [OMPI devel] process kill signal 59
> 
> Yeah, you're using too much memory for the shared memory system. Run with 
> -mca btl ^sm on your cmd line - it'll run slower, but you probably don't have 
> a choice.
> 
> 
> On Oct 30, 2012, at 11:38 AM, Sandra Guija <sgu...@hotmail.com> wrote:
> 
> Yes, I think it is related to my program too: when I run a 1000x1000 matrix 
> multiplication, the program works.
> When I run the 10,000x10,000 matrix on only one machine, I get this:
> mca_common_sm_mmap_init: mmap failed with errno=12
> mca_mpool_sm_init: unable to shared memory mapping 
> (/tmp/openmpi-sessions-mpiu@tango_0/default-universe-1529/1/shared_mem_pool.tango)
> mca_common_sm_mmap_init: 
> /tmp/openmpi-sessions-mpiu@tango_0/default-universe-1529/1/shared_mem_pool.tango 
> failed with errno=2
> mca_mpool_sm_init: unable to shared memory mapping 
> (/tmp/openmpi-sessions-mpiu@tango_0/default-universe-1529/1/shared_mem_pool.tango)
> PML add procs failed
> --> Returned "Out of resource" (-2) instead of "Success" (0)
> 
> This is the result when I run "free -m":
>                      total   used   free  shared  buffers  cached
> Mem:                  2026     54   1972       0        6      25
> -/+ buffers/cache:      22    511
> Swap:                  511      0    511
> 
> Sandra Guija
> 
> From: r...@open-mpi.org
> Date: Tue, 30 Oct 2012 10:33:02 -0700
> To: de...@open-mpi.org
> Subject: Re: [OMPI devel] process kill signal 59
> 
> Ummm...not sure what I can say about that with so little info. It looks like 
> your process died for some reason that has nothing to do with us - a bug in 
> your "magic10000" program?
> 
> 
> On Oct 30, 2012, at 10:24 AM, Sandra Guija <sgu...@hotmail.com> wrote:
> 
> Hello, 
> I am running a 10,000x10,000 matrix multiplication in 4 processors/1 core and 
> I get the following error:
> mpirun -np 4 --hostfile nodes --bynode magic10000
> 
> mpirun noticed that job rank 1 with PID 635 on node slave1 exited on signal 
> 509 (Real-time signal 25).
> 2 additional processes aborted (not shown)
> 1 process killed (possibly by Open MPI)
> 
> node file contains:
> master
> slave1
> slave2
> slave3
> 
> 
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
