Re: [OMPI devel] Too many open files (24)

2011-03-30 Thread Timothy Stitt
Dear Samuel,

Just as you replied, I was trying that on the compute nodes. Surprise, 
surprise... the value returned for both the hard and soft open-file limits 
is 1024.

Thanks for confirming my suspicions...
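
A rough back-of-the-envelope check shows why a 1024 limit would fit what we 
saw. The Python sketch below is only an estimate under stated assumptions: 
that the master rank ends up holding about one TCP socket per other rank, and 
that a few descriptors are already in use for stdin/stdout/stderr, log files 
and so on (the `baseline` parameter, a guessed value). Neither figure comes 
from Open MPI itself, and ranks sharing the master's node may not use TCP at 
all. The function names are made up for illustration.

```python
def descriptors_needed(n_ranks, baseline=16):
    """Estimate descriptors held by a master rank talking to every other rank.

    Assumes one socket per peer (n_ranks - 1) plus `baseline` descriptors
    already open for other purposes. `baseline=16` is a guess, not a
    measured value.
    """
    return (n_ranks - 1) + baseline


def exceeds_limit(n_ranks, limit=1024, baseline=16):
    """Return True if the estimate is above the per-process descriptor limit."""
    return descriptors_needed(n_ranks, baseline) > limit


if __name__ == "__main__":
    for n in (900, 1032):
        need = descriptors_needed(n)
        verdict = "exceeds" if exceeds_limit(n) else "fits within"
        print(f"{n} ranks: about {need} descriptors, {verdict} a limit of 1024")
```

Under these assumptions 900 ranks need about 915 descriptors and 1032 ranks 
about 1047. Even with a baseline of zero, 1031 peer connections alone are 
more than 1024, which would match a run that works at 900 cores and stalls 
at 1032.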

Regards,

Tim.

On Mar 30, 2011, at 7:41 PM, Samuel K. Gutierrez wrote:

Hi,

It sounds like Open MPI is hitting your system's open file descriptor limit.  
If that's the case, one potential workaround is to have your system 
administrator raise file descriptor limits.

On a compute node, what does "ulimit -a" show (using bash)?
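
If it is easier to check from inside a job than from an interactive shell, 
the same limits can be read programmatically. This is a minimal Python 
sketch, assuming a Unix-like compute node with Python available; the helper 
names `nofile_limits` and `describe` are made up for illustration. It reports 
the soft and hard limits on open file descriptors for the current process.

```python
import resource


def nofile_limits():
    """Return (soft, hard) limits on open file descriptors for this process."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft, hard


def describe(value):
    """Render a limit value, mapping RLIM_INFINITY to the word 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


if __name__ == "__main__":
    soft, hard = nofile_limits()
    print(f"open files: soft={describe(soft)} hard={describe(hard)}")
```

An unprivileged process can raise its soft limit only as far as the hard 
limit, so if the two values are equal, raising them is a job for the system 
administrator.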

Hope that helps,

--
Samuel K. Gutierrez
Los Alamos National Laboratory

On Mar 30, 2011, at 5:22 PM, Timothy Stitt wrote:

Dear OpenMPI developers,

One of our users was running a benchmark on a 1032-core simulation. He had a 
successful run at 900 cores, but when he stepped up to 1032 cores the job just 
stalled and his logs contained many occurrences of the following line:

[d6copt368.crc.nd.edu][[25621,1],0][btl_tcp_component.c:885:mca_btl_tcp_component_accept_handler] accept() failed: Too many open files (24)

The simulation has a single master task that communicates with all the other 
tasks to write out some I/O via the master. We are assuming the message is 
related to this bottleneck. Is there a 1024 limit on the number of open 
files/connections for instance?

Can anyone confirm the meaning of this error and, secondly, suggest a 
resolution that hopefully doesn't involve a code rewrite?

Thanks in advance,

Tim.

Tim Stitt PhD (User Support Manager).
Center for Research Computing | University of Notre Dame |
P.O. Box 539, Notre Dame, IN 46556 | Phone: 574-631-5287 | Email: tst...@nd.edu






