Been chatting off-list with the SGE folks - can you tell us what version of SGE 
you are using?


On Jul 26, 2012, at 9:02 AM, Christoph van Wüllen wrote:

> It is a long-standing problem that, due to a bug in Sun GridEngine
> (it sets the stack size limit equal to the address space limit),
> using qrsh from within Open MPI fails if a large amount of memory is
> requested but the stack size is not explicitly set to a reasonably
> small value.
> 
> The best solution would be for SGE simply not to touch the stack
> size limit and to leave it at INFINITY.
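> 
> For anyone who wants to check whether an installation is affected, a
> small test program along the following lines (just a quick sketch, not
> part of the patch; the file name is made up) can be run inside an SGE
> job with a large memory request. If the stack limit comes out equal to
> the address space limit, the bug is present:
> 
>   /* check_limits.c -- print the stack and address-space limits
>      this process sees */
>   #include <stdio.h>
>   #include <sys/resource.h>
> 
>   static void show(const char *name, int resource)
>   {
>     struct rlimit rlim;
> 
>     if (getrlimit(resource, &rlim) != 0) {
>       perror(name);
>       return;
>     }
>     if (rlim.rlim_cur == RLIM_INFINITY)
>       printf("%s: soft=INFINITY", name);
>     else
>       printf("%s: soft=%llu", name, (unsigned long long) rlim.rlim_cur);
>     if (rlim.rlim_max == RLIM_INFINITY)
>       printf(" hard=INFINITY\n");
>     else
>       printf(" hard=%llu\n", (unsigned long long) rlim.rlim_max);
>   }
> 
>   int main(void)
>   {
>     show("RLIMIT_STACK", RLIMIT_STACK);
>     show("RLIMIT_AS", RLIMIT_AS);
>     return 0;
>   }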
> 
> However, I have verified that just reducing the stack size limit in
> file orte/mca/plm/rsh/plm_rsh_module.c, function ssh_child(), before
> execv'ing qrsh circumvents the problem. So, just after exec_path is set
> by strdup(...), I inserted the lines
> 
>   {
>     struct rlimit rlim;
>     int l;
> 
>     /* only act when the command being exec'ed is SGE's qrsh */
>     l = strlen(exec_path);
>     if (l > 5 && !strcmp("/qrsh", exec_path + (l - 5))) {
>       /* cap both the soft and the hard stack limit at ~10 MB, so that
>          qrsh does not inherit the huge limit set by SGE */
>       getrlimit(RLIMIT_STACK, &rlim);
>       if (rlim.rlim_max > 10000000L) rlim.rlim_max = 10000000L;
>       if (rlim.rlim_cur > 10000000L) rlim.rlim_cur = 10000000L;
>       setrlimit(RLIMIT_STACK, &rlim);
>     }
>   }
> 
> 
> It looks quick-and-dirty, and it certainly is, but it solves a severe
> problem many users have with Open MPI and SGE. Feel free to use this
> information as you like. Note that the MPI worker processes eventually
> spawned on "distant" nodes do not suffer from the reduced stack size
> limit; only the qrsh command itself does.
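> 
> To illustrate why only qrsh is affected: resource limits are per
> process and are passed on only to the child that ssh_child() forks and
> execs, so the limits of the calling process and of the processes
> started on the remote nodes stay as they were. A stand-alone sketch of
> the same mechanism (purely illustrative, the file name is made up)
> would be:
> 
>   /* limit_inherit.c -- lower RLIMIT_STACK in a forked child just
>      before exec, as the snippet above does for qrsh, and show that
>      only the exec'ed command sees the reduced limit */
>   #include <stdio.h>
>   #include <unistd.h>
>   #include <sys/resource.h>
>   #include <sys/types.h>
>   #include <sys/wait.h>
> 
>   int main(void)
>   {
>     struct rlimit rlim;
>     pid_t pid = fork();
> 
>     if (pid == 0) {            /* child, analogous to ssh_child() */
>       getrlimit(RLIMIT_STACK, &rlim);
>       if (rlim.rlim_max > 10000000L) rlim.rlim_max = 10000000L;
>       if (rlim.rlim_cur > 10000000L) rlim.rlim_cur = 10000000L;
>       setrlimit(RLIMIT_STACK, &rlim);
>       /* the exec'ed command (here a shell printing its stack limit,
>          in the real code qrsh) inherits the reduced limit */
>       execl("/bin/sh", "sh", "-c",
>             "echo child stack limit: $(ulimit -s) kB", (char *) NULL);
>       _exit(1);
>     }
> 
>     waitpid(pid, NULL, 0);
>     getrlimit(RLIMIT_STACK, &rlim);   /* parent limit is unchanged */
>     if (rlim.rlim_cur == RLIM_INFINITY)
>       printf("parent stack limit: unlimited\n");
>     else
>       printf("parent stack limit: %llu kB\n",
>              (unsigned long long) (rlim.rlim_cur / 1024));
>     return 0;
>   }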
> 
> Is this (still) of interest?
> 
> +---------------------------------+----------------------------------+
> | Prof. Christoph van Wüllen      | Tele-Phone (+49) (0)631 205 2749 |
> | TU Kaiserslautern, FB Chemie    | Tele-Fax   (+49) (0)631 205 2750 |
> | Erwin-Schrödinger-Str.          |                                  |
> | D-67663 Kaiserslautern, Germany | vanwul...@chemie.uni-kl.de       |
> |                                                                    |
> | HomePage:  http://www.chemie.uni-kl.de/vanwullen                   |
> +---------------------------------+----------------------------------+
> 
> 