I've been chatting off-list with the SGE folks. Can you tell us which version of SGE
you are using?
On Jul 26, 2012, at 9:02 AM, Christoph van Wüllen wrote:
> It is a long-standing problem that, due to a bug in Sun GridEngine
> (which sets the stack size limit equal to the address space limit),
> using qrsh from within OpenMPI fails if a large amount of memory is
> requested but the stack size is not explicitly set to a reasonably small value.
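One way to check whether a given job environment shows the limit pattern described above is to compare the soft RLIMIT_STACK and RLIMIT_AS limits from inside the job. This is a minimal sketch, not part of Open MPI or SGE: both function names are hypothetical, and comparing only the soft limits is an assumption.

```c
#include <stdbool.h>
#include <sys/resource.h>

/* Hypothetical helper: true if a finite soft stack limit equals the soft
 * address-space limit, i.e. the pattern SGE is reported to produce. */
static bool stack_limit_matches_as(const struct rlimit *stack,
                                   const struct rlimit *as)
{
    return stack->rlim_cur != RLIM_INFINITY && stack->rlim_cur == as->rlim_cur;
}

/* Hypothetical helper: inspects the calling process's own limits.
 * Returns 1 if the pattern is present, 0 if not, -1 if getrlimit() fails. */
static int current_limits_match(void)
{
    struct rlimit stack, as;

    if (getrlimit(RLIMIT_STACK, &stack) != 0 || getrlimit(RLIMIT_AS, &as) != 0)
        return -1;
    return stack_limit_matches_as(&stack, &as) ? 1 : 0;
}
```

Calling current_limits_match() from a small test program submitted with a large memory request (for example via h_vmem, if that is how the site enforces memory) should return 1 when the job is affected.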
>
> The best solution would be for SGE simply not to touch the stack
> size limit and to leave it unlimited (RLIM_INFINITY).
>
> However, I have verified that simply reducing the stack size limit in
> orte/mca/plm/rsh/plm_rsh_module.c, function ssh_child(), before
> execv'ing qrsh circumvents the problem. Right after exec_path is set
> by strdup(...), I inserted the following lines:
>
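> {
>     /* Cap the stack limit at ~10 MB, but only when the command about
>      * to be exec'd is qrsh. Needs <string.h> and <sys/resource.h>. */
>     struct rlimit rlim;
>     size_t l = strlen(exec_path);
>
>     if (l > 5 && 0 == strcmp("/qrsh", exec_path + (l - 5))) {
>         if (0 == getrlimit(RLIMIT_STACK, &rlim)) {
>             /* Lower (never raise) both the hard and the soft limit. */
>             if (rlim.rlim_max > 10000000L) rlim.rlim_max = 10000000L;
>             if (rlim.rlim_cur > 10000000L) rlim.rlim_cur = 10000000L;
>             setrlimit(RLIMIT_STACK, &rlim);
>         }
>     }
> }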
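For review, the same logic can be pulled out of ssh_child() into small helpers that can be tested in isolation. This is only a sketch: is_qrsh_path(), capped_limit(), cap_stack_for_qrsh() and QRSH_STACK_CAP are hypothetical names, not existing Open MPI symbols, and the 10000000-byte cap is simply the value used in the patch above.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <sys/resource.h>

/* Hypothetical constant: the cap from the patch above, in bytes. */
#define QRSH_STACK_CAP ((rlim_t) 10000000)

/* True if path ends in "/qrsh" with at least one character before it. */
static bool is_qrsh_path(const char *path)
{
    size_t len = strlen(path);
    return len > 5 && strcmp(path + (len - 5), "/qrsh") == 0;
}

/* Returns lim with both the hard and soft limits lowered to cap;
 * limits already at or below cap are left alone (never raised). */
static struct rlimit capped_limit(struct rlimit lim, rlim_t cap)
{
    if (lim.rlim_max > cap) lim.rlim_max = cap;
    if (lim.rlim_cur > cap) lim.rlim_cur = cap;
    return lim;
}

/* Meant to run just before execv(). Returns 0 if exec_path is not qrsh
 * or the limit was applied, -1 if getrlimit()/setrlimit() fails. */
static int cap_stack_for_qrsh(const char *exec_path)
{
    struct rlimit lim;

    assert(exec_path != NULL);
    if (!is_qrsh_path(exec_path))
        return 0;
    if (getrlimit(RLIMIT_STACK, &lim) != 0)
        return -1;
    lim = capped_limit(lim, QRSH_STACK_CAP);
    return setrlimit(RLIMIT_STACK, &lim);
}
```

Inside ssh_child(), a single cap_stack_for_qrsh(exec_path) call before execv() would replace the inline block. Because ssh_child() runs in the process that is about to be replaced by qrsh, the lowered limit does not leak back into the launcher.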
>
>
> It looks quick-and-dirty, and it certainly is, but it solves a severe
> problem many users have with OpenMPI and SGE. Feel free to use this
> information as you like. Note that any MPI worker processes spawned
> on remote nodes are not affected by the reduced stack size limit;
> it applies only to the qrsh command itself.
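The workers are unaffected because resource limits are per-process attributes, inherited only by a process's own descendants across fork() and execv(). Assuming the usual tight integration, processes on remote nodes are started by SGE's daemons on those nodes rather than by the local qrsh client. The sketch below demonstrates only the per-process part, on a single host: a forked child lowers its soft stack limit and the parent's limit stays the same. The function name is hypothetical.

```c
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if a forked child could lower its own
 * soft RLIMIT_STACK to at most cap while the parent's soft limit stayed
 * unchanged, 0 if not, and -1 on a system-call error. */
static int child_limit_is_private(rlim_t cap)
{
    struct rlimit before, after;
    int status;
    pid_t pid;

    if (getrlimit(RLIMIT_STACK, &before) != 0)
        return -1;
    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: lower only the soft limit, as a launcher might before execv(). */
        struct rlimit lim = before;
        if (lim.rlim_cur > cap)
            lim.rlim_cur = cap;
        if (setrlimit(RLIMIT_STACK, &lim) != 0 || getrlimit(RLIMIT_STACK, &lim) != 0)
            _exit(2);
        _exit(lim.rlim_cur <= cap ? 0 : 1);
    }
    /* Parent: wait for the child, then re-read our own limit. */
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    if (getrlimit(RLIMIT_STACK, &after) != 0)
        return -1;
    return WEXITSTATUS(status) == 0 && after.rlim_cur == before.rlim_cur;
}
```

Only the soft limit is lowered here, which an unprivileged process may always do, so the check does not depend on the hard limit.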
>
> Is this (still) of interest?
>
> +---------------------------------+----------------------------------+
> | Prof. Christoph van Wüllen | Tele-Phone (+49) (0)631 205 2749 |
> | TU Kaiserslautern, FB Chemie | Tele-Fax (+49) (0)631 205 2750 |
> | Erwin-Schrödinger-Str. | |
> | D-67663 Kaiserslautern, Germany | [email protected] |
> | |
> | HomePage: http://www.chemie.uni-kl.de/vanwullen |
> +---------------------------------+----------------------------------+
>
>
> _______________________________________________
> devel mailing list
> [email protected]
> http://www.open-mpi.org/mailman/listinfo.cgi/devel