Hi Brian,
I have run into another problem running an MPI job through XGrid. During
execution, the job calls Xlib functions (e.g. XOpenDisplay()) to open an X
window. The XOpenDisplay() call fails (it returns NULL): it cannot open a
display no matter how many processors I request.

However, when I turned off the XGrid controller and used "mpirun -n 4" to
start the job again, four X windows opened properly, but all four processes
ran on the local machine instead of on any remote nodes.

I have also tried using "ssh -x" from a terminal on my local machine to log
in to other nodes in the cluster and run the job there (I have copies of the
same job on all nodes, in the same path), and the X window displays properly
on my local machine. I know it is the "-x" option that sets up the
environment properly for opening the X window; if I use "ssh" without the
"-x" option, it doesn't work.

I am wondering why the X window cannot open when the job is started through
XGrid. How does the XGrid controller contact each agent node?

Has anyone seen a similar problem?

I have installed X11 and Open MPI on all 8 Mac mini nodes in my cluster, and
I have also tested running an MPI job that makes no X window calls through
XGrid; it worked perfectly on all nodes.

Thanks a lot for any suggestions!

Jane
