I've built Molpro2002.6 on our PC cluster:
8 nodes, each with 2 Pentium CPUs
RedHat 9
I'm using GA3.2.6 (built with ARMCI_NETWORK=SOCKETS and tested OK on all
processors), Intel ifc7.1, and mpich-1.2.5.
It runs fine with -n1, and also with -n2 as long as both processes are on one
node.
When I try to run on multiple nodes, e.g. -n4, the processes start okay, and I
can see 2 processes on each of 2 nodes. It produces some output, then reports a
"failed to receive header" error. The processes do not exit and must be killed.
Here is the start and end of the output file (h2o_vdz.out):
----------------------------------------------------------------------------------
ARMCI configured for 2 cluster nodes

MPP nodes  nproc
r2d2       2
obiwan     2
ga_uses_ma=false, calling ma_init with nominal heap. Any -G option will be ignored.

Primary working directories:   /tmp/molpro
Secondary working directories: /tmp/molpro
...
etc.
...
Variable memory set to 1000000 words, buffer space 230000 words

Using spherical harmonics

Bad seek in iow_direct_write; fd=-1, p=4096
Bad seek in iow_direct_write; fd=-1, p=4096
-10000(s):armci_rcv_req: failed to receive header : 2
0:Child process terminated prematurely, status=: 256
Bad seek in iow_direct_write; fd=-1, p=4135
-10002(s):armci_rcv_req: failed to receive header : 2
Bad seek in iow_direct_write; fd=-1, p=4135
----------------------------------------------------------------------------------
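The fd=-1 in the "Bad seek" messages looks like a file that was never opened successfully, so one thing I can check is whether the working directory /tmp/molpro exists and is writable on every node. This is only a guess at one possible cause. The sketch below is a hypothetical helper of my own (scratch_dir_usable is not part of Molpro, GA, or mpich) that creates a temporary file in a directory, seeks to an offset, and writes one byte:

```python
import os
import tempfile


def scratch_dir_usable(path, offset=4096):
    """Return True if a file can be created in `path`, seeked to `offset`, and written.

    Hypothetical diagnostic helper; it only imitates an open/seek/write
    sequence and does not reproduce what Molpro itself does.
    """
    if not os.path.isdir(path):
        return False
    try:
        # mkstemp returns an open file descriptor and the file's path
        fd, name = tempfile.mkstemp(dir=path)
    except OSError:
        # Directory exists but a file cannot be created in it
        return False
    try:
        os.lseek(fd, offset, os.SEEK_SET)
        os.write(fd, b"x")
        return True
    except OSError:
        return False
    finally:
        # Always clean up the temporary file
        os.close(fd)
        os.remove(name)
```

Calling scratch_dir_usable("/tmp/molpro") on each node in turn (here r2d2 and obiwan), as the same user that runs the job, would show whether the directory is usable there. A False result on the remote node would point at the cluster setup rather than at mpich or Molpro.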
Is this a problem with how the cluster is configured, how mpich
is configured, or how Molpro is configured? Or something else?
Any help would be appreciated.
Karen Haskell