I've built Molpro2002.6 on our PC cluster:
     8 nodes, each with 2 Pentium CPUs
     RedHat 9

I'm using GA3.2.6 (built with ARMCI_NETWORK=SOCKETS and tested OK with all 
processors), Intel ifc7.1, and mpich-1.2.5.

It runs fine with -n1, and also with -n2 as long as both processes are on one 
node.

When I try to run on multiple nodes, e.g. -n4, the processes start okay, and I 
can see 2 processes on each of 2 nodes.  It writes some output, then fails with 
"Bad seek" and "failed to receive header" errors.  The processes do not exit 
and must be killed.

Here is the start and end of the output file (h2o_vdz.out):
----------------------------------------------------------------------------------

ARMCI configured for 2 cluster nodes

 MPP nodes  nproc
 r2d2         2
 obiwan       2
 ga_uses_ma=false, calling ma_init with nominal heap. Any -G option will be ignored.

 Primary working directories:    /tmp/molpro
 Secondary working directories:  /tmp/molpro
 ...
 etc.
 ...
 Variable memory set to    1000000 words,  buffer space   230000 words

 Using spherical harmonics

Bad seek in iow_direct_write; fd=-1, p=4096
Bad seek in iow_direct_write; fd=-1, p=4096
-10000(s):armci_rcv_req: failed to receive header : 2
0:Child process terminated prematurely, status=: 256
Bad seek in iow_direct_write; fd=-1, p=4135
-10002(s):armci_rcv_req: failed to receive header : 2
Bad seek in iow_direct_write; fd=-1, p=4135
----------------------------------------------------------------------------------

Is this a problem with how the cluster, mpich, or Molpro is configured?
Or is it something else?

Any help would be appreciated.

Karen Haskell
[EMAIL PROTECTED]
     
