System: two dual-opteron, amd64 etch, 16 GB RAM, RAID1. A heavy computation (memory either 1750 MB or 3750 MB per node) started with high CPU usage and little memory usage. Then the two factors inverted, and the HD LED stayed lit without interruption. The computation then closed "incomplete" with this warning message in the output file:

******************* ARMCI INFO ************************
The application attempted to allocate a shared memory segment of 38731776 bytes in size. This might be in addition to segments that were allocated succesfully previously. The current system configuration does not allow enough shared memory to be allocated to the application.
This is most often caused by:
1) system parameter SHMMAX (largest shared memory segment) being too small or
2) insufficient swap space.
Please ask your system administrator to verify if SHMMAX matches the amount of memory needed by your application and the system has sufficient amount of swap space. Most UNIX systems can be easily reconfigured to allow larger shared memory segments,
see http://www.emsl.pnl.gov/docs/global/support.html
In some cases, the problem might be caused by insufficient swap space.
*******************************************************
0:allocate: failed to create shared region : -1
0:allocate: failed to create shared region : -1
Last System Error Message from Task 0:: Invalid argument
0: ARMCI aborting -1 (0xffffffffffffffff).
0: ARMCI aborting -1 (0xffffffffffffffff).
system error message: Invalid argument
3:SigIntHandler: interrupt signal was caught: 2
3:SigIntHandler: interrupt signal was caught: 2
Last System Error Message from Task 3:: No such file or directory
3: ARMCI aborting 2 (0x2).
3: ARMCI aborting 2 (0x2).
system error message: No such file or directory
1:SigIntHandler: interrupt signal was caught: 2
1:SigIntHandler: interrupt signal was caught: 2
Last System Error Message from Task 1:: No such file or directory
1: ARMCI aborting 2 (0x2).
1: ARMCI aborting 2 (0x2).
system error message: No such file or directory
2:SigIntHandler: interrupt signal was caught: 2
2:SigIntHandler: interrupt signal was caught: 2
Last System Error Message from Task 2:: No such file or directory
2: ARMCI aborting 2 (0x2).
2: ARMCI aborting 2 (0x2).
system error message: No such file or directory
Creating: host=deb64, user=francesco, file=/home/francesco/nwchem50/bin/nwchem, port=57429
4: interrupt(1)
WaitAll: No children or error in wait?

The URL suggested above is of no help, referring to Linux kernel 2) This warning message was exactly the same with the two different memory allocations. With smaller matrices (i.e. smaller molecules) the same type of computation ends OK.

Thanks for suggestions on how to tune the system.

francesco pietra
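
For reference, the first cause named in the ARMCI message (SHMMAX too small) can be checked by comparing the requested segment size with the kernel's current limit. The following is only a minimal sketch, assuming a Linux system that exposes the limit in `/proc/sys/kernel/shmmax`; the function names `parse_limit`, `read_limit` and `segment_fits` are hypothetical helpers, and the only figure taken from the post is the 38731776-byte request quoted in the log.

```python
from pathlib import Path

# Segment size quoted in the ARMCI message above.
REQUESTED_BYTES = 38731776


def parse_limit(text):
    """Parse the single integer that a /proc/sys/kernel/* file contains."""
    return int(text.strip())


def read_limit(name, base="/proc/sys/kernel"):
    """Read a kernel limit such as 'shmmax'; return None if it is unavailable."""
    try:
        return parse_limit((Path(base) / name).read_text())
    except (OSError, ValueError):
        return None


def segment_fits(requested, shmmax):
    """Return True if one segment of `requested` bytes is within SHMMAX."""
    return requested <= shmmax


if __name__ == "__main__":
    shmmax = read_limit("shmmax")
    if shmmax is None:
        print("could not read shmmax (not Linux, or /proc not mounted)")
    else:
        verdict = "fits" if segment_fits(REQUESTED_BYTES, shmmax) else "exceeds SHMMAX"
        print(f"shmmax={shmmax} requested={REQUESTED_BYTES}: {verdict}")
```

If the request turns out to exceed the limit, the administrator can usually raise it with `sysctl -w kernel.shmmax=<bytes>`; a suitable value depends on the memory the application asks for and is not something this sketch can determine. It also does not check swap space, the second cause listed in the message.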