Re: [Beowulf] hang-up of HPC Challenge

Mikhail Kuzminsky Wed, 20 Aug 2008 11:03:22 -0700

In message from Greg Lindahl <[EMAIL PROTECTED]> (Tue, 19 Aug 2008 19:39:38 -0700):

On Wed, Aug 20, 2008 at 03:45:43AM +0400, Mikhail Kuzminsky wrote:
To help localize the possible cause of the problem, I ran the pure HPL test instead of HPCC. HPL writes its output directly to the screen instead of to a file.
Using MPICH with np=8 I obtained a normal HPL result for N=35000, including 3 "PASSED" strings for the ||Ax-b|| calculations. BUT! Linux hangs immediately after the output of these strings.
Well, what did your configuration file tell HPL to do? Does it have
another test, perhaps a bigger one, or is it supposed to exit? We
aren't mind-readers.

Sorry. I have now performed two HPL runs for the same N=10000:

(1st) - "single" HPL run, i.e. ONE N=10000, ONE blocksize value, and ONE value for every other HPL.dat parameter.

(2nd) - "multiple" HPL run with the same (one) N=10000 and blocksize=100, but with sets of values for PFACT etc. (see the output below).

The 1st run finished successfully; the 2nd led to a Linux hang-up.

Yours

Mikhail

"single" HPL run :

HPLinpack 1.0a  --  High-Performance Linpack benchmark  --  January 20, 2004
Written by A. Petitet and R. Clint Whaley,  Innovative Computing Labs.,  UTK

============================================================================

An explanation of the input/output parameters follows:
T/V    : Wall time / encoded variant.
N      : The order of the coefficient matrix A.
NB     : The partitioning blocking factor.
P      : The number of process rows.
Q      : The number of process columns.
Time   : Time in seconds to solve the linear system.
Gflops : Rate of execution for solving the linear system.

The following parameter values will be used:

N      :   10000
NB     :     100
PMAP   : Row-major process mapping
P      :       2
Q      :       4
PFACT  :   Right
NBMIN  :       4
NDIV   :       2
RFACT  :   Crout
BCAST  :  1ringM
DEPTH  :       1
SWAP   : Mix (threshold = 64)
L1     : transposed form
U      : transposed form
EQUIL  : yes
ALIGN  : 16 double precision words

----------------------------------------------------------------------------

- The matrix A is randomly generated for each test.
- The following scaled residual checks will be computed:
   1) ||Ax-b||_oo / ( eps * ||A||_1  * N        )
   2) ||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  )
   3) ||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo )

- The relative machine precision (eps) is taken to be          1.110223e-16
- Computational tests pass if scaled residuals are less than           16.0


============================================================================

T/V                N    NB     P     Q               Time             Gflops
----------------------------------------------------------------------------
WR11C2R4       10000   100     2     4              23.32          2.859e+01

----------------------------------------------------------------------------

||Ax-b||_oo / ( eps * ||A||_1  * N        ) =        0.0767386 ...... PASSED
||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) =        0.0181586 ...... PASSED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) =        0.0040588 ...... PASSED

============================================================================

Finished      1 tests with the following results:
              1 tests completed and passed residual checks,
              0 tests completed and failed residual checks,
              0 tests skipped because of illegal input values.
----------------------------------------------------------------------------

End of Tests.
============================================================================
[1]+  Done                    mpirun -np 8 xhpl
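
For readers comparing the result line with the parameter list, here is a minimal Python sketch that decodes the T/V string and cross-checks the Gflops figure. The field layout (W, PMAP, DEPTH, BCAST, RFACT, NDIV, PFACT, NBMIN) and the operation count 2/3*N^3 + 3/2*N^2 are assumptions; they are consistent with the output above but are not taken from this message. The names decode_tv and hpl_gflops are hypothetical helpers, not part of HPL.

```python
from dataclasses import dataclass

# Only the BCAST codes that can be matched against the output in this
# thread are named; any other digit is reported as a raw code.
BCAST = {"0": "1ring", "1": "1ringM"}
FACT = {"L": "Left", "C": "Crout", "R": "Right"}
PMAP = {"R": "Row-major", "C": "Column-major"}


@dataclass
class Variant:
    pmap: str
    depth: int
    bcast: str
    rfact: str
    ndiv: int
    pfact: str
    nbmin: int


def decode_tv(code: str) -> Variant:
    """Decode an 8-character T/V string such as 'WR11C2R4'.

    Assumed layout: W, PMAP, DEPTH, BCAST, RFACT, NDIV, PFACT, NBMIN.
    Only single-digit DEPTH, NDIV and NBMIN values are handled.
    """
    if len(code) != 8 or code[0] != "W":
        raise ValueError(f"unexpected T/V code: {code!r}")
    return Variant(
        pmap=PMAP[code[1]],
        depth=int(code[2]),
        bcast=BCAST.get(code[3], f"code {code[3]}"),
        rfact=FACT[code[4]],
        ndiv=int(code[5]),
        pfact=FACT[code[6]],
        nbmin=int(code[7]),
    )


def hpl_gflops(n: int, seconds: float) -> float:
    """Gflops from the assumed operation count 2/3*N^3 + 3/2*N^2."""
    return (2.0 / 3.0 * n**3 + 1.5 * n**2) / seconds / 1e9


if __name__ == "__main__":
    print(decode_tv("WR11C2R4"))
    print(f"{hpl_gflops(10000, 23.32):.2f} Gflops")
```

Decoding WR11C2R4 this way gives DEPTH=1, BCAST=1ringM, RFACT=Crout, NDIV=2, PFACT=Right, NBMIN=4, which agrees with the parameter list printed for the "single" run. Because the printed time is rounded to two decimals, the recomputed Gflops value can differ from the reported one in the last digit.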

"multiple" HPL run:

HPLinpack 1.0a  --  High-Performance Linpack benchmark  --  January 20, 2004
Written by A. Petitet and R. Clint Whaley,  Innovative Computing Labs.,  UTK

============================================================================

An explanation of the input/output parameters follows:
T/V    : Wall time / encoded variant.
N      : The order of the coefficient matrix A.
NB     : The partitioning blocking factor.
P      : The number of process rows.
Q      : The number of process columns.
Time   : Time in seconds to solve the linear system.
Gflops : Rate of execution for solving the linear system.

The following parameter values will be used:

N      :   10000
NB     :     100
PMAP   : Row-major process mapping
P      :       2
Q      :       4
PFACT  :    Left    Crout    Right
NBMIN  :       2        4
NDIV   :       2
RFACT  :    Left    Crout    Right
BCAST  :   1ring
DEPTH  :       0
SWAP   : Mix (threshold = 64)
L1     : transposed form
U      : transposed form
EQUIL  : yes
ALIGN  : 16 double precision words

----------------------------------------------------------------------------

- The matrix A is randomly generated for each test.
- The following scaled residual checks will be computed:
   1) ||Ax-b||_oo / ( eps * ||A||_1  * N        )
   2) ||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  )
   3) ||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo )

- The relative machine precision (eps) is taken to be          1.110223e-16
- Computational tests pass if scaled residuals are less than           16.0


============================================================================

T/V                N    NB     P     Q               Time             Gflops
----------------------------------------------------------------------------
WR00L2L2       10000   100     2     4              23.02          2.897e+01

----------------------------------------------------------------------------

||Ax-b||_oo / ( eps * ||A||_1  * N        ) =        0.0980967 ...... PASSED
||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) =        0.0232126 ...... PASSED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) =        0.0051885 ...... PASSED

============================================================================

T/V                N    NB     P     Q               Time             Gflops
----------------------------------------------------------------------------
WR00L2L4       10000   100     2     4              22.97          2.903e+01

----------------------------------------------------------------------------

||Ax-b||_oo / ( eps * ||A||_1  * N        ) =        0.0832258 ...... PASSED
||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) =        0.0196937 ...... PASSED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) =        0.0044019 ...... PASSED

============================================================================

T/V                N    NB     P     Q               Time             Gflops
----------------------------------------------------------------------------
WR00L2C2       10000   100     2     4              22.95          2.905e+01

----------------------------------------------------------------------------

||Ax-b||_oo / ( eps * ||A||_1  * N        ) =        0.0980967 ...... PASSED
||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) =        0.0232126 ...... PASSED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) =        0.0051885 ...... PASSED


... and here Linux hangs ...
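
To put the three completed tests in context, the following Python sketch enumerates the variants that the "multiple" parameter list would expand to. The loop nesting (DEPTH, BCAST, RFACT, NDIV, PFACT, NBMIN from outermost to innermost) is an assumption: the three result lines above only show that NBMIN varies fastest and PFACT next. The function name tv_codes is hypothetical.

```python
from itertools import product

# Values copied from the "multiple" run's parameter list above.
DEPTHS = [0]
BCASTS = [0]            # 0 is the code printed for 1ring in the T/V strings
RFACTS = ["L", "C", "R"]
NDIVS = [2]
PFACTS = ["L", "C", "R"]
NBMINS = [2, 4]


def tv_codes():
    """Return the T/V strings in the assumed execution order.

    itertools.product varies its last argument fastest, so NBMIN is the
    innermost loop and PFACT the next one out, matching the output above.
    The position of RFACT and NDIV in the nesting is an assumption.
    """
    return [
        f"WR{d}{b}{rf}{nd}{pf}{nm}"
        for d, b, rf, nd, pf, nm in product(
            DEPTHS, BCASTS, RFACTS, NDIVS, PFACTS, NBMINS
        )
    ]


if __name__ == "__main__":
    codes = tv_codes()
    print(len(codes), "tests in total")
    for i, code in enumerate(codes[:4], start=1):
        print(i, code)
```

Under these assumptions the list has 3 x 2 x 3 = 18 entries and begins with WR00L2L2, WR00L2L4, WR00L2C2, the same three variants that completed before the hang, so the next variant in this ordering would be WR00L2C4 (PFACT=Crout, NBMIN=4). Whether the hang is actually tied to that variant cannot be concluded from the output alone.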

-- greg


_______________________________________________
Beowulf mailing list, [email protected]
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

