I recently solved a linear system of very high dimension distributed over 32 Mac Xserves. I was rather surprised by the performance statistics the run reported, given below. In particular, how can VecNorm be so much more expensive than VecDot (about 35 s versus 13 s, even though VecNorm was called only 20 times against 38 for VecDot), when VecNorm should simply involve taking a single square root of a dot product?
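The expectation in the question can be made concrete with a minimal sketch. It simulates the processes inside one Python program (no MPI, and not PETSc's actual implementation): each simulated process computes a local partial sum on its own chunk, one global sum stands in for the reduction across processes, and the 2-norm adds only a square root on top. The helper names `split`, `ReductionCounter`, `dist_dot` and `dist_norm2` are hypothetical.

```python
import math


def split(values, nprocs):
    """Hypothetical layout: contiguous chunks of a global vector, one per simulated process."""
    n = len(values)
    bounds = [n * p // nprocs for p in range(nprocs + 1)]
    return [values[bounds[p]:bounds[p + 1]] for p in range(nprocs)]


class ReductionCounter:
    """Stand-in for a communicator: counts global reductions (no MPI involved)."""

    def __init__(self):
        self.calls = 0

    def allreduce_sum(self, partials):
        self.calls += 1
        return sum(partials)


def dist_dot(xs, ys, comm):
    # Each simulated process computes a local partial dot product...
    partials = [sum(a * b for a, b in zip(x, y)) for x, y in zip(xs, ys)]
    # ...and one global reduction combines the partial results.
    return comm.allreduce_sum(partials)


def dist_norm2(xs, comm):
    # Local sums of squares, one global reduction, then a single square root.
    partials = [sum(a * a for a in x) for x in xs]
    return math.sqrt(comm.allreduce_sum(partials))


comm = ReductionCounter()
xs = split([3.0, 4.0, 12.0, 84.0], 2)
print(dist_dot(xs, xs, comm))  # 7225.0
print(dist_norm2(xs, comm))    # 85.0
print(comm.calls)              # 2 (one reduction per operation)
```

Under this model the two operations differ only by one scalar square root, so the communication pattern (one reduction per call) is the same for both.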
Event              Count        Time (sec)      Flops/sec                              --- Global ---         --- Stage ---      Total
                   Max Ratio        Max Ratio      Max Ratio    Mess Avg len  Reduct  %T  %F  %M  %L  %R    %T  %F  %M  %L  %R Mflop/s

--- Event Stage 2: LinearSolve

MatMult             19   1.0 1.8057e+01   1.7 1.19e+08   1.7 1.9e+04 5.5e+04 0.0e+00   2  17  49  49   0    16  17  50  50   0    2214
MatMultTranspose    19   1.0 1.6234e+01   2.1 1.73e+08   2.1 1.9e+04 5.5e+04 0.0e+00   2  18  49  49   0    11  18  50  50   0    2601
MatSolve            20   1.0 1.2656e+01   1.5 1.45e+08   1.5 0.0e+00 0.0e+00 0.0e+00   1  18   0   0   0    10  18   0   0   0    3200
MatSolveTranspos    20   1.0 1.3608e+01   1.5 1.40e+08   1.5 0.0e+00 0.0e+00 0.0e+00   2  18   0   0   0    11  18   0   0   0    2976
MatLUFactorNum       1   1.0 1.9609e+01   6.5 2.71e+08   9.3 0.0e+00 0.0e+00 0.0e+00   1  14   0   0   0    10  14   0   0   0    1635
MatILUFactorSym      1   1.0 5.3393e+00   3.9 0.00e+00   0.0 0.0e+00 0.0e+00 1.0e+00   1   0   0   0   1     4   0   0   0   2       0
MatGetRowIJ          1   1.0 1.7881e-05   1.6 0.00e+00   0.0 0.0e+00 0.0e+00 0.0e+00   0   0   0   0   0     0   0   0   0   0       0
MatGetOrdering       1   1.0 2.4659e-01   1.6 0.00e+00   0.0 0.0e+00 0.0e+00 2.0e+00   0   0   0   0   3     0   0   0   0   3       0
VecDot              38   1.0 1.2653e+01   1.9 4.24e+07   2.2 0.0e+00 0.0e+00 3.8e+01   1   4   0   0  49    10   4   0   0  62     710
VecNorm             20   1.0 3.5348e+01   4.0 2.05e+07   6.0 0.0e+00 0.0e+00 2.0e+01   4   2   0   0  26    25   2   0   0  33     134
VecCopy              4   1.0 2.7451e-01   4.3 0.00e+00   0.0 0.0e+00 0.0e+00 0.0e+00   0   0   0   0   0     0   0   0   0   0       0
VecSet              79   1.0 8.9448e-01   1.9 0.00e+00   0.0 0.0e+00 0.0e+00 0.0e+00   0   0   0   0   0     1   0   0   0   0       0
VecAXPY             57   1.0 3.2704e+00   2.2 2.33e+08   1.5 0.0e+00 0.0e+00 0.0e+00   0   6   0   0   0     2   6   0   0   0    4118
VecAYPX             36   1.0 1.5667e+00   1.7 2.28e+08   1.3 0.0e+00 0.0e+00 0.0e+00   0   4   0   0   0     1   4   0   0   0    5430
VecScatterBegin     38   1.0 1.1066e+00   2.1 0.00e+00   0.0 3.8e+04 5.5e+04 0.0e+00   0   0  97  98   0     1   0 100 100   0       0
VecScatterEnd       38   1.0 1.5381e+01   2.0 0.00e+00   0.0 0.0e+00 0.0e+00 0.0e+00   2   0   0   0   0    12   0   0   0   0       0
KSPSetup             2   1.0 8.2719e-01   1.4 0.00e+00   0.0 0.0e+00 0.0e+00 0.0e+00   0   0   0   0   0     1   0   0   0   0       0
KSPSolve            20   1.0 1.2881e+01   1.5 1.42e+08   1.5 0.0e+00 0.0e+00 0.0e+00   1  18   0   0   0    10  18   0   0   0    3144
PCSetUp              2   1.0 2.5356e+01   5.5 1.96e+08   8.7 0.0e+00 0.0e+00 3.0e+00   2  14   0   0   4    13  14   0   0   5    1265
PCApply             40   1.0 4.7115e+01   2.1 1.59e+08   2.4 0.0e+00 0.0e+00 3.0e+00   5  49   0   0   4    35  49   0   0   5    2400
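To quantify the gap, the per-call cost can be worked out from the VecDot and VecNorm rows. In this log format the Time column is the maximum over all processes, its Ratio is that maximum divided by the minimum across processes, and the Reduct column counts global reductions. The snippet below only redoes that arithmetic on numbers copied from the table; `ROWS` and `per_call_seconds` are illustrative names.

```python
# Values copied from the VecDot and VecNorm rows of the log above:
# (event, calls, max time over processes [s], time ratio max/min, reductions)
ROWS = [
    ("VecDot", 38, 1.2653e+01, 1.9, 3.8e+01),
    ("VecNorm", 20, 3.5348e+01, 4.0, 2.0e+01),
]


def per_call_seconds(calls, max_time):
    """Average time per call for the process with the largest total time."""
    return max_time / calls


per_call = {}
for name, calls, max_time, ratio, reductions in ROWS:
    per_call[name] = per_call_seconds(calls, max_time)
    print(f"{name:8s} {per_call[name]:.3f} s/call, "
          f"{reductions / calls:.0f} reduction(s)/call, time ratio {ratio}")

# How much slower one VecNorm call is than one VecDot call
print(f"VecNorm/VecDot time per call: {per_call['VecNorm'] / per_call['VecDot']:.1f}x")  # 5.3x
```

Both events perform exactly one reduction per call, yet on the slowest process a VecNorm call takes roughly 5.3 times as long as a VecDot call (about 1.77 s versus 0.33 s), and its max/min time ratio across processes (4.0) is about twice that of VecDot (1.9).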
