The error messages may have nothing to do with PETSc or MOOSE. They may come from UCX (https://github.com/openucx/ucx), a communication library used underneath MPI. I have no experience with it, so it may be helpful to contact your HPC administrator.
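In case it helps when reporting this to the administrator, here is a minimal sketch that counts the UCX error records per host and process in a saved log. It is not part of PETSc, MOOSE, or UCX; the function name `summarize_ucx_errors` and the regular expression are my own assumptions, based only on the log format in the quoted output.

```python
import re
from collections import Counter

# Assumed record layout, taken from the quoted log:
#   [<unix time>] [<host>:<pid>:<n>] <file>:<line> UCX ERROR <message>
# Only the part up to "UCX ERROR" is matched, so records that were
# wrapped by the mail client after "UCX ERROR" are still counted.
UCX_ERROR = re.compile(
    r"\[\d+\.\d+\]\s+\[([^:\]\s]+):(\d+):\d+\]\s+(\S+:\d+)\s+UCX\s+ERROR"
)


def summarize_ucx_errors(log_text):
    """Return a Counter keyed by (host, pid, source location)."""
    counts = Counter()
    for host, pid, source in UCX_ERROR.findall(log_text):
        counts[(host, int(pid), source)] += 1
    return counts


if __name__ == "__main__":
    # Two records copied from the quoted log, used only as a demo input.
    sample = (
        "[1538482623.976180] [hpb0085:22501:0] knem_ep.c:84 UCX ERROR "
        "KNEM inline copy failed, err = -1 Invalid argument\n"
        "[1538482605.111342] [hpb0085:22502:0] knem_ep.c:84 UCX ERROR "
        "KNEM inline copy failed, err = -1 Invalid argument\n"
    )
    for (host, pid, source), n in sorted(summarize_ucx_errors(sample).items()):
        print(f"{host} pid {pid}: {n} UCX error(s) from {source}")
```

A summary like this shows whether the failures are confined to one node or a few processes, which is the kind of detail an administrator is likely to ask for.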

Thanks,

Fande

On Tue, Oct 2, 2018 at 9:24 AM Matthew Knepley <[email protected]> wrote:

> On Tue, Oct 2, 2018 at 11:16 AM Y. Yang <[email protected]> wrote:
>
>> Dear PETSc team
>>
>> Recently I'm using MOOSE (http://www.mooseframework.org/) which is built
>> with PETSc and, Unfortunately, I encountered some problems with
>> following PETSc options:
>>
>
> I do not know what problem you are reporting. I don't know what package
> knem_ep.c is part of, but its not PETSc.
>
>   Thanks,
>
>     Matt
>
>> petsc_options_iname = '-pc_type -ksp_gmres_restart -sub_ksp_type
>> -sub_pc_type -pc_asm_overlap -pc_factor_mat_solver_package'
>>
>> petsc_options_value = 'asm 1201 preonly ilu 4 superlu_dist'
>>
>>
>> the error message is:
>>
>> Time Step 1, time = 1
>>                 dt = 1
>>
>>     |residual|_2 of individual variables:
>>                    c:   779.034
>>                    w:   0
>>                    T:   6.57948e+07
>>                    gr0: 211.617
>>                    gr1: 206.973
>>                    gr2: 209.382
>>                    gr3: 191.089
>>                    gr4: 185.242
>>                    gr5: 157.361
>>                    gr6: 128.473
>>                    gr7: 87.6029
>>
>>  0 Nonlinear |R| = 6.579482e+07
>> [1538482623.976180] [hpb0085:22501:0] knem_ep.c:84 UCX ERROR KNEM inline copy failed, err = -1 Invalid argument
>> [1538482605.111342] [hpb0085:22502:0] knem_ep.c:84 UCX ERROR KNEM inline copy failed, err = -1 Invalid argument
>> [1538482606.761138] [hpb0085:22502:0] knem_ep.c:84 UCX ERROR KNEM inline copy failed, err = -1 Invalid argument
>> [1538482607.107478] [hpb0085:22502:0] knem_ep.c:84 UCX ERROR KNEM inline copy failed, err = -1 Invalid argument
>> [1538482605.882817] [hpb0085:22503:0] knem_ep.c:84 UCX ERROR KNEM inline copy failed, err = -1 Invalid argument
>> [1538482607.133543] [hpb0085:22503:0] knem_ep.c:84 UCX ERROR KNEM inline copy failed, err = -1 Invalid argument
>> [1538482621.905475] [hpb0085:22510:0] knem_ep.c:84 UCX ERROR KNEM inline copy failed, err = -1 Invalid argument
>> [1538482626.531234] [hpb0085:22510:0] knem_ep.c:84 UCX ERROR KNEM inline copy failed, err = -1 Invalid argument
>> [1538482627.613343] [hpb0085:22515:0] knem_ep.c:84 UCX ERROR KNEM inline copy failed, err = -1 Invalid argument
>> [1538482627.830489] [hpb0085:22515:0] knem_ep.c:84 UCX ERROR KNEM inline copy failed, err = -1 Invalid argument
>> [1538482629.852351] [hpb0085:22515:0] knem_ep.c:84 UCX ERROR KNEM inline copy failed, err = -1 Invalid argument
>> [1538482630.194620] [hpb0085:22515:0] knem_ep.c:84 UCX ERROR KNEM inline copy failed, err = -1 Invalid argument
>> [1538482630.280636] [hpb0085:22515:0] knem_ep.c:84 UCX ERROR KNEM inline copy failed, err = -1 Invalid argument
>> [1538482600.219314] [hpb0085:22516:0] knem_ep.c:84 UCX ERROR KNEM inline copy failed, err = -1 Invalid argument
>> [1538482658.960350] [hpb0085:22516:0] knem_ep.c:84 UCX ERROR KNEM inline copy failed, err = -1 Invalid argument
>> [1538482622.949471] [hpb0085:22517:0] knem_ep.c:84 UCX ERROR KNEM inline copy failed, err = -1 Invalid argument
>> [1538482612.502017] [hpb0085:22500:0] knem_ep.c:84 UCX ERROR KNEM inline copy failed, err = -1 Invalid argument
>> [1538482613.231970] [hpb0085:22500:0] knem_ep.c:84 UCX ERROR KNEM inline copy failed, err = -1 Invalid argument
>> [1538482621.417530] [hpb0085:22520:0] knem_ep.c:84 UCX ERROR KNEM inline copy failed, err = -1 Invalid argument
>> [1538482622.020998] [hpb0085:22520:0] knem_ep.c:84 UCX ERROR KNEM inline copy failed, err = -1 Invalid argument
>> [1538482606.221292] [hpb0085:22521:0] knem_ep.c:84 UCX ERROR KNEM inline copy failed, err = -1 Invalid argument
>> [1538482606.676987] [hpb0085:22521:0] knem_ep.c:84 UCX ERROR KNEM inline copy failed, err = -1 Invalid argument
>> [1538482606.896865] [hpb0085:22521:0] knem_ep.c:84 UCX ERROR KNEM inline copy failed, err = -1 Invalid argument
>> [1538482639.611427] [hpb0085:22522:0] knem_ep.c:84 UCX ERROR KNEM inline copy failed, err = -1 Invalid argument
>> [1538482631.435277] [hpb0085:22523:0] knem_ep.c:84 UCX ERROR KNEM inline copy failed, err = -1 Invalid argument
>> [1538482658.278343] [hpb0085:22512:0] knem_ep.c:84 UCX ERROR KNEM inline copy failed, err = -1 Invalid argument
>> [1538482658.396945] [hpb0085:22512:0] knem_ep.c:84 UCX ERROR KNEM inline copy failed, err = -1 Invalid argument
>> [1538482659.917476] [hpb0085:22512:0] knem_ep.c:84 UCX ERROR KNEM inline copy failed, err = -1 Invalid argument
>> [1538482660.162064] [hpb0085:22512:0] knem_ep.c:84 UCX ERROR KNEM inline copy failed, err = -1 Invalid argument
>> 2 total processes killed (some possibly by mpirun during cleanup)
>>
>>
>> Here's the status of the simulation
>>
>> Parallelism:
>>   Num Processors:          100
>>   Num Threads:             1
>>
>> Mesh:
>>   Parallel Type:           distributed
>>   Mesh Dimension:          3
>>   Spatial Dimension:       3
>>   Nodes:
>>     Total:                 2065551
>>     Local:                 22774
>>   Elems:
>>     Total:                 2000000
>>     Local:                 20006
>>   Num Subdomains:          1
>>   Num Partitions:          100
>>   Partitioner:             parmetis
>>
>> Nonlinear System:
>>   Num DOFs:                18589959
>>   Num Local DOFs:          204966
>>   Variables:               { "c" "w" "T" "gr0" "gr1" "gr2" "gr3" "gr4" "gr5" }
>>   Finite Element Types:    "LAGRANGE"
>>   Approximation Orders:    "FIRST"
>>
>> Auxiliary System:
>>   Num DOFs:                10065551
>>   Num Local DOFs:          102798
>>   Variables:               "bnds" { "var_indices" "unique_grains" } { "M" "dM/dT" }
>>   Finite Element Types:    "LAGRANGE" "MONOMIAL" "MONOMIAL"
>>   Approximation Orders:    "FIRST" "CONSTANT" "CONSTANT"
>>
>> Relationship Managers:
>>   Geometric :              GrainTrackerHaloRM (2 layers)
>>
>> Execution Information:
>>   Executioner:             Transient
>>   TimeStepper:             IterationAdaptiveDT
>>   Solver Mode:             Preconditioned JFNK
>>
>>
>> I tried modifying the parameters and other preconditioning option, the
>> problem is much the same. So I don't know where I did wrong or there is
>> actually suitable PETSc option to deal with such problem with large
>> mesh. I would like to hear your response.
>>
>> Sincerely,
>> Yang
>>
>> --
>> ______________________________________________________
>>
>> Yangyiwei Yang
>> Wissenschaftliche Hilfskraft
>>
>> TU Darmstadt
>> Fachbereich 11 - Material- und Geowissenschaften
>> Fachgebiet Mechanik funktionaler Materialien
>>
>> L1 | 08 402
>> Otto Berndt Straße 3
>> D-64287 Darmstadt
>>
>> Tel: +49 (0)6151-16-22923
>> Email: [email protected]
>> Homepage: http://www.mawi.tu-darmstadt.de/mfm
>> ORCID: 0000-0001-5505-7117
>>
>> ______________________________________________________
>>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/
> <http://www.cse.buffalo.edu/~knepley/>
