On Thu, Feb 17, 2022 at 11:46 AM Bojan Niceno <bojan.niceno.scient...@gmail.com> wrote:
> Dear all,
>
> I am experiencing difficulties when using PETSc in parallel in an
> unstructured CFD code. It uses the CRS format to store its matrices. I use
> the following sequence of PETSc calls in the hope of getting PETSc to solve
> my linear systems in parallel. Before I continue, I would just like to say
> that the code has been MPI parallel for a long time, performs its own domain
> decomposition through METIS, and works out its own communication patterns,
> which work with its home-grown (non-PETSc) linear solvers. Anyhow, I issue
> the following calls:
>
>   err = PetscInitialize(0, NULL, (char*)0, help);
>
>   err = MatCreate(MPI_COMM_WORLD, A);
>
> In the above, I use MPI_COMM_WORLD instead of PETSC_COMM_SELF because the
> call to MPI_Init is invoked outside of PETSc, from the main program.
>
>   err = MatSetSizes(A, m, m, M, M);
>
> Since my matrices are exclusively square, M is set to the total number of
> computational cells, while m is equal to the number of computational cells
> within each subdomain/processor. (Not all processors necessarily have the
> same m; it depends on the domain decomposition.) I do not distinguish
> between m (M) and n (N) since the matrices are all square. Am I wrong to
> assume that?
>
>   err = MatSetType(A, MATAIJ);
>
> I set the matrix to be of type MATAIJ, to cover runs on one and on more
> processors. By the way, on one processor everything works fine.
>
>   err = MatMPIAIJSetPreallocation(A, 0, d_nnz, 0, o_nnz);
>   err = MatSeqAIJSetPreallocation(A, 0, d_nnz);
>
> The two lines above specify matrix preallocation. Since d_nz and o_nz vary
> from cell to cell (row to row), I set them to zero and provide arrays with
> the number of diagonal and off-diagonal non-zeros instead. To my
> understanding, that is legitimate, since d_nz and o_nz are ignored if d_nnz
> and o_nnz are provided. Am I wrong?
>
> Finally, inside a loop through rows and columns I call:
>
>   err = MatSetValue(A, row, col, value, INSERT_VALUES);
>
> Here I make sure that row and col point to global cell (unknown) numbers.
>
> Yet, when I run the code on more than one processor, I get the error:
>
> [3]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
> [3]PETSC ERROR: Argument out of range
> [3]PETSC ERROR: New nonzero at (21,356) caused a malloc
> Use MatSetOption(A, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_FALSE) to turn off this check
>
> [3]PETSC ERROR: #1 MatSetValues_MPIAIJ() at /home/niceno/Development/petsc-debug/src/mat/impls/aij/mpi/mpiaij.c:517
> [3]PETSC ERROR: #2 MatSetValues() at /home/niceno/Development/petsc-debug/src/mat/interface/matrix.c:1398
> [3]PETSC ERROR: #3 MatSetValues_MPIAIJ() at /home/niceno/Development/petsc-debug/src/mat/impls/aij/mpi/mpiaij.c:517
> [3]PETSC ERROR: #4 MatSetValues() at /home/niceno/Development/petsc-debug/src/mat/interface/matrix.c:1398
>
> and so forth, for roughly 10% of all matrix entries. I checked whether these
> errors occur only for off-diagonal parts of the matrix entries, but that is
> not the case.
>
> The error code is 63: PETSC_ERR_ARG_OUTOFRANGE.
>
> Does anyone have an idea what I am doing wrong? Are any of my assumptions
> above wrong (like thinking n (N) is always m (M) for square matrices, or
> that I can send zeros as d_nz and o_nz if I provide the arrays d_nnz[] and
> o_nnz[])?

That is correct.

> Any idea how to debug it, where to look for an error?

I would guess that you are counting your o_nnz incorrectly. It looks like a
small number of equations per process, because the 4th process has row 21,
apparently. Does that sound right? And column 356 is going to be in the
off-diagonal block (i.e., "o").

I would start with a serial matrix and run with -info. This will be noisy,
but you will see things like "number of unneeded ..." with which you can
verify that you have set d_nnz perfectly (there should be 0 unneeded). Then
try two processors. If it fails, you could add a print statement that fires
every time the row (e.g., 21) is added to, and check what your code for
computing o_nnz is doing.
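In case it helps, the per-row counting is usually just a comparison of each
global column index against the rank's row ownership range. A rough sketch of
that (the names row_ptr, col, and m are hypothetical placeholders for whatever
your CRS code actually uses; col[] is assumed to hold global column indices
and m to be a PetscInt):

  PetscInt       rstart, rend, i, j;
  PetscInt      *d_nnz, *o_nnz;
  PetscErrorCode err;

  /* First global row owned by this rank: inclusive prefix sum of the
     local row counts m over the ranks, minus this rank's own m. */
  err = MPI_Scan(&m, &rend, 1, MPIU_INT, MPI_SUM, MPI_COMM_WORLD);
  rstart = rend - m;

  err = PetscMalloc1(m, &d_nnz);
  err = PetscMalloc1(m, &o_nnz);

  for (i = 0; i < m; i++) {
    d_nnz[i] = 0;
    o_nnz[i] = 0;
    for (j = row_ptr[i]; j < row_ptr[i + 1]; j++) {
      /* The entry belongs to the diagonal block only if this rank
         owns the column as well as the row. */
      if (col[j] >= rstart && col[j] < rend) d_nnz[i]++;
      else                                   o_nnz[i]++;
    }
  }

  err = MatMPIAIJSetPreallocation(A, 0, d_nnz, 0, o_nnz);
  err = MatSeqAIJSetPreallocation(A, 0, d_nnz);

The key point is that an entry counts toward d_nnz only when the process that
owns the row also owns the column; every other entry in that row goes into
o_nnz.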
> I am carefully checking all the data I send to PETSc functions and it looks
> correct to me, but maybe I lack some fundamental understanding of what
> should be provided to PETSc, as I write above?

It is a bit confusing at first. The man page gives a concrete example:
https://petsc.org/main/docs/manualpages/Mat/MatCreateAIJ.html#MatCreateAIJ

> Best regards,
>
> Bojan Niceno
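P.S. Just to illustrate the route that man page describes (using the same
hypothetical names as in the sketch above: m local rows, M global rows, CRS
arrays row_ptr/col/val with global column indices, rstart the first owned
row), the creation, preallocation, and assembly can be collapsed into
something like:

  Mat            A;
  PetscErrorCode err;
  PetscInt       i, j;

  err = MatCreateAIJ(MPI_COMM_WORLD, m, m, M, M,
                     0, d_nnz,   /* diagonal block, per-row counts     */
                     0, o_nnz,   /* off-diagonal block, per-row counts */
                     &A);

  for (i = 0; i < m; i++) {
    for (j = row_ptr[i]; j < row_ptr[i + 1]; j++) {
      err = MatSetValue(A, rstart + i, col[j], val[j], INSERT_VALUES);
    }
  }
  err = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
  err = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);

MatCreateAIJ performs the MatCreate / MatSetSizes / MatSetType /
preallocation steps in one call and picks SeqAIJ or MPIAIJ automatically, so
it also works on a single process.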