> On Feb 21, 2019, at 8:28 PM, Mills, Richard Tran <rtmi...@anl.gov> wrote: > > On 2/21/19 1:48 PM, Smith, Barry F. wrote: >> >>> On Feb 20, 2019, at 12:50 PM, Mills, Richard Tran via petsc-dev >>> <petsc-dev@mcs.anl.gov> >>> wrote: >>> >>> Folks, >>> >>> I'm working with some PFLOTRAN examples, and I've been scratching my head >>> for a while at errors that are being generated in >>> MatSetValuesBlockedLocal(), related to new nonzeros requiring a malloc. >>> E.g., >>> >>> [0]PETSC ERROR: Argument out of range >>> [0]PETSC ERROR: New nonzero at (2,0) caused a malloc >>> >>> The strange thing is that all of these examples run fine when I use the >>> PFLOTRAN default BAIJ matrix type (with blocks column-oriented) -- I only >>> see the issue when I switch to using AIJ. If I set the matrix option >>> MAT_NEW_NONZERO_ALLOCATION_ERR to PETSC_FALSE to allow things to run with >>> AIJ, everything seems to work perfectly. (And once the Jacobian matrix is >>> constructed the first time, the memory usage is not growing according to >>> what I see from -malloc_log.) >>> >>> I've spent some time poking around with a debugger in the variants of >>> MatSetValues() and MatXAIJSetPreallocation(), and I haven't been able make >>> much progress in determining why I'm seeing this in AIJ vs. BAIJ -- it's >>> easy to get confused looking at this code, and it's going to take me more >>> time to follow what is going on than I've had to spare so far. So, I've got >>> a few questions. First, should it even be possible to see what I am seeing? >>> That is, if the MatSetValuesBlocked() routine is not causing new >>> allocations when using BAIJ, should this be possible with AIJ? >>> >> >> Sure, because the preallocation for BAIJ matrix is by block, while for >> AIJ it is by point (even if you provide a block size to AIJ). So long as the >> new entry is within an allocated block it will not trigger an error with >> BAIJ but may with AIJ. >> > Hmm. It seems that I've misunderstood what MatXAIJSetPreallocation() is > supposed to do. I thought that if I pass in a blocksize > 1, > MatXAIJSetPreallocation() will turn the block-row preallocation into the > equivalent scalar-row stuff. This seems to be what is happening in the code, > unless I'm reading this wrong: > > [...] > 304: } else { /* Convert block-row precallocation to > scalar-row */ > 305: PetscInt i,m,*sdnnz,*sonnz; > 306: MatGetLocalSize(A,&m,NULL); > 307: PetscMalloc2((!!dnnz)*m,&sdnnz,(!!onnz)*m,&sonnz); > 308: for (i=0; i<m; i++) { > 309: if (dnnz) sdnnz[i] = dnnz[i/bs] * bs; > 310: if (onnz) sonnz[i] = onnz[i/bs] * bs; > 311: } > 312: MatSeqAIJSetPreallocation(A,0,dnnz ? sdnnz : NULL); > 313: MatMPIAIJSetPreallocation(A,0,dnnz ? sdnnz : NULL,0,onnz ? sonnz : > NULL); > [...] I think your understanding above is correct. > > Though might something be happening later that reduces the amount of nonzeros > allocated, since the "dense" blocks of the matrix actually will have zero > entries that never get a value set? > > Something does appear to be screwy when I look at the "-snes_view" output for > a time-dependent problem where I see these errors. When using BAIJ, at every > time step I see > > Mat Object: Mat_0x84000000_0 (flow_) 1 MPI processes > type: seqbaij > rows=144, cols=144, bs=3 > total: nonzeros=2304, allocated nonzeros=2304 > total number of mallocs used during MatSetValues calls =0 > block size is 3 > > but when I switch to AIJ, I initially see > > Mat Object: Mat_0x84000000_0 (flow_) 1 MPI processes > type: seqaij > rows=144, cols=144, bs=3 > total: nonzeros=1584, allocated nonzeros=3324 > total number of mallocs used during MatSetValues calls =68 > using I-node routines: found 96 nodes, limit used is 5 > > but as things progress, the allocated nonzeros and mallocs count keeps > growing; by step 43 I'm seeing > > total: nonzeros=1584, allocated nonzeros=115524 > total number of mallocs used during MatSetValues calls =7548 > > If I dump the matrices and look at them in something like Octave, the > matrices from the BAIJ and AIJ runs appear identical within round-off error, > and we certainly aren't adding any entries at each step. Why should PETSc > keep calling malloc() at each MatSetValues() call, then? I suspect a bug > somewhere in the AIJ case call sequence of MatSetValuesBlockedLocal() --> > MatSetValuesBlocked() --> MatSetValues() --> MatSetValues_SeqAIJ(). I have > trouble following all of the logic in matsetvaluesseqaij_(), though, so I > guess I'm going to have try harder to understand why > MatSeqXAIJReallocateAIJ() is being hit there. > > Any educated guesses about what might be going wrong would be appreciated. > There is a lot of PETSc code I'm not familiar with involved here. Suggest a breakpoint in the MatSetValues_SeqAIJ location where the malloc() is done to determine why the mallocs keep getting called. Barry > > Cheers, > Richard > > >> >> Barry >> >> >>> (Clearly it *is* possible, but should it be?) I'm still trying to figure >>> out if there is something wrong with what PFLOTRAN is doing, vs. something >>> going wrong somewhere inside PETSc. >>> >>> Any hints about what to look at from someone with more familiarity with >>> this code would be appreciated. >>> >>> --Richard >>> >
Re: [petsc-dev] Errors in MatSetValuesBlockedLocal() when using AIJ, but not BAIJ!
Smith, Barry F. via petsc-dev Thu, 21 Feb 2019 21:42:26 -0800
- [petsc-dev] Errors in MatSetValuesBlocke... Mills, Richard Tran via petsc-dev
- Re: [petsc-dev] Errors in MatSetVal... Smith, Barry F. via petsc-dev
- Re: [petsc-dev] Errors in MatSe... Mills, Richard Tran via petsc-dev
- Re: [petsc-dev] Errors in M... Jed Brown via petsc-dev
- Re: [petsc-dev] Errors in M... Smith, Barry F. via petsc-dev