> On Feb 21, 2019, at 8:28 PM, Mills, Richard Tran <rtmi...@anl.gov> wrote:
> 
> On 2/21/19 1:48 PM, Smith, Barry F. wrote:
>> 
>>> On Feb 20, 2019, at 12:50 PM, Mills, Richard Tran via petsc-dev 
>>> <petsc-dev@mcs.anl.gov>
>>>  wrote:
>>> 
>>> Folks,
>>> 
>>> I'm working with some PFLOTRAN examples, and I've been scratching my head 
>>> for a while at errors that are being generated in 
>>> MatSetValuesBlockedLocal(), related to new nonzeros requiring a malloc. 
>>> E.g.,
>>> 
>>> [0]PETSC ERROR: Argument out of range
>>> [0]PETSC ERROR: New nonzero at (2,0) caused a malloc
>>> 
>>> The strange thing is that all of these examples run fine when I use the 
>>> PFLOTRAN default BAIJ matrix type (with blocks column-oriented) -- I only 
>>> see the issue when I switch to using AIJ. If I set the matrix option 
>>> MAT_NEW_NONZERO_ALLOCATION_ERR to PETSC_FALSE to allow things to run with 
>>> AIJ, everything seems to work perfectly. (And once the Jacobian matrix is 
>>> constructed the first time, the memory usage is not growing according to 
>>> what I see from -malloc_log.)
>>> 
>>> I've spent some time poking around with a debugger in the variants of 
>>> MatSetValues() and MatXAIJSetPreallocation(), and I haven't been able make 
>>> much progress in determining why I'm seeing this in AIJ vs. BAIJ -- it's 
>>> easy to get confused looking at this code, and it's going to take me more 
>>> time to follow what is going on than I've had to spare so far. So, I've got 
>>> a few questions. First, should it even be possible to see what I am seeing? 
>>> That is, if the MatSetValuesBlocked() routine is not causing new 
>>> allocations when using BAIJ, should this be possible with AIJ?
>>> 
>> 
>>     Sure, because the preallocation for BAIJ matrix is by block, while for 
>> AIJ it is by point (even if you provide a block size to AIJ). So long as the 
>> new entry is within an allocated block it will not trigger an error with 
>> BAIJ but may with AIJ.
>> 
> Hmm. It seems that I've misunderstood what MatXAIJSetPreallocation() is 
> supposed to do. I thought that if I pass in a blocksize > 1, 
> MatXAIJSetPreallocation() will turn the block-row preallocation into the 
> equivalent scalar-row stuff. This seems to be what is happening in the code, 
> unless I'm reading this wrong:
> 
> [...]
> 304:     } else {                    /* Convert block-row precallocation to 
> scalar-row */
> 305:       PetscInt i,m,*sdnnz,*sonnz;
> 306:       MatGetLocalSize(A,&m,NULL);
> 307:       PetscMalloc2((!!dnnz)*m,&sdnnz,(!!onnz)*m,&sonnz);
> 308:       for (i=0; i<m; i++) {
> 309:         if (dnnz) sdnnz[i] = dnnz[i/bs] * bs;
> 310:         if (onnz) sonnz[i] = onnz[i/bs] * bs;
> 311:       }
> 312:       MatSeqAIJSetPreallocation(A,0,dnnz ? sdnnz : NULL);
> 313:       MatMPIAIJSetPreallocation(A,0,dnnz ? sdnnz : NULL,0,onnz ? sonnz : 
> NULL);
> [...]

  I think your understanding above is correct.

> 
> Though might something be happening later that reduces the amount of nonzeros 
> allocated, since the "dense" blocks of the matrix actually will have zero 
> entries that never get a value set?
> 
> Something does appear to be screwy when I look at the "-snes_view" output for 
> a time-dependent problem where I see these errors. When using BAIJ, at every 
> time step I see
> 
>     Mat Object: Mat_0x84000000_0 (flow_) 1 MPI processes
>       type: seqbaij
>       rows=144, cols=144, bs=3
>       total: nonzeros=2304, allocated nonzeros=2304
>       total number of mallocs used during MatSetValues calls =0
>           block size is 3
> 
> but when I switch to AIJ, I initially see
> 
>     Mat Object: Mat_0x84000000_0 (flow_) 1 MPI processes
>       type: seqaij
>       rows=144, cols=144, bs=3
>       total: nonzeros=1584, allocated nonzeros=3324
>       total number of mallocs used during MatSetValues calls =68
>         using I-node routines: found 96 nodes, limit used is 5
> 
> but as things progress, the allocated nonzeros and mallocs count keeps 
> growing; by step 43 I'm seeing
> 
>       total: nonzeros=1584, allocated nonzeros=115524
>       total number of mallocs used during MatSetValues calls =7548
> 
> If I dump the matrices and look at them in something like Octave, the 
> matrices from the BAIJ and AIJ runs appear identical within round-off error, 
> and we certainly aren't adding any entries at each step. Why should PETSc 
> keep calling malloc() at each MatSetValues() call, then? I suspect a bug 
> somewhere in the AIJ case call sequence of MatSetValuesBlockedLocal() --> 
> MatSetValuesBlocked() --> MatSetValues() --> MatSetValues_SeqAIJ(). I have 
> trouble following all of the logic in matsetvaluesseqaij_(), though, so I 
> guess I'm going to have try harder to understand why 
> MatSeqXAIJReallocateAIJ() is being hit there.
> 
> Any educated guesses about what might be going wrong would be appreciated. 
> There is a lot of PETSc code I'm not familiar with involved here.

   Suggest a breakpoint in the MatSetValues_SeqAIJ location where the malloc() 
is done to determine why the mallocs keep getting called.

   Barry

> 
> Cheers,
> Richard
> 
> 
>> 
>>    Barry
>> 
>> 
>>> (Clearly it *is* possible, but should it be?) I'm still trying to figure 
>>> out if there is something wrong with what PFLOTRAN is doing, vs. something 
>>> going wrong somewhere inside PETSc.
>>> 
>>> Any hints about what to look at from someone with more familiarity with 
>>> this code would be appreciated.
>>> 
>>> --Richard
>>> 
> 

Reply via email to