Re: [petsc-dev] Broken MatMatMult_MPIAIJ_MPIDense

Pierre Jolivet via petsc-dev Sun, 22 Sep 2019 21:51:36 -0700

> On 23 Sep 2019, at 4:16 AM, h...@aspiritech.org wrote:
> 
> Now I understand why the latest changes in MatMatMult_MPIAIJ_MPIDense() cause 
> errors in your application (likely for slepc as well). 
> I always assume "MAT_REUSE_MATRIX requires that the C matrix has come from a 
> previous call with MAT_INITIAL_MATRIX" for all matrix products in petsc. I 
> noticed 
> 
> /*
>     This is a "dummy function" that handles the case where matrix C was 
> created as a dense matrix
>   directly by the user and passed to MatMatMult() with the MAT_REUSE_MATRIX 
> option
> 
>   It is the same as MatMatMultSymbolic_MPIAIJ_MPIDense() except does not 
> create C
> */
> PetscErrorCode MatMatMultNumeric_MPIDense(Mat A,Mat B,Mat C)
> 
> but I do not see this routine being used by any petsc routines or tests, 
> thus I thought it might be a dead routine,


It sure isn’t: 
https://www.mcs.anl.gov/petsc/petsc-current/src/mat/impls/dense/mpi/mpidense.c.html#line1192
> and planned to remove it after a careful investigation.
> The name of this routine is misleading as well.

I agree.

> I prefer a uniform and clean design for all matrix products, i.e., do not 
> allow "REUSE" without a previous call with "INITIAL". Almost all products 
> include internal data structures for various reasons. 
> In the latest MatMatMult_MPIAIJ_MPIDense(), we added internal data structures and 
> obtained an impressive improvement in memory usage.

As I said to Barry, the change in behavior will likely result in worse memory 
usage for all applications that currently use this “feature”.
The previous symbolic phase was also basically a no-op. I’ve not yet 
benchmarked the new implementation, but I’m betting it will be slightly 
costlier.
Codes which previously didn’t cache the Mat C and relied, once again, on this 
feature will also have to pay the price of doing multiple symbolic phases 
(assuming the rest remains as is and is not changed to cache C).

Thanks,
Pierre

> Hong
> 
> On Sun, Sep 22, 2019 at 4:38 PM Pierre Jolivet via petsc-dev 
> <petsc-dev@mcs.anl.gov> wrote:
> 
> 
>> On 22 Sep 2019, at 8:32 PM, Smith, Barry F. <bsm...@mcs.anl.gov> wrote:
>> 
>> 
>>  Since this is a commonly used feature we will need to support it in the release 
>> or it will break a variety of codes.
>> 
>>  I am not sure how to "deprecate it" in a useful way. How would the code 
>> actively tell the user that the approach is deprecated and they should 
>> update their code before the next release? Having it print warnings while it 
>> is running if they never used the INITIAL is too intrusive but what else 
>> could be done? Save the message and print it when the program ends? I guess 
>> we could do that. Is that too intrusive? Will it break other people's tests? 
>> Do we want it to break other people's tests with this message?
>> 
>>  Suggestions?  For sure this feature will be removed at some point; how do we 
>> give users a useful warning (reading a document doesn't work)?
> 
> I believe that if you deprecate this behavior, it should mean that you 
> deprecate MatMatMultNumeric_MPIDense as well.
> Which means that there are tests that are becoming pretty meaningless, e.g., 
> https://www.mcs.anl.gov/petsc/petsc-dev/src/mat/impls/aij/mpi/mpimatmatmult.c.html#line606
>  unless there is a memory corruption of some sort, but then the user has 
> bigger concerns.
> They could instead be replaced by a check for the presence of the correct 
> stuff (MPI_Datatype and whatnot) introduced by the new MR, and if not, do 
> what you suggest (error message at the end, PetscInfo, whatever…) + go back 
> to the symbolic phase to avoid any kind of segfault like I’m getting right 
> now?
> Just my two cents…
> 
> Thanks,
> Pierre
> 
>>  Barry
>> 
>> 
>> 
>> 
>>> On Sep 22, 2019, at 1:16 PM, Pierre Jolivet <pierre.joli...@enseeiht.fr> wrote:
>>> 
>>> Just to be sure: can we expect this "feature" to be fixed for the upcoming 
>>> release and deprecated later on, or will you get rid of this for good for 
>>> the release?
>>> 
>>> Thanks,
>>> Pierre
>>> 
>>> On Sep 22, 2019, at 7:11 PM, "Smith, Barry F." <bsm...@mcs.anl.gov> wrote:
>>> 
>>>> 
>>>> Jose,
>>>> 
>>>>   Thanks for the pointer. 
>>>> 
>>>>   Will this change dramatically affect the organization of SLEPc? As noted 
>>>> in my previous email eventually we need to switch to a new API where the 
>>>> REUSE with a different matrix is even more problematic.
>>>> 
>>>>    If you folks have use cases that fundamentally require reusing a 
>>>> previous matrix instead of destroying and getting a new one created we 
>>>> will need to think about additional features in the API that would allow 
>>>> this reusing of an array. But it seems to me that destroying the old 
>>>> matrix and using the initial call to create the matrix should be ok and 
>>>> just require relatively minor changes to your codes?
>>>> 
>>>> Barry
>>>> 
>>>> 
>>>> 
>>>> 
>>>>> On Sep 22, 2019, at 11:55 AM, Jose E. Roman <jro...@dsic.upv.es> wrote:
>>>>> 
>>>>> The man page of MatMatMult says:
>>>>> "In the special case where matrix B (and hence C) are dense you can 
>>>>> create the correctly sized matrix C yourself and then call this routine 
>>>>> with MAT_REUSE_MATRIX, rather than first having MatMatMult() create it 
>>>>> for you."
>>>>> 
>>>>> If you are going to change the usage, don't forget to remove this 
>>>>> sentence. This use case is what we use in SLEPc and is now causing 
>>>>> trouble.
>>>>> Jose
>>>>> 
>>>>> 
>>>>> 
>>>>>> On 22 Sep 2019, at 18:49, Pierre Jolivet via petsc-dev 
>>>>>> <petsc-dev@mcs.anl.gov> wrote:
>>>>>> 
>>>>>> 
>>>>>>> On 22 Sep 2019, at 6:33 PM, Smith, Barry F. <bsm...@mcs.anl.gov> wrote:
>>>>>>> 
>>>>>>> 
>>>>>>> Ok. So we definitely need better error checking and to clean up the 
>>>>>>> code, comments and docs 
>>>>>>> 
>>>>>>> As the approaches for these computations of products get more 
>>>>>>> complicated it becomes a bit harder to support the use of a raw product 
>>>>>>> matrix so I don't think we want to add all the code needed to call the 
>>>>>>> symbolic part (after the fact) when the matrix is raw.
>>>>>> 
>>>>>> To the best of my knowledge, there is only a single method (not taking 
>>>>>> MR 2069 into account) that uses a MPIDense B and for which these 
>>>>>> approaches are necessary, so it’s not like there are a hundred code 
>>>>>> paths to fix, but I understand your point.
>>>>>> 
>>>>>>> Would that make things terribly difficult for you not being able to use 
>>>>>>> a raw matrix?
>>>>>> 
>>>>>> Definitely not, but that would require some more memory + one copy after 
>>>>>> the MatMatMult (depending on the size of your block Krylov space, that 
>>>>>> can be quite large, and that defeats the purpose of MR 2032 of being 
>>>>>> more memory efficient).
>>>>>> (BTW, I now remember that I’ve been using this “feature” since our SC16 
>>>>>> paper on block Krylov methods)
>>>>>> 
>>>>>>> I suspect that the dense case was just lucky that using a raw matrix 
>>>>>>> worked.
>>>>>> 
>>>>>> I don’t think so, this is clearly the intent of 
>>>>>> MatMatMultNumeric_MPIDense (vs. MatMatMultNumeric_MPIAIJ_MPIDense).
>>>>>> 
>>>>>>> The removal of the de facto support for REUSE on the raw matrix should 
>>>>>>> be added to the changes document.
>>>>>>> 
>>>>>>> Sorry for the difficulties. We have trouble testing all the 
>>>>>>> combinations of possible usage; even a coverage tool would not have 
>>>>>>> indicated a problem with the lack of lda support.
>>>>>> 
>>>>>> No problem!
>>>>>> 
>>>>>> Thank you,
>>>>>> Pierre
>>>>>> 
>>>>>>> Hong,
>>>>>>> 
>>>>>>> Can you take a look at these things on Monday and maybe get a cleanup 
>>>>>>> into an MR so it gets into the release?
>>>>>>> 
>>>>>>> Thanks
>>>>>>> 
>>>>>>> 
>>>>>>> Barry
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>>> On Sep 22, 2019, at 11:12 AM, Pierre Jolivet 
>>>>>>>> <pierre.joli...@enseeiht.fr> wrote:
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> On 22 Sep 2019, at 6:03 PM, Smith, Barry F. <bsm...@mcs.anl.gov> wrote:
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> On Sep 22, 2019, at 10:14 AM, Pierre Jolivet via petsc-dev 
>>>>>>>>>> <petsc-dev@mcs.anl.gov> wrote:
>>>>>>>>>> 
>>>>>>>>>> FWIW, I’ve fixed MatMatMult and MatTransposeMatMult here 
>>>>>>>>>> https://gitlab.com/petsc/petsc/commit/93d7d1d6d29b0d66b5629a261178b832a925de80
>>>>>>>>>> (with MAT_INITIAL_MATRIX).
>>>>>>>>>> I believe there is something not right in your MR (2032) with 
>>>>>>>>>> MAT_REUSE_MATRIX (without having called MAT_INITIAL_MATRIX first), 
>>>>>>>>>> cf. 
>>>>>>>>>> https://gitlab.com/petsc/petsc/merge_requests/2069#note_220269898 .
>>>>>>>>>> Of course, I’d love to be proved wrong!
>>>>>>>>> 
>>>>>>>>> I don't understand the context.
>>>>>>>>> 
>>>>>>>>> MAT_REUSE_MATRIX requires that the C matrix has come from a previous 
>>>>>>>>> call with MAT_INITIAL_MATRIX, you cannot just put any matrix in the C 
>>>>>>>>> location.
>>>>>>>> 
>>>>>>>> 1) It was not the case before the MR, I’ve used that “feature” (which 
>>>>>>>> may be specific for MatMatMult_MPIAIJ_MPIDense) for as long as I can 
>>>>>>>> remember
>>>>>>>> 2) If it is not the case anymore, I think it should be mentioned 
>>>>>>>> somewhere (and not only in the git log, because I don’t think all 
>>>>>>>> users will go through that)
>>>>>>>> 3) This comment should be removed from the code as well: 
>>>>>>>> https://www.mcs.anl.gov/petsc/petsc-dev/src/mat/impls/aij/mpi/mpimatmatmult.c.html#line398
>>>>>>>> 
>>>>>>>>> This is documented in the manual page. We should have better error 
>>>>>>>>> checking that this is the case so the code doesn't crash at memory 
>>>>>>>>> access but instead produces a very useful error message if the matrix 
>>>>>>>>> was not obtained with MAT_INITIAL_MATRIX. 
>>>>>>>>> 
>>>>>>>>> Is this the issue or do I not understand?
>>>>>>>> 
>>>>>>>> This is exactly the issue.
>>>>>>>> 
>>>>>>>>> Barry
>>>>>>>>> 
>>>>>>>>> BTW: yes, MAT_REUSE_MATRIX has different meanings for different matrix 
>>>>>>>>> operations in terms of where the matrix came from. This is supposed to 
>>>>>>>>> be documented in each method's manual page, but some may be missing 
>>>>>>>>> or incomplete, and error checking is probably not complete for all 
>>>>>>>>> cases.  Perhaps the code should be changed to have a different 
>>>>>>>>> name for each reuse case, for clarity to the user?
>>>>>>>> 
>>>>>>>> Definitely, cf. above.
>>>>>>>> 
>>>>>>>> Thanks,
>>>>>>>> Pierre
>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Thanks,
>>>>>>>>>> Pierre
>>>>>>>>>> 
>>>>>>>>>>> On 22 Sep 2019, at 5:04 PM, Zhang, Hong <hzh...@mcs.anl.gov> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> I'll check it tomorrow.
>>>>>>>>>>> Hong
>>>>>>>>>>> 
>>>>>>>>>>> On Sun, Sep 22, 2019 at 1:04 AM Pierre Jolivet via petsc-dev 
>>>>>>>>>>> <petsc-dev@mcs.anl.gov> wrote:
>>>>>>>>>>> Jed,
>>>>>>>>>>> I’m not sure how easy it is to put more than a few lines of code on 
>>>>>>>>>>> GitLab, so I’ll just send the (tiny) source here, as a follow-up of 
>>>>>>>>>>> our discussion 
>>>>>>>>>>> https://gitlab.com/petsc/petsc/merge_requests/2069#note_220229648 .
>>>>>>>>>>> Please find attached a .cpp showing the brokenness of C=A*B with A 
>>>>>>>>>>> of type MPIAIJ and B of type MPIDense when the LDA of B is not 
>>>>>>>>>>> equal to its number of local rows.
>>>>>>>>>>> It does [[1,1];[1,1]] * [[0,1,2,3];[0,1,2,3]]
>>>>>>>>>>> C should be equal to 2*B, but it’s not, unless lda = m (= 1).
>>>>>>>>>>> Mat Object: 2 MPI processes
>>>>>>>>>>> type: mpidense
>>>>>>>>>>> 0.0000000000000000e+00 1.0000000000000000e+00 2.0000000000000000e+00 3.0000000000000000e+00
>>>>>>>>>>> 0.0000000000000000e+00 1.0000000000000000e+00 2.0000000000000000e+00 3.0000000000000000e+00
>>>>>>>>>>> 
>>>>>>>>>>> If you change Bm here 
>>>>>>>>>>> https://www.mcs.anl.gov/petsc/petsc-dev/src/mat/impls/aij/mpi/mpimatmatmult.c.html#line549
>>>>>>>>>>> to the LDA of B, you’ll get the correct result.
>>>>>>>>>>> Mat Object: 2 MPI processes
>>>>>>>>>>> type: mpidense
>>>>>>>>>>> 0.0000000000000000e+00 2.0000000000000000e+00 4.0000000000000000e+00 6.0000000000000000e+00
>>>>>>>>>>> 0.0000000000000000e+00 2.0000000000000000e+00 4.0000000000000000e+00 6.0000000000000000e+00
>>>>>>>>>>> 
>>>>>>>>>>> Unfortunately, w.r.t. MR 2069, I still don’t get the same results 
>>>>>>>>>>> with a plain view LDA > m (KO) and a view + duplicate LDA = m (OK).
>>>>>>>>>>> So there might be something else to fix (or this might not even be 
>>>>>>>>>>> a correct fix), but the only reproducer I have right now is the 
>>>>>>>>>>> full solver.
>>>>>>>>>>> 
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Pierre
>>>> 
>> 
>
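
The LDA problem from the original report at the bottom of the thread can be illustrated without PETSc. The sketch below is a single-process analogue only, not the code in mpimatmatmult.c; the function name dense_mult and its argument layout are assumptions made for illustration. B is stored column-major with a leading dimension ldb that may exceed the number of rows m, so entry (k, j) sits at B[k + j*ldb]. Using m as the stride where ldb is needed makes the loop read padding and entries that belong to neighbouring columns.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a PETSc routine): C = A * B.
 *   A: m x m, row-major, entry (i, k) at A[i * m + k]
 *      (dense here for brevity; it stands in for the sparse operand)
 *   B: m x n, column-major with leading dimension ldb >= m
 *   C: m x n, column-major with leading dimension ldc >= m
 * When ldb > m, each column of B is followed by (ldb - m) padding
 * entries that must be skipped, which is exactly what the stride does. */
void dense_mult(size_t m, size_t n, const double *A,
                const double *B, size_t ldb, double *C, size_t ldc)
{
  assert(ldb >= m && ldc >= m);
  for (size_t j = 0; j < n; ++j) {
    for (size_t i = 0; i < m; ++i) {
      double sum = 0.0;
      for (size_t k = 0; k < m; ++k) {
        /* The stride between columns of B is ldb, not m. */
        sum += A[i * m + k] * B[k + j * ldb];
      }
      C[i + j * ldc] = sum;
    }
  }
}
```

With A = [[1,1];[1,1]] and B = [[0,1,2,3];[0,1,2,3]] stored with ldb = 3 (one padding entry after each column), dense_mult(2, 4, A, B, 3, C, 2) yields C = 2*B. Passing 2 as the stride for the same buffer does not, because the last column is then read from the storage of column 2.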

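
The suggestion made in the thread (check whether the product matrix carries the internal data introduced by the new MR and, if it does not, record a message for the end of the run and go back to the symbolic phase) can be sketched in plain C. Everything in this sketch is hypothetical: ProductMat, product_symbolic, product_numeric_reuse, report_deprecations and product_destroy are illustrative names, not PETSc types or functions, and the one-byte allocation is only a placeholder for whatever a real symbolic phase would attach.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for a product matrix C; not a PETSc type. */
typedef struct {
  void *product_ctx;   /* data attached by the symbolic phase; NULL for a "raw" matrix */
  int   symbolic_runs; /* number of symbolic phases run, kept for illustration */
} ProductMat;

/* Set when a reuse call was made on a matrix that never saw the initial call. */
static bool deprecated_reuse_seen = false;

/* Symbolic phase: attach the internal data the numeric phase relies on. */
int product_symbolic(ProductMat *C)
{
  assert(C != NULL);
  if (C->product_ctx == NULL) {
    C->product_ctx = malloc(1); /* placeholder for buffers, datatypes, ... */
    if (C->product_ctx == NULL) return 1;
  }
  C->symbolic_runs++;
  return 0;
}

/* Numeric phase in "reuse" mode: fall back to the symbolic phase instead of
 * dereferencing missing data, and remember that the old usage was seen. */
int product_numeric_reuse(ProductMat *C)
{
  assert(C != NULL);
  if (C->product_ctx == NULL) {
    deprecated_reuse_seen = true;
    int err = product_symbolic(C);
    if (err != 0) return err;
  }
  /* ... the numeric product would use C->product_ctx here ... */
  return 0;
}

/* Meant to be called once at program end; returns true if a note was printed. */
bool report_deprecations(FILE *out)
{
  if (!deprecated_reuse_seen) return false;
  fprintf(out, "note: a product was computed in reuse mode on a matrix that did "
               "not come from an initial product call; the symbolic phase was "
               "run on the fly\n");
  return true;
}

/* Release the data attached by the symbolic phase. */
void product_destroy(ProductMat *C)
{
  assert(C != NULL);
  free(C->product_ctx);
  C->product_ctx = NULL;
}
```

In this sketch a user-created matrix pays for one symbolic phase on its first reuse call and none afterwards, and the single deferred note avoids printing a warning on every call. Whether the cost of that on-the-fly symbolic phase is acceptable is the open question discussed in the thread.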