> On May 11, 2018, at 6:20 PM, Smith, Barry F. <bsm...@mcs.anl.gov> wrote:
>
>> On May 11, 2018, at 8:03 AM, Stefano Zampini <stefano.zamp...@gmail.com> wrote:
>>
>> I don’t think changing the current TS API is the best approach.
>>
>> Obtaining separate Jacobians is a need for adjoints and tangent linear models only.
>> This is how I implemented it in stefano_zampini/feature-continuousadjoint:
>> https://bitbucket.org/petsc/petsc/src/7203629c61cbeced536ed8ba1dd2ef85ffb89e8f/src/ts/interface/tssplitjac.c#lines-48
>>
>> Note that, instead of requiring the user to call PetscObjectComposeFunction, we can use a function pointer and have TSSetComputeSplitJacobians.
>
> So you are proposing keeping TSSetIFunction and TSSetIJacobian and ADDING a new API TSSetComputeSplitJacobians(), and if that is not provided, calling TSComputeIJacobian() twice with different shifts (which is definitely not efficient and is what Hong does also).
Why is it inefficient? If you need BOTH dF/dUdot and dF/dU, you need two different assemblies even if we change the API. Note that the physics is needed to evaluate dF/dU, but usually it is not for dF/dUdot. And the MatAXPY is with SAME_NONZERO_PATTERN, so it takes basically no time. The only difference is one extra physics evaluation (which can be expensive). However, advanced users who are aware of that can provide their own specialized version of ComputeJacobians.

> Barry
>
>>> On May 11, 2018, at 3:20 PM, Jed Brown <j...@jedbrown.org> wrote:
>>>
>>> "Smith, Barry F." <bsm...@mcs.anl.gov> writes:
>>>
>>>>> On May 10, 2018, at 4:12 PM, Jed Brown <j...@jedbrown.org> wrote:
>>>>>
>>>>> "Zhang, Hong" <hongzh...@anl.gov> writes:
>>>>>
>>>>>> Dear PETSc folks,
>>>>>>
>>>>>> The current TS APIs (IFunction/IJacobian + RHSFunction/RHSJacobian) were designed for the fully implicit formulation F(t,U,Udot) = G(t,U). Shampine's paper (https://www.sciencedirect.com/science/article/pii/S0377042706004110?via%3Dihub) explains some of the reasoning behind it.
>>>>>>
>>>>>> Our formulation is general enough to cover all of the following common cases:
>>>>>>
>>>>>> * Udot = G(t,U) (classic ODE)
>>>>>> * M Udot = G(t,U) (ODEs/DAEs for mechanical and electronic systems)
>>>>>> * M(t,U) Udot = G(t,U) (PDEs)
>>>>>>
>>>>>> Thus the TS APIs provide the capability to solve both simple and complicated problems. However, we are not doing well at making TS easy to use and efficient, especially for simple problems. Over the years, we have clearly seen the main drawbacks, including:
>>>>>>
>>>>>> 1. The shift parameter exposed in IJacobian is terribly confusing, especially to new users. Also, it is not conceptually straightforward when using AD or finite differences on IFunction to approximate IJacobian.
>>>>>
>>>>> What isn't straightforward about AD or FD on the IFunction? That one bit of chain rule?
>>>>>
>>>>>> 2. It is difficult to switch from fully implicit to fully explicit. Users cannot use explicit methods when they provide IFunction/IJacobian.
>>>>>
>>>>> This is a real issue, but it's extremely common for PDEs to have boundary conditions enforced as algebraic constraints, thus yielding a DAE.
>>>>>
>>>>>> 3. The structure of the mass matrix is completely invisible to TS. This means giving up all the opportunities to improve efficiency. For example, when M is constant or weakly dependent on U, we might not want to evaluate/update it every time IJacobian is called. If M is diagonal, the Jacobian can be shifted more efficiently than by just using MatAXPY().
>>>>>
>>>>> I don't understand.
>>>>>
>>>>>> 4. Reshifting the Jacobian is unnecessarily expensive and sometimes buggy.
>>>>>
>>>>> Why is "reshifting" needed? After a step is rejected and when the step size changes for a linear constant operator?
>>>>>
>>>>>> Consider the scenario below.
>>>>>>
>>>>>>   shift = a;
>>>>>>   TSComputeIJacobian()
>>>>>>   shift = b;
>>>>>>   TSComputeIJacobian() // with the same U and Udot as the last call
>>>>>>
>>>>>> Changing the shift parameter requires the Jacobian function to be evaluated again. If users provide only RHSJacobian, the Jacobian will not be updated/reshifted in the second call because TSComputeRHSJacobian() finds that U has not changed. This issue is fixable by adding more logic to the already convoluted implementation of TSComputeIJacobian(), but the intention here is to illustrate the cumbersomeness of the current IJacobian and the growing complications in TSComputeIJacobian() that IJacobian causes.
>>>>>>
>>>>>> So I propose that we have two separate matrices, dF/dUdot and dF/dU, and remove the shift parameter from IJacobian.
>>>>>> dF/dU will be calculated by IJacobian; dF/dUdot will be calculated by a new callback function and default to an identity matrix if it is not provided by users. Then users do not need to assemble the shifted Jacobian, since TS will handle the shifting whenever needed. And knowing the structure of dF/dUdot (the mass matrix), TS will become more flexible. In particular, we will have
>>>>>>
>>>>>> * easy access to the unshifted Jacobian dF/dU (this simplifies the adjoint implementation a lot),
>>>>>
>>>>> How does this simplify the adjoint?
>>>>>
>>>>>> * plenty of opportunities to optimize TS when the mass matrix is diagonal or constant or weakly dependent on U (which accounts for almost all use cases in practice),
>>>>>
>>>>> But a longer critical path,
>>>>
>>>> What do you mean by a longer critical path?
>>>
>>> Create the mass (dF/dUdot) matrix, call MatAssembly, create dF/dU, call MatAssembly, call MatAXPY (which involves another MatAssembly unless SAME_NONZERO_PATTERN). That's a long sequence for what could be one MatAssembly. Also note that geometric setup for elements is usually done in each element loop. For simple physics, this is way more expensive than the physics (certainly the case for LibMesh and Deal.II).
>>>
>>>>> more storage required, and more data motion.
>>>>
>>>> The extra storage needed is related to the size of the mass matrix, correct? And the extra data motion is related to the size of the mass matrix, correct?
>>>
>>> Yes, which is the same as the stiffness matrix for finite element methods.
>>>
>>>> Is the extra work coming from a needed call to MatAXPY (to combine the scaled mass matrix with the Jacobian) in Hong's approach? While, in theory, the user can avoid the MatAXPY in the current code if they have custom code that directly assembles the scaled mass matrix and Jacobian?
>>>> But surely most users would not write such custom code and would themselves keep a copy of the mass matrix (likely constant) and use MatAXPY() to combine the copy of the mass matrix with the Jacobian they compute at each timestep/stage? Or am I missing something?
>>>
>>> It's better (and at least as convenient) to write code that assembles into the mass matrix (at the element scale if you want to amortize mesh traversal, but it can also be a new traversal without needing an extra MatAssembly). Then instead of MatAXPY(), you call the code that ADD_VALUES the mass part. I think storing the mass matrix away somewhere is a false economy in almost all cases.
>>>
>>> There is also the issue that matrix-free preconditioning is much more confusing with the newly proposed scheme. As it is now, the matrix needed by the solver is specified and the user can choose how to approximate it. If only the pieces are specified, then a PCShell will need to understand the result of a MatAXPY with shell matrices.
>>>
>>>>> And if the mass matrix is simple, won't it take a very small fraction of the time, and thus have little to gain from "optimizing it"?
>>>>
>>>> Is the Function approach only theoretically much more efficient than Hong's approach when the mass matrix is nontrivial? That is, when the mass matrix has a nonzero structure similar to the Jacobian?
>>>
>>> The extra MatAssembly is called even for MatAXPY with SUBSET_NONZERO_PATTERN. But apart from strong-scaling concerns like that extra communication (which could be optimized, but there are several formats to optimize), any system should be sufficiently fast if the mass matrix is trivial, because that means it has much less work than the dF/dU matrix.
>>>
>>>>>> * easy conversion from fully implicit to fully explicit,
>>>>>> * an IJacobian without a shift parameter that is easy to understand and easy to port to other software.
>>>>>
>>>>> Note that even CVODE has an interface similar to PETSc's; e.g., the gamma parameter in CVSpilsPrecSetupFn.
>>>>>
>>>>>> Regarding the changes on the user side, most IJacobian users should not have a problem splitting the old Jacobian if they compute dF/dUdot and dF/dU explicitly. If dF/dUdot is too complicated to build, matrix-free is an alternative option.
>>>>>>
>>>>>> While this proposal is somewhat related to Barry's idea of having a TSSetMassMatrix() last year, I hope it provides more details for your information. Any comments and feedback would be appreciated.
>>>>>>
>>>>>> Thanks,
>>>>>> Hong