> On May 11, 2018, at 6:20 PM, Smith, Barry F. <bsm...@mcs.anl.gov> wrote:
>
>> On May 11, 2018, at 8:03 AM, Stefano Zampini <stefano.zamp...@gmail.com> wrote:
>>
>> I don’t think changing the current TS API is the best approach.
>>
>> Obtaining separate Jacobians is a need for adjoints and tangent linear models only.
>> This is how I implemented it in stefano_zampini/feature-continuousadjoint:
>> https://bitbucket.org/petsc/petsc/src/7203629c61cbeced536ed8ba1dd2ef85ffb89e8f/src/ts/interface/tssplitjac.c#lines-48
>>
>> Note that, instead of requiring the user to call PetscObjectComposeFunction, we can use a function pointer and have TSSetComputeSplitJacobians.
>
> So you are proposing keeping TSSetIFunction and TSSetIJacobian and ADDING a new API TSSetComputeSplitJacobians(), and if that is not provided, calling TSComputeIJacobian() twice with different shifts (which is definitely not efficient and is what Hong does also).
Why is it inefficient? If you need BOTH dF/dUdot and dF/dU, you need two different assemblies even if we change the API. Note that the physics is needed to evaluate dF/dU, but usually it is not for dF/dUdot. And the MatAXPY is with SAME_NONZERO_PATTERN, so it takes basically no time. The only difference is one extra physics evaluation (which can be expensive). However, advanced users who are aware of that can provide their own specialized version of ComputeJacobians.

> Barry
>
>>> On May 11, 2018, at 3:20 PM, Jed Brown <j...@jedbrown.org> wrote:
>>>
>>> "Smith, Barry F." <bsm...@mcs.anl.gov> writes:
>>>
>>>>> On May 10, 2018, at 4:12 PM, Jed Brown <j...@jedbrown.org> wrote:
>>>>>
>>>>> "Zhang, Hong" <hongzh...@anl.gov> writes:
>>>>>
>>>>>> Dear PETSc folks,
>>>>>>
>>>>>> The current TS APIs (IFunction/IJacobian + RHSFunction/RHSJacobian) were designed for the fully implicit formulation F(t,U,Udot) = G(t,U). Shampine's paper (https://www.sciencedirect.com/science/article/pii/S0377042706004110?via%3Dihub) explains some of the reasoning behind it.
>>>>>>
>>>>>> Our formulation is general enough to cover all of the following common cases:
>>>>>>
>>>>>> * Udot = G(t,U) (classic ODE)
>>>>>> * M Udot = G(t,U) (ODEs/DAEs for mechanical and electronic systems)
>>>>>> * M(t,U) Udot = G(t,U) (PDEs)
>>>>>>
>>>>>> Thus the TS APIs provide the capability to solve both simple and complicated problems. However, we are not doing well at making TS easy to use and efficient, especially for simple problems. Over the years, we have clearly seen the main drawbacks, including:
>>>>>>
>>>>>> 1. The shift parameter exposed in IJacobian is terribly confusing, especially to new users. Also, it is not conceptually straightforward when using AD or finite differences on IFunction to approximate IJacobian.
>>>>>
>>>>> What isn't straightforward about AD or FD on the IFunction? That one bit of chain rule?
>>>>>
>>>>>> 2. It is difficult to switch from fully implicit to fully explicit. Users cannot use explicit methods when they provide IFunction/IJacobian.
>>>>>
>>>>> This is a real issue, but it's extremely common for PDEs to have boundary conditions enforced as algebraic constraints, thus yielding a DAE.
>>>>>
>>>>>> 3. The structure of the mass matrix is completely invisible to TS. This means giving up all the opportunities to improve efficiency. For example, when M is constant or weakly dependent on U, we might not want to evaluate/update it every time IJacobian is called. If M is diagonal, the Jacobian can be shifted more efficiently than by just using MatAXPY().
>>>>>
>>>>> I don't understand.
>>>>>
>>>>>> 4. Reshifting the Jacobian is unnecessarily expensive and sometimes buggy.
>>>>>
>>>>> Why is "reshifting" needed? After a step is rejected and when the step size changes for a linear constant operator?
>>>>>
>>>>>> Consider the scenario below.
>>>>>>
>>>>>>   shift = a;
>>>>>>   TSComputeIJacobian()
>>>>>>   shift = b;
>>>>>>   TSComputeIJacobian() // with the same U and Udot as the last call
>>>>>>
>>>>>> Changing the shift parameter requires the Jacobian function to be evaluated again. If users provide only RHSJacobian, the Jacobian will not be updated/reshifted in the second call because TSComputeRHSJacobian() finds that U has not changed. This issue is fixable by adding more logic to the already convoluted implementation of TSComputeIJacobian(), but the intention here is to illustrate the cumbersomeness of the current IJacobian and the growing complications in TSComputeIJacobian() that IJacobian causes.
>>>>>>
>>>>>> So I propose that we have two separate matrices, dF/dUdot and dF/dU, and remove the shift parameter from IJacobian.
>>>>>> dF/dU will be calculated by IJacobian; dF/dUdot will be calculated by a new callback function and default to an identity matrix if it is not provided by users. Then users do not need to assemble the shifted Jacobian, since TS will handle the shifting whenever needed. And knowing the structure of dF/dUdot (the mass matrix), TS will become more flexible. In particular, we will have
>>>>>>
>>>>>> * easy access to the unshifted Jacobian dF/dU (this simplifies the adjoint implementation a lot),
>>>>>
>>>>> How does this simplify the adjoint?
>>>>>
>>>>>> * plenty of opportunities to optimize TS when the mass matrix is diagonal or constant or weakly dependent on U (which accounts for almost all use cases in practice),
>>>>>
>>>>> But a longer critical path,
>>>>
>>>> What do you mean by a longer critical path?
>>>
>>> Create the mass (dF/dUdot) matrix, call MatAssembly, create dF/dU, call MatAssembly, call MatAXPY (which involves another MatAssembly unless SAME_NONZERO_PATTERN). That's a long sequence for what could be one MatAssembly. Also note that geometric setup for elements is usually done in each element loop. For simple physics, this is way more expensive than the physics (certainly the case for LibMesh and Deal.II).
>>>
>>>>> more storage required, and more data motion.
>>>>
>>>> The extra storage needed is related to the size of the mass matrix, correct? And the extra data motion is related to the size of the mass matrix, correct?
>>>
>>> Yes, which is the same as the stiffness matrix for finite element methods.
>>>
>>>> Is the extra work coming from a needed call to MatAXPY (to combine the scaled mass matrix with the Jacobian) in Hong's approach? While, in theory, the user can avoid the MatAXPY in the current code if they have custom code that directly assembles the scaled mass matrix and Jacobian?
>>>> But surely most users would not write such custom code and would themselves keep a copy of the mass matrix (likely constant) and use MatAXPY() to combine the copy of the mass matrix with the Jacobian they compute at each timestep/stage? Or am I missing something?
>>>
>>> It's better (and at least as convenient) to write code that assembles into the mass matrix (at the element scale if you want to amortize mesh traversal, but it can also be a new traversal without needing an extra MatAssembly). Then instead of MatAXPY(), you call the code that ADD_VALUES the mass part. I think storing the mass matrix away somewhere is a false economy in almost all cases.
>>>
>>> There is also the issue that matrix-free preconditioning is much more confusing with the newly proposed scheme. As it is now, the matrix needed by the solver is specified and the user can choose how to approximate it. If only the pieces are specified, then a PCShell will need to understand the result of a MatAXPY with shell matrices.
>>>
>>>>> And if the mass matrix is simple, won't it take a very small fraction of the time, and thus have little to gain from "optimizing it"?
>>>>
>>>> Is the Function approach only theoretically much more efficient than Hong's approach when the mass matrix is nontrivial? That is, when the mass matrix has a nonzero structure similar to the Jacobian?
>>>
>>> The extra MatAssembly is called even for MatAXPY with SUBSET_NONZERO_PATTERN. But apart from strong-scaling concerns like that extra communication (which could be optimized, but there are several formats to optimize), any system should be sufficiently fast if the mass matrix is trivial, because that means it has much less work than the dF/dU matrix.
>>>
>>>>>> * easy conversion from fully implicit to fully explicit,
>>>>>> * an IJacobian without a shift parameter that is easy to understand and easy to port to other software.
>>>>>
>>>>> Note that even CVODE has an interface similar to PETSc's; e.g., the gamma parameter in CVSpilsPrecSetupFn.
>>>>>
>>>>>> Regarding the changes on the user side, most IJacobian users should not have a problem splitting the old Jacobian if they compute dF/dUdot and dF/dU explicitly. If dF/dUdot is too complicated to build, matrix-free is an alternative option.
>>>>>>
>>>>>> While this proposal is somewhat related to Barry's idea of having a TSSetMassMatrix() last year, I hope it provides more details for your information. Any comments and feedback would be appreciated.
>>>>>>
>>>>>> Thanks,
>>>>>> Hong