This whole thread has been really interesting. I'm interested in Barry's suggestion, and I know a handful of people not on the list who would also be interested. I've got a couple of questions/responses to the MatSetValues, point-wise physics, and "support tools" issues that were raised.
Did you guys settle the issue that Jed brought up about about the Python Global Interpreter Lock (GIL)? There was some discussion of it at SciPy this summer, but I didn't come away feeling like anybody had an immediate solution or that Guido was in favor of eliminating the GIL anytime soon. Are you guys assuming that the "worker threads" for the multiple cores/GPU's are going to be set up by C code (hand- or machine-generated)? It seems to me that the MatSetValues discussion moved on from the threading issue (since you can't call MatSetValues efficiently in python anyway) to the more general issue of how PETSc is going to explicitly support shared memory parallelism. I interpret the last interchange as implying that the solution is another level of domain decomposition that would be hidden from the mat assembly loop level (i.e. users aren't going to have to write their code like the "thread part" is their parallel subdomain). Is that a correct interpretation? Something along the lines of additional args to MatCreate that indicates how many threads to use per subdomain and some OpenMPI style instructions to parallelize the assembly loop? On the issue of the point-wise physics (e.g. quadrature point routines, Riemann solvers, ...). It seems like this is a long-standing bottleneck for object-orient approaches to PDE's going back to the OO Fem research in the 80s and 90s. The SciPy group that met to talk about PDE's spent some time on it, partly related to femhub project. I agree that a code generation approach seems to be the only way forward. Ondrej Certik from the femhub project has written a symbolic math module for python, and we talked some about using this to represent the physics in a way that can be used for code generation. I'm not sure what progress he's made. At the end of the day you've got to allow people to write code for physics without knowledge of the assembly processes while still injecting that code into the inner loop, otherwise you'll approach the speed of monolithic fortran codes. One issue that you might also want to consider with multi-language approaches is the configuration and build tools. I agree that he debugging support is a major issue, but what has taken far more of my time is the lack of cross-platform configuration and build tools for multi-language software. We have looked at CMake and Scons but have had to stick with a pretty ugly mixture of python scripts of python and make. I feel like the other scientific computing packages like Sage and EPD are successfully growing because they distribute binaries that include all dependencies, but I don't see that being feasible for HPC software anytime soon. It seems like to really be successful you may have to rely on a full featured and fully documented tool like Scons is aiming at being, otherwise you'll end up with an approach that works for you (because you wrote it) but not very well for others trying to use the new PETSc. We have a small group at that US Army ERDC that has been developing a python FEM framework for multiscale/multiphysics modeling that tries to break coupling between the physics and the FEM methods (i.e. we try to support CG,DG, and non-conforming elements with variable order and element topologies running from the same description of the physics). The code uses our own wrappers to petsc and mpi, but only because we started before mpi4py and petsc4py existed. We have a memory intensive physics API based on computing and storing quadrature point values of PDE coefficients and numerical fluxes and a residual and jacobian assembly processes along the lines of Matt's first approach of storing element residuals and jacobians. We also have handwritten "optimized" codes that collapse all the loops and eliminate storage for direct residual and jacobian assembly. We did that last step out of necessity and as a precursor to working on a code generation approach (it's what we want the code generator to write). It also has the side effect of providing a low level API for people who want to write traditional monolithic fortran codes for residual and jacobian assembly. Our approach for the "user" or more correctly the "physics developer" is 1) prototype physics in python 2) "vectorize" the physics by rewriting in C/C++/Fortran using swig, f2py, or hand written wrappers and 3) take the inner loop of 2 and build monolithic residual and jacobian routines. This is obviously not satisfactory and not maintainable in the long-run but we are solving "real" nonlinear multi- physics problems in 3D with it (e.g. incompressible two-phase flow, two-phase flow in porous media, shallow water equations, and a few other models). Recent profiles have put about 95% of the compute time inside KSPSolve, but we've got a lot of work to do on better preconditioning and tolerance selection before that number provides reliable guidance. Anyway, If you guys go forward with Barry's suggestion I'd be interested in seeing if there's some way to work out some inter-agency collaboration in addition to the good academic collaboration that you already have. Chris On Dec 6, 2009, at 9:43 AM, Jed Brown wrote: > On Sat, 5 Dec 2009 14:23:45 -0600, Matthew Knepley > <knepley at gmail.com> wrote: >> We do impose some of our process on users already, such as CHKERRQ >> and >> PetscMalloc. > > CHKERRQ is still optional, it's just recommended in order to get a > decent trace. It can't be used, for example, for top-level calls > within > my VisIt plugin because those functions necessarily return something > else (I have an alternative which checks the return code, generates > debugging messages, possibly tries to deallocate local resources, and > throws the appropriate exception). In C99 projects, I use an > alternate > macro that uses __func__ instead of __FUNCT__. > > There is PetscMallocSet if the user doesn't want to use PetscMalloc, > but > users are rarely required to directly interact with > PetscMalloc/PetscFree anyway since most arrays are copied or managed > by > an object. > > Jed