On Monday, 3 April 2017 02:00:53 CEST you wrote:

> On Sun, Apr 2, 2017 at 2:15 PM, Filippo Leonardi <[email protected]>
> wrote:
> > Hello,
> >
> > I have a project in mind and seek feedback.
> >
> > Disclaimer: I hope I am not abusing this mailing list with this idea.
> > If so, please ignore.
> >
> > As a thought experiment, and to have a bit of fun, I am currently
> > writing, or thinking about writing, a small (modern) C++ wrapper
> > around PETSc.
> >
> > Premise: PETSc is awesome, I love it and use it in many projects.
> > Sometimes I am just not super comfortable writing C. (I know my idea
> > goes against PETSc's design philosophy.)
> >
> > I know there are many wrappers around, and there is not really a need
> > for this (especially since PETSc has its own object-oriented style),
> > but there are a few things I would really like to include in this
> > wrapper that I found nowhere:
> > - I am currently only thinking about the Vector/Matrix/KSP/DM parts
> > of the framework; there are many other cool things that PETSc does,
> > but I do not have the brainpower to consider those as well.
> > - Expression templates (in my opinion this is where C++ shines):
> > these would replace all the boilerplate a user might need with
> > easy-to-read expressions (this could increase the number of axpy-like
> > routines).
> > - Those expression templates should use SSE and AVX whenever
> > available.
> > - Expressions like x += alpha * y should fall back to BLAS axpy
> > (though sometimes this is not even faster than a simple loop).
>
> The idea for the above is not clear. Do you want templates generating
> calls to BLAS? Or scalar code that operates on raw arrays with
> SSE/AVX? There is some advantage here in expanding the range of BLAS
> operations, which has been done to death by Liz Jessup and
> collaborators, but not that much.

Templates should generate scalar code operating on raw arrays using
SIMD. But I can detect when an expression matches axpbycz or gemv and
use the BLAS implementation instead. I do not think there is a point in
trying to "beat" BLAS.
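To make the expression-template idea concrete, here is a minimal,
self-contained sketch of how x += alpha * y could be fused into a single
loop. The Vec and Scaled types are invented for illustration and stand
in for the eventual PETSc-backed wrapper; a real implementation would
inspect the expression tree and dispatch recognized patterns (axpy,
axpbycz, gemv) to BLAS instead.

```cpp
#include <cstddef>
#include <vector>

// Expression-template sketch: alpha * y builds a lightweight expression
// object; operator+= then evaluates it element-wise in one loop, so no
// temporary vector is ever allocated.
struct Vec;

struct Scaled {
  double alpha;
  const Vec& y;
  double operator[](std::size_t i) const;  // defined after Vec
};

struct Vec {
  std::vector<double> data;
  explicit Vec(std::size_t n, double v = 0.0) : data(n, v) {}

  Vec& operator+=(const Scaled& expr) {
    // Single fused loop over the raw array: no intermediate
    // "alpha * y" vector is materialized, and the compiler is free
    // to auto-vectorize this with SSE/AVX.
    for (std::size_t i = 0; i < data.size(); ++i) data[i] += expr[i];
    return *this;
  }
};

double Scaled::operator[](std::size_t i) const { return alpha * y.data[i]; }

Scaled operator*(double alpha, const Vec& y) { return Scaled{alpha, y}; }
```

With this in place, `x += 2.0 * y;` compiles down to one loop; a fuller
version would add more node types and pattern-match the tree to decide
when a BLAS call is worthwhile.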
(An interesting point opens here: I assume an efficient BLAS
implementation, but I am not so sure how the different BLAS libraries do
things internally. I work from the assumption that we have a very well
tuned BLAS implementation at our disposal.)

> > - All calls to PETSc should be less verbose, more C++-like:
> >   * For instance, VecGlobalToLocalBegin could return an empty object
> >     that calls VecGlobalToLocalEnd when it is destroyed.
> >   * Some cool idea to easily write GPU kernels.
>
> If you find a way to make this pay off it would be amazing, since
> currently nothing but BLAS3 has a hope of mattering in this context.
>
> > - The idea would be to have safer routines (at compile time), by
> >   means of RAII etc.
> >
> > I aim for zero/near-zero/negligible overhead with full optimization;
> > for that I include benchmarks and extensive unit tests.
> >
> > So my questions are:
> > - Is anyone interested (in the product / in developing it)?
> > - Does anyone have suggestions (maybe what I have in mind is
> >   nonsense)?
>
> I would suggest making a simple performance model that says what you
> will do will have at least a 2x speed gain. Because anything less is
> not worth your time, and inevitably you will not get the whole
> multiplier. I am really skeptical that is possible with the above
> sketch.

That I will certainly do as a next step. But I also doubt that much of
this will be achievable in any case.

> Second, I would try to convince myself that what you propose would be
> simpler, in terms of lines of code, number of objects, number of
> concepts, etc. Right now, that is not clear to me either.

The number of objects per se may not be smaller. I am thinking more
about reducing lines of code (verbosity) and concepts, and increasing
safety.
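The Begin/End idea quoted above can be sketched with a small guard
object. ScatterGuard and its std::function hook are placeholder names of
my own; a real wrapper would capture the DM/Vec handles and invoke the
actual VecGlobalToLocalEnd in the destructor.

```cpp
#include <functional>
#include <utility>

// RAII guard for a Begin/End pair: the Begin wrapper returns this
// object, and its destructor runs the matching End call. Forgetting the
// End becomes impossible; it happens when the guard leaves scope.
class [[nodiscard]] ScatterGuard {
 public:
  explicit ScatterGuard(std::function<void()> end) : end_(std::move(end)) {}
  ~ScatterGuard() {
    if (end_) end_();  // the End half of the Begin/End pair
  }
  // Non-copyable: exactly one End call per Begin.
  ScatterGuard(const ScatterGuard&) = delete;
  ScatterGuard& operator=(const ScatterGuard&) = delete;

 private:
  std::function<void()> end_;
};
```

Marking the class [[nodiscard]] also makes the compiler warn if the
caller throws the guard away immediately, which would run End right
after Begin.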
I have two examples I have been burnt by in the past:
- casting to void* to pass custom contexts to PETSc routines;
- forgetting to call the corresponding XXXEnd after a call to XXXBegin
  (PETSc notices that, of course, but at runtime, and that might be too
  late).

Example: imagine I need one of PETSc's internal arrays, so I call
VecGetArray. However, I will inevitably forget to return the array to
PETSc. I could instead have my new VecArray return an object that
restores the array when it goes out of scope. I can also flag the
function with [[nodiscard]] to prevent the user from discarding the
returned object in the first place.

> Barring that, maybe you can argue that new capabilities, such as the
> type flexibility described by Michael, are enabled. That would be the
> most convincing, I think.

This would be very interesting indeed, but I see only two options:
- recompile PETSc twice;
- manually implement all complex routines, which might be too much of a
  task.

> Thanks,
>
>    Matt

Thanks for the feedback, Matt.

> > If you have read up to here, thanks.
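For the VecGetArray example above, the pattern could look like the
following self-contained sketch. MockVec, ArrayHandle, and vecArray are
hypothetical names standing in for the real PETSc calls; only the RAII
and [[nodiscard]] mechanics are the point.

```cpp
#include <vector>

// Stand-in for a Vec whose array can be checked out: tracks whether
// the array is currently borrowed, as VecGetArray/VecRestoreArray do.
struct MockVec {
  std::vector<double> data{1.0, 2.0, 3.0};
  bool checked_out = false;
};

// The object a VecArray-style call would return: the raw array is
// accessible while the handle is alive, and the destructor performs the
// restore automatically. [[nodiscard]] makes the compiler warn if the
// caller discards the handle immediately.
class [[nodiscard]] ArrayHandle {
 public:
  explicit ArrayHandle(MockVec& v) : vec_(v) { vec_.checked_out = true; }
  ~ArrayHandle() { vec_.checked_out = false; }  // the "RestoreArray" call
  ArrayHandle(const ArrayHandle&) = delete;
  ArrayHandle& operator=(const ArrayHandle&) = delete;
  double* get() { return vec_.data.data(); }

 private:
  MockVec& vec_;
};

// Requires C++17 guaranteed copy elision, since the handle is
// non-copyable and non-movable.
ArrayHandle vecArray(MockVec& v) { return ArrayHandle(v); }
```

The same shape would cover the void* context problem too: the cast back
from void* happens in exactly one library-owned place instead of in
every user callback.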
