On Monday, 3 April 2017 02:00:53 CEST, Matthew Knepley wrote:
> On Sun, Apr 2, 2017 at 2:15 PM, Filippo Leonardi <[email protected]>
> wrote:
> > Hello,
> >
> > I have a project in mind and seek feedback.
> >
> > Disclaimer: I hope I am not abusing this mailing list with this idea.
> > If so, please ignore.
> >
> > As a thought experiment, and to have a bit of fun, I am currently
> > writing (or thinking of writing) a small, modern C++ wrapper around
> > PETSc.
> >
> > Premise: PETSc is awesome, I love it and use it in many projects.
> > Sometimes I am just not super comfortable writing C. (I know my idea
> > goes against PETSc's design philosophy.)
> >
> > I know there are many wrappers around, and there is not really a need
> > for another (especially since PETSc has its own object-oriented style),
> > but there are a few things I would really like to include in this
> > wrapper that I found nowhere else:
> > - I am currently only thinking about the Vector/Matrix/KSP/DM part of
> > the framework; there are many other cool things that PETSc does, but I
> > do not have the brainpower to consider those as well;
> > - expression templates (in my opinion this is where C++ shines): these
> > would replace the boilerplate a user might otherwise need with easy to
> > read expressions (this could effectively widen the set of axpy-like
> > routines);
> > - those expression templates should use SSE and AVX whenever available;
> > - expressions like x += alpha * y should fall back to BLAS axpy (though
> > sometimes this is not even faster than a simple loop);
>
> The idea for the above is not clear. Do you want templates generating
> calls to BLAS? Or scalar code that operates on raw arrays with SSE/AVX?
> There is some advantage here in expanding the range of BLAS operations,
> which has been done to death by Liz Jessup and collaborators, but not
> that much.
Templates should generate scalar code operating on raw arrays using SIMD.
But I could detect patterns such as axpbycz or gemv and call the BLAS
implementation instead. I do not think there is a point in trying to
"beat" BLAS. (An interesting point opens here: I assume an efficient BLAS
implementation, but I am not so sure how the different BLAS libraries do
things internally. I work from the assumption that we have a very well
tuned BLAS implementation at our disposal.)
>
> > - all calls to PETSc should be less verbose, more C++-like:
> >   * for instance a VecGlobalToLocalBegin could return an empty object
> >     that calls VecGlobalToLocalEnd when it is destroyed;
> >   * some cool idea to easily write GPU kernels.
>
> If you find a way to make this pay off it would be amazing, since
> currently nothing but BLAS3 has a hope of mattering in this context.
>
> > - the idea would be to have safer routines (at compile time), by means
> > of RAII etc.
> >
> > I aim for zero/near-zero/negligible overhead with full optimization;
> > for that I include benchmarks and extensive unit tests.
> >
> > So my questions are:
> > - anyone who would be interested (in the product / in developing it)?
> > - anyone who has suggestions (maybe what I have in mind is nonsense)?
>
> I would suggest making a simple performance model that says what you
> will do will have at least a 2x speed gain. Because anything less is not
> worth your time, and inevitably you will not get the whole multiplier. I
> am really skeptical that is possible with the above sketch.
That I will do as a next step for sure. But I also doubt this much of a
speedup will be achievable in any case.
>
> Second, I would try to convince myself that what you propose would be
> simpler, in terms of lines of code, number of objects, number of
> concepts, etc. Right now, that is not clear to me either.
The number of objects per se may not be smaller. I am more thinking about
reducing lines of code (verbosity) and concepts, and increasing safety.
I have two examples I've been burnt by in the past:
- casting to void* to pass custom contexts to PETSc routines;
- forgetting to call the corresponding XXXEnd after a call to XXXBegin
(PETSc notices that, of course, but only at runtime, and that might be
too late).
Example: suppose I need one of PETSc's internal arrays. In this case I
call VecGetArray. However, I will inevitably forget to return the array
to PETSc. Instead, my wrapper's VecArray could return an object that
restores the array when it goes out of scope. I can also flag the
function with [[nodiscard]] so the user cannot silently discard the
returned object in the first place.
>
> Barring that, maybe you can argue that new capabilities, such as the
> type flexibility described by Michael, are enabled. That would be the
> most convincing, I think.
This would be very interesting indeed, but I see only two options:
- recompile PETSc twice;
- manually implement all complex routines, which might be too much of a
task.
>
> Thanks,
>
> Matt
Thanks for the feedback, Matt.
>
> > If you have read up to here, thanks.