Hi Filippo,
Recompiling PETSc twice is easy.
The difficulty is that both libraries would contain the same symbols for the double and double-complex functions.
If they were part of a C++ namespace, it would be easier.
Michael.

On 04/03/2017 12:45 PM, Filippo Leonardi wrote:

On Monday, 3 April 2017 02:00:53 CEST you wrote:

> On Sun, Apr 2, 2017 at 2:15 PM, Filippo Leonardi <[email protected]> wrote:

> > Hello,

> >

> > I have a project in mind and seek feedback.

> >

> > Disclaimer: I hope I am not abusing this mailing list with this idea.

> > If so, please ignore.

> >

> > As a thought experiment, and to have a bit of fun, I am currently

> > writing/thinking on writing, a small (modern) C++ wrapper around PETSc.

> >

> > Premise: PETSc is awesome, I love it and use in many projects. Sometimes I

> > am just not super comfortable writing C. (I know my idea goes against

> > PETSc's design philosophy).

> >

> > I know there are many around, and there is not really a need for this

> > (especially since PETSc has its own object-oriented style), but there are

> > a few things I would really like to include in this wrapper that I found

> > nowhere:

> > - I am currently only thinking about the Vector/Matrix/KSP/DM part of the

> > framework; there are many other cool things that PETSc does, but I do not

> > have the brainpower to consider those as well.

> > - expression templates (in my opinion this is where C++ shines): this

> > would replace all code bloat that a user might need with cool/easy to read

> > expressions (this could increase the number of axpy-like routines);

> > - those expression templates should use SSE and AVX whenever available;

> > - expressions like x += alpha * y should fall back to BLAS axpy (though

> > sometimes this is not even faster than a simple loop);

>

> The idea for the above is not clear. Do you want templates generating calls

> to BLAS? Or scalar code that operates on raw arrays with SSE/AVX?

> There is some advantage here of expanding the range of BLAS operations,

> which has been done to death by Liz Jessup and collaborators, but not

> that much.


Templates should generate scalar code operating on raw arrays using SIMD. But
I can detect when the user wants an axpbycz or gemv, and use the BLAS
implementation instead. I do not think there is a point in trying to "beat"
BLAS. (An interesting point opens here: I assume an efficient BLAS
implementation, but I am not so sure about how the different BLAS libraries do
things internally. I work from the assumption that we have a very well tuned
BLAS implementation at our disposal.)


>

> > - all calls to PETSc should be less verbose, more C++-like:

> > * for instance a VecGlobalToLocalBegin could return an empty object that

> >

> > calls VecGlobalToLocalEnd when it is destroyed.

> >

> > * some cool idea to easily write GPU kernels.

>

> If you find a way to make this pay off it would be amazing, since currently

> nothing but BLAS3 has a hope of mattering in this context.

>

> > - the idea would be to have safer routines (at compile time), by means of

> > RAII etc.

> >

> > I aim for zero/near-zero/negligible overhead with full optimization; for

> > that I include benchmarks and extensive unit tests.

> >

> > So my question is:

> > - anyone that would be interested (in the product/in developing)?

> > - anyone that has suggestions (maybe that what I have in mind is

> > nonsense)?

>

> I would suggest making a simple performance model that says what you will

> do will have at least

> a 2x speed gain. Because anything less is not worth your time, and

> inevitably you will not get the

> whole multiplier. I am really skeptical that is possible with the above

> sketch.


That I will do as a next step, for sure. But I also doubt that this much will be
achievable in any case.


>

> Second, I would try to convince myself that what you propose would be

> simpler, in terms of lines of code,

> number of objects, number of concepts, etc. Right now, that is not clear to

> me either.


The number of objects per se may not be smaller. I am thinking more about reducing
lines of code (verbosity) and concepts, and increasing safety.


I have two examples I have been burnt by in the past:
- casting to void* to pass custom contexts to PETSc routines;
- forgetting to call the corresponding XXXEnd after a call to XXXBegin
(PETSc notices that, of course, but at runtime, and that might be too late).


Example: imagine I need one of PETSc's internal arrays. In this case I
call VecGetArray. However, I will inevitably forget to return the array to
PETSc. Instead, my new VecArray could return an object that restores the array
when it goes out of scope. I can also flag the function with [[nodiscard]] to
prevent the user from discarding the returned object in the first place.


>

> Barring that, maybe you can argue that new capabilities, such as the type

> flexibility described by Michael, are enabled. That

> would be the most convincing, I think.


This would be very interesting indeed, but I see only two options:
- recompile PETSc twice;
- manually implement all the complex routines, which might be too much of a task.


>

> Thanks,

>

> Matt


Thanks for the feedback Matt.


>

> If you have read up to here, thanks.






--
Michael Povolotskyi, PhD
Research Assistant Professor
Network for Computational Nanotechnology
Hall for Discover and Learning Research, Room 441
207 South Martin Jischke Drive
West Lafayette, IN 47907
Phone (765) 4949396
