Re: [C++] Private implementations and virtual interfaces

2019-07-28 Thread Uwe L. Korn
Building CI that detects ABI breakage is not hard. There is a closed PR in the 
arrow repo from me that does exactly this using abi-complicance-checker.

I understand that we will not be able to provide ABI stability on all Arrow 
subprojects but having it for a core would be really great. This would allow 
easy upgrading of adaptors that connect to Arrow but don't actually work with 
the resulting structures (this can be database connectors and file format 
implementations).

Uwe

On Sun, Jul 28, 2019, at 10:43 AM, Antoine Pitrou wrote:
> 
> Le 28/07/2019 à 01:49, Wes McKinney a écrit :
> > On Sat, Jul 27, 2019 at 4:38 PM Uwe L. Korn  wrote:
> >>
> >> The PIMPL is a thing I would trade a bit of performance as it brings ABI 
> >> stability. This is something that will help us making Arrow usage in 
> >> thirdparty code much simpler.
> >>
> > 
> > I question whether ABI stability (at the level of the shared library
> > ABI version) will ever be practical in this project. In the case of
> > NumPy, there are very few C data structures that could actually
> > change. In this library we have a great deal more data structures, I
> > would guess 100s of distinct objects taking into account each C++
> > class in the library. It's one thing to be API forward compatible from
> > major version to major version (with warnings to allow for graceful
> > deprecations) but such forward compatibility strategies are not so
> > readily available when talking about the composition of C++ classes
> > (considering that virtual function tables are part of the ABI).
> > 
> > In any case, as a result of this, I'm not comfortable basing technical
> > decisions on the basis of ABI stability considerations.
> 
> We could at some point define a "stable ABI subset" (for example the
> core classes Array, ArrayData, etc.).  But I'm not sure whether that
> would be sufficient for users.
> 
> Also in any case it would need someone to maintain the regular toolkit
> and continuous integration hooks to prevent ABI breakage.  Otherwise
> it's pointless promising something that we can't efficiently monitor.
> 
> Regards
> 
> Antoine.
>


Re: [C++] Private implementations and virtual interfaces

2019-07-28 Thread Antoine Pitrou


Le 28/07/2019 à 01:49, Wes McKinney a écrit :
> On Sat, Jul 27, 2019 at 4:38 PM Uwe L. Korn  wrote:
>>
>> The PIMPL is a thing I would trade a bit of performance as it brings ABI 
>> stability. This is something that will help us making Arrow usage in 
>> thirdparty code much simpler.
>>
> 
> I question whether ABI stability (at the level of the shared library
> ABI version) will ever be practical in this project. In the case of
> NumPy, there are very few C data structures that could actually
> change. In this library we have a great deal more data structures, I
> would guess 100s of distinct objects taking into account each C++
> class in the library. It's one thing to be API forward compatible from
> major version to major version (with warnings to allow for graceful
> deprecations) but such forward compatibility strategies are not so
> readily available when talking about the composition of C++ classes
> (considering that virtual function tables are part of the ABI).
> 
> In any case, as a result of this, I'm not comfortable basing technical
> decisions on the basis of ABI stability considerations.

We could at some point define a "stable ABI subset" (for example the
core classes Array, ArrayData, etc.).  But I'm not sure whether that
would be sufficient for users.

Also in any case it would need someone to maintain the regular toolkit
and continuous integration hooks to prevent ABI breakage.  Otherwise
it's pointless promising something that we can't efficiently monitor.

Regards

Antoine.


Re: [C++] Private implementations and virtual interfaces

2019-07-28 Thread Antoine Pitrou


Hi,

Le 27/07/2019 à 21:04, Wes McKinney a écrit :
> * My understanding is that the PIMPL pattern will perform better for
> non-virtual functions that are called a lot. It'd be helpful to know
> the magnitude of the performance difference

Modern CPUs have indirect branch predictors.  If an indirect branch
(e.g. virtual function call) always resolves to the same target address,
it should have very good performance.

My main issue with the Pimpl pattern is the amount of boilerplate it
requires.  One workaround is to use a "half-pimpl", i.e. keep public API
implementations in the main class, but put private helpers in the pimpl.
See e.g. MockFileSystem vs. MockFileSystem::Impl here:
https://github.com/apache/arrow/blob/master/cpp/src/arrow/filesystem/mockfs.cc#L217

That's not always convenient though.

Regards

Antoine.


> * Complex inheritance patterns may require use of virtual inheritance,
> which can create a burden for downstream users (e.g. they may have to
> use dynamic_cast to convert between types in the class hierarchy)
> 
> I'm far from a C++ expert so I'm always learning as time goes on, but
> hopefully other C++ developers have thoughts about this and we can
> keep an eye out in code reviews for cases where a simpler
> implementation may suffice
> 
> - Wes
> 


Re: [C++] Private implementations and virtual interfaces

2019-07-27 Thread Wes McKinney
On Sat, Jul 27, 2019 at 4:38 PM Uwe L. Korn  wrote:
>
> The PIMPL is a thing I would trade a bit of performance as it brings ABI 
> stability. This is something that will help us making Arrow usage in 
> thirdparty code much simpler.
>

I question whether ABI stability (at the level of the shared library
ABI version) will ever be practical in this project. In the case of
NumPy, there are very few C data structures that could actually
change. In this library we have a great deal more data structures, I
would guess 100s of distinct objects taking into account each C++
class in the library. It's one thing to be API forward compatible from
major version to major version (with warnings to allow for graceful
deprecations) but such forward compatibility strategies are not so
readily available when talking about the composition of C++ classes
(considering that virtual function tables are part of the ABI).

In any case, as a result of this, I'm not comfortable basing technical
decisions on the basis of ABI stability considerations.

> Simple updates when an API was only extended but the ABI is intact is a great 
> ease on the Arrow consumer side. I know that this is a bit more hassle on the 
> developer side but it's something I really love with NumPy. It's so much 
> simpler to do a version upgrade than with an ABI breaking library such as 
> Arrow.
>
> Uwe
>
> > Am 27.07.2019 um 22:57 schrieb Jed Brown :
> >
> > Wes McKinney  writes:
> >
> >> The abstract/all-virtual base has some benefits:
> >>
> >> * No need to implement "forwarding" methods to the private implementation
> >> * Do not have to declare "friend" classes in the header for some cases
> >> where other classes need to access the methods of a private
> >> implementation
> >> * Implementation symbols do not need to be exported in DLLs with an
> >> *_EXPORT macro
> >>
> >> There are some drawbacks, or cases where this method cannot be applied, 
> >> though:
> >>
> >> * An implementation of some other abstract interface which needs to
> >> appear in a public header may not be able to use this approach.
> >> * My understanding is that the PIMPL pattern will perform better for
> >> non-virtual functions that are called a lot. It'd be helpful to know
> >> the magnitude of the performance difference
> >> * Complex inheritance patterns may require use of virtual inheritance,
> >> which can create a burden for downstream users (e.g. they may have to
> >> use dynamic_cast to convert between types in the class hierarchy)
> >
> > I would add these two points, which may or may not be a significant
> > concern to you:
> >
> > * When you add new methods to the abstract virtual model, you change the
> >  ABI [1] and therefore need to recompile client code.  This has many
> >  consequences for distribution.
> >
> > * PIMPL gives a well-defined place for input validation and setting
> >  debugger breakpoints even when you don't know which implementation
> >  will be used.
> >
> >
> > [1] The ABI changes because code to index into the vtable is inlined at
> > the call site.  Adding to your example
> >
> >  void foo(VirtualType ) {
> >obj.Method1();
> >// any other line to suppress tail call
> >  }
> >
> > produces assembly like
> >
> >  movrax,QWORD PTR [rdi] ; load vtable for obj
> >  call   QWORD PTR [rax+0x10]; 0x10 is offset into vtable
> >
> > A different method will use a different offset.  If you add a method,
> > offsets of existing methods may change.  With PIMPL, the equivalent
> > indexing code resides in your library instead of client code, and yields
> > a static (or PLT) call resolved by the linker:
> >
> >  call_ZN9PIMPLType7Method1Ev@PLT
>


Re: [C++] Private implementations and virtual interfaces

2019-07-27 Thread Uwe L. Korn
The PIMPL is a thing I would trade a bit of performance as it brings ABI 
stability. This is something that will help us making Arrow usage in thirdparty 
code much simpler.

Simple updates when an API was only extended but the ABI is intact is a great 
ease on the Arrow consumer side. I know that this is a bit more hassle on the 
developer side but it's something I really love with NumPy. It's so much 
simpler to do a version upgrade than with an ABI breaking library such as Arrow.

Uwe

> Am 27.07.2019 um 22:57 schrieb Jed Brown :
> 
> Wes McKinney  writes:
> 
>> The abstract/all-virtual base has some benefits:
>> 
>> * No need to implement "forwarding" methods to the private implementation
>> * Do not have to declare "friend" classes in the header for some cases
>> where other classes need to access the methods of a private
>> implementation
>> * Implementation symbols do not need to be exported in DLLs with an
>> *_EXPORT macro
>> 
>> There are some drawbacks, or cases where this method cannot be applied, 
>> though:
>> 
>> * An implementation of some other abstract interface which needs to
>> appear in a public header may not be able to use this approach.
>> * My understanding is that the PIMPL pattern will perform better for
>> non-virtual functions that are called a lot. It'd be helpful to know
>> the magnitude of the performance difference
>> * Complex inheritance patterns may require use of virtual inheritance,
>> which can create a burden for downstream users (e.g. they may have to
>> use dynamic_cast to convert between types in the class hierarchy)
> 
> I would add these two points, which may or may not be a significant
> concern to you:
> 
> * When you add new methods to the abstract virtual model, you change the
>  ABI [1] and therefore need to recompile client code.  This has many
>  consequences for distribution.
> 
> * PIMPL gives a well-defined place for input validation and setting
>  debugger breakpoints even when you don't know which implementation
>  will be used.
> 
> 
> [1] The ABI changes because code to index into the vtable is inlined at
> the call site.  Adding to your example
> 
>  void foo(VirtualType ) {
>obj.Method1();
>// any other line to suppress tail call
>  }
> 
> produces assembly like
> 
>  movrax,QWORD PTR [rdi] ; load vtable for obj
>  call   QWORD PTR [rax+0x10]; 0x10 is offset into vtable
> 
> A different method will use a different offset.  If you add a method,
> offsets of existing methods may change.  With PIMPL, the equivalent
> indexing code resides in your library instead of client code, and yields
> a static (or PLT) call resolved by the linker:
> 
>  call_ZN9PIMPLType7Method1Ev@PLT



Re: [C++] Private implementations and virtual interfaces

2019-07-27 Thread Jed Brown
Wes McKinney  writes:

> The abstract/all-virtual base has some benefits:
>
> * No need to implement "forwarding" methods to the private implementation
> * Do not have to declare "friend" classes in the header for some cases
> where other classes need to access the methods of a private
> implementation
> * Implementation symbols do not need to be exported in DLLs with an
> *_EXPORT macro
>
> There are some drawbacks, or cases where this method cannot be applied, 
> though:
>
> * An implementation of some other abstract interface which needs to
> appear in a public header may not be able to use this approach.
> * My understanding is that the PIMPL pattern will perform better for
> non-virtual functions that are called a lot. It'd be helpful to know
> the magnitude of the performance difference
> * Complex inheritance patterns may require use of virtual inheritance,
> which can create a burden for downstream users (e.g. they may have to
> use dynamic_cast to convert between types in the class hierarchy)

I would add these two points, which may or may not be a significant
concern to you:

* When you add new methods to the abstract virtual model, you change the
  ABI [1] and therefore need to recompile client code.  This has many
  consequences for distribution.

* PIMPL gives a well-defined place for input validation and setting
  debugger breakpoints even when you don't know which implementation
  will be used.


[1] The ABI changes because code to index into the vtable is inlined at
the call site.  Adding to your example

  void foo(VirtualType ) {
obj.Method1();
// any other line to suppress tail call
  }

produces assembly like

  movrax,QWORD PTR [rdi] ; load vtable for obj
  call   QWORD PTR [rax+0x10]; 0x10 is offset into vtable

A different method will use a different offset.  If you add a method,
offsets of existing methods may change.  With PIMPL, the equivalent
indexing code resides in your library instead of client code, and yields
a static (or PLT) call resolved by the linker:

  call_ZN9PIMPLType7Method1Ev@PLT