Re: [C++] Private implementations and virtual interfaces
Building CI that detects ABI breakage is not hard. There is a closed PR in the arrow repo from me that does exactly this using abi-complicance-checker. I understand that we will not be able to provide ABI stability on all Arrow subprojects but having it for a core would be really great. This would allow easy upgrading of adaptors that connect to Arrow but don't actually work with the resulting structures (this can be database connectors and file format implementations). Uwe On Sun, Jul 28, 2019, at 10:43 AM, Antoine Pitrou wrote: > > Le 28/07/2019 à 01:49, Wes McKinney a écrit : > > On Sat, Jul 27, 2019 at 4:38 PM Uwe L. Korn wrote: > >> > >> The PIMPL is a thing I would trade a bit of performance as it brings ABI > >> stability. This is something that will help us making Arrow usage in > >> thirdparty code much simpler. > >> > > > > I question whether ABI stability (at the level of the shared library > > ABI version) will ever be practical in this project. In the case of > > NumPy, there are very few C data structures that could actually > > change. In this library we have a great deal more data structures, I > > would guess 100s of distinct objects taking into account each C++ > > class in the library. It's one thing to be API forward compatible from > > major version to major version (with warnings to allow for graceful > > deprecations) but such forward compatibility strategies are not so > > readily available when talking about the composition of C++ classes > > (considering that virtual function tables are part of the ABI). > > > > In any case, as a result of this, I'm not comfortable basing technical > > decisions on the basis of ABI stability considerations. > > We could at some point define a "stable ABI subset" (for example the > core classes Array, ArrayData, etc.). But I'm not sure whether that > would be sufficient for users. > > Also in any case it would need someone to maintain the regular toolkit > and continuous integration hooks to prevent ABI breakage. Otherwise > it's pointless promising something that we can't efficiently monitor. > > Regards > > Antoine. >
Re: [C++] Private implementations and virtual interfaces
Le 28/07/2019 à 01:49, Wes McKinney a écrit : > On Sat, Jul 27, 2019 at 4:38 PM Uwe L. Korn wrote: >> >> The PIMPL is a thing I would trade a bit of performance as it brings ABI >> stability. This is something that will help us making Arrow usage in >> thirdparty code much simpler. >> > > I question whether ABI stability (at the level of the shared library > ABI version) will ever be practical in this project. In the case of > NumPy, there are very few C data structures that could actually > change. In this library we have a great deal more data structures, I > would guess 100s of distinct objects taking into account each C++ > class in the library. It's one thing to be API forward compatible from > major version to major version (with warnings to allow for graceful > deprecations) but such forward compatibility strategies are not so > readily available when talking about the composition of C++ classes > (considering that virtual function tables are part of the ABI). > > In any case, as a result of this, I'm not comfortable basing technical > decisions on the basis of ABI stability considerations. We could at some point define a "stable ABI subset" (for example the core classes Array, ArrayData, etc.). But I'm not sure whether that would be sufficient for users. Also in any case it would need someone to maintain the regular toolkit and continuous integration hooks to prevent ABI breakage. Otherwise it's pointless promising something that we can't efficiently monitor. Regards Antoine.
Re: [C++] Private implementations and virtual interfaces
Hi, Le 27/07/2019 à 21:04, Wes McKinney a écrit : > * My understanding is that the PIMPL pattern will perform better for > non-virtual functions that are called a lot. It'd be helpful to know > the magnitude of the performance difference Modern CPUs have indirect branch predictors. If an indirect branch (e.g. virtual function call) always resolves to the same target address, it should have very good performance. My main issue with the Pimpl pattern is the amount of boilerplate it requires. One workaround is to use a "half-pimpl", i.e. keep public API implementations in the main class, but put private helpers in the pimpl. See e.g. MockFileSystem vs. MockFileSystem::Impl here: https://github.com/apache/arrow/blob/master/cpp/src/arrow/filesystem/mockfs.cc#L217 That's not always convenient though. Regards Antoine. > * Complex inheritance patterns may require use of virtual inheritance, > which can create a burden for downstream users (e.g. they may have to > use dynamic_cast to convert between types in the class hierarchy) > > I'm far from a C++ expert so I'm always learning as time goes on, but > hopefully other C++ developers have thoughts about this and we can > keep an eye out in code reviews for cases where a simpler > implementation may suffice > > - Wes >
Re: [C++] Private implementations and virtual interfaces
On Sat, Jul 27, 2019 at 4:38 PM Uwe L. Korn wrote: > > The PIMPL is a thing I would trade a bit of performance as it brings ABI > stability. This is something that will help us making Arrow usage in > thirdparty code much simpler. > I question whether ABI stability (at the level of the shared library ABI version) will ever be practical in this project. In the case of NumPy, there are very few C data structures that could actually change. In this library we have a great deal more data structures, I would guess 100s of distinct objects taking into account each C++ class in the library. It's one thing to be API forward compatible from major version to major version (with warnings to allow for graceful deprecations) but such forward compatibility strategies are not so readily available when talking about the composition of C++ classes (considering that virtual function tables are part of the ABI). In any case, as a result of this, I'm not comfortable basing technical decisions on the basis of ABI stability considerations. > Simple updates when an API was only extended but the ABI is intact is a great > ease on the Arrow consumer side. I know that this is a bit more hassle on the > developer side but it's something I really love with NumPy. It's so much > simpler to do a version upgrade than with an ABI breaking library such as > Arrow. > > Uwe > > > Am 27.07.2019 um 22:57 schrieb Jed Brown : > > > > Wes McKinney writes: > > > >> The abstract/all-virtual base has some benefits: > >> > >> * No need to implement "forwarding" methods to the private implementation > >> * Do not have to declare "friend" classes in the header for some cases > >> where other classes need to access the methods of a private > >> implementation > >> * Implementation symbols do not need to be exported in DLLs with an > >> *_EXPORT macro > >> > >> There are some drawbacks, or cases where this method cannot be applied, > >> though: > >> > >> * An implementation of some other abstract interface which needs to > >> appear in a public header may not be able to use this approach. > >> * My understanding is that the PIMPL pattern will perform better for > >> non-virtual functions that are called a lot. It'd be helpful to know > >> the magnitude of the performance difference > >> * Complex inheritance patterns may require use of virtual inheritance, > >> which can create a burden for downstream users (e.g. they may have to > >> use dynamic_cast to convert between types in the class hierarchy) > > > > I would add these two points, which may or may not be a significant > > concern to you: > > > > * When you add new methods to the abstract virtual model, you change the > > ABI [1] and therefore need to recompile client code. This has many > > consequences for distribution. > > > > * PIMPL gives a well-defined place for input validation and setting > > debugger breakpoints even when you don't know which implementation > > will be used. > > > > > > [1] The ABI changes because code to index into the vtable is inlined at > > the call site. Adding to your example > > > > void foo(VirtualType ) { > >obj.Method1(); > >// any other line to suppress tail call > > } > > > > produces assembly like > > > > movrax,QWORD PTR [rdi] ; load vtable for obj > > call QWORD PTR [rax+0x10]; 0x10 is offset into vtable > > > > A different method will use a different offset. If you add a method, > > offsets of existing methods may change. With PIMPL, the equivalent > > indexing code resides in your library instead of client code, and yields > > a static (or PLT) call resolved by the linker: > > > > call_ZN9PIMPLType7Method1Ev@PLT >
Re: [C++] Private implementations and virtual interfaces
The PIMPL is a thing I would trade a bit of performance as it brings ABI stability. This is something that will help us making Arrow usage in thirdparty code much simpler. Simple updates when an API was only extended but the ABI is intact is a great ease on the Arrow consumer side. I know that this is a bit more hassle on the developer side but it's something I really love with NumPy. It's so much simpler to do a version upgrade than with an ABI breaking library such as Arrow. Uwe > Am 27.07.2019 um 22:57 schrieb Jed Brown : > > Wes McKinney writes: > >> The abstract/all-virtual base has some benefits: >> >> * No need to implement "forwarding" methods to the private implementation >> * Do not have to declare "friend" classes in the header for some cases >> where other classes need to access the methods of a private >> implementation >> * Implementation symbols do not need to be exported in DLLs with an >> *_EXPORT macro >> >> There are some drawbacks, or cases where this method cannot be applied, >> though: >> >> * An implementation of some other abstract interface which needs to >> appear in a public header may not be able to use this approach. >> * My understanding is that the PIMPL pattern will perform better for >> non-virtual functions that are called a lot. It'd be helpful to know >> the magnitude of the performance difference >> * Complex inheritance patterns may require use of virtual inheritance, >> which can create a burden for downstream users (e.g. they may have to >> use dynamic_cast to convert between types in the class hierarchy) > > I would add these two points, which may or may not be a significant > concern to you: > > * When you add new methods to the abstract virtual model, you change the > ABI [1] and therefore need to recompile client code. This has many > consequences for distribution. > > * PIMPL gives a well-defined place for input validation and setting > debugger breakpoints even when you don't know which implementation > will be used. > > > [1] The ABI changes because code to index into the vtable is inlined at > the call site. Adding to your example > > void foo(VirtualType ) { >obj.Method1(); >// any other line to suppress tail call > } > > produces assembly like > > movrax,QWORD PTR [rdi] ; load vtable for obj > call QWORD PTR [rax+0x10]; 0x10 is offset into vtable > > A different method will use a different offset. If you add a method, > offsets of existing methods may change. With PIMPL, the equivalent > indexing code resides in your library instead of client code, and yields > a static (or PLT) call resolved by the linker: > > call_ZN9PIMPLType7Method1Ev@PLT
Re: [C++] Private implementations and virtual interfaces
Wes McKinney writes: > The abstract/all-virtual base has some benefits: > > * No need to implement "forwarding" methods to the private implementation > * Do not have to declare "friend" classes in the header for some cases > where other classes need to access the methods of a private > implementation > * Implementation symbols do not need to be exported in DLLs with an > *_EXPORT macro > > There are some drawbacks, or cases where this method cannot be applied, > though: > > * An implementation of some other abstract interface which needs to > appear in a public header may not be able to use this approach. > * My understanding is that the PIMPL pattern will perform better for > non-virtual functions that are called a lot. It'd be helpful to know > the magnitude of the performance difference > * Complex inheritance patterns may require use of virtual inheritance, > which can create a burden for downstream users (e.g. they may have to > use dynamic_cast to convert between types in the class hierarchy) I would add these two points, which may or may not be a significant concern to you: * When you add new methods to the abstract virtual model, you change the ABI [1] and therefore need to recompile client code. This has many consequences for distribution. * PIMPL gives a well-defined place for input validation and setting debugger breakpoints even when you don't know which implementation will be used. [1] The ABI changes because code to index into the vtable is inlined at the call site. Adding to your example void foo(VirtualType ) { obj.Method1(); // any other line to suppress tail call } produces assembly like movrax,QWORD PTR [rdi] ; load vtable for obj call QWORD PTR [rax+0x10]; 0x10 is offset into vtable A different method will use a different offset. If you add a method, offsets of existing methods may change. With PIMPL, the equivalent indexing code resides in your library instead of client code, and yields a static (or PLT) call resolved by the linker: call_ZN9PIMPLType7Method1Ev@PLT