Miguel, Thank you very much for the RFC. Please allow some time for the community to digest this information.
Rasmus

On Wed, Jun 17, 2020 at 2:48 AM Miguel Tairum-Cruz <[email protected]> wrote:

> Hi all,
>
> I would like to present to the Eigen community a Request for Comments (RFC) for a new proof-of-concept vector backend based on the Arm Scalable Vector Extension (SVE) architecture.
>
> With Eigen being widely used across multiple projects such as TensorFlow, we believe that adding support for this new vector length (VL) agnostic architecture will benefit performance on upcoming Arm micro-architectures and systems.
>
> This proof-of-concept SVE backend keeps in line with the existing vector backends, using the Arm C Language Extensions (ACLE) for SVE to optimize Eigen’s functions.
> Using the NEON backend as a starting point, we have ported most of the NEON functions to SVE. Please be aware that this work is built upon a version of Eigen from December 2019 / January 2020. Upstream commits made to the NEON backend since then are not yet reflected in this version.
>
> The introduced changes are provided in the form of patch files, specifically for two SVE vector lengths: 128-bit and 512-bit. You can find more information on how to apply them in the provided README file.
>
> One caveat of this initial version is the requirement for fixed SVE vector lengths. The Eigen codebase and its vector optimizations are not fully compatible with the vector-length agnostic data types that SVE introduces, which is a barrier to full support upstream. Optimizing the SVE backend for specific VLs (in this case 128-bit and 512-bit) is a necessary workaround for this initial proof-of-concept.
>
> An additional goal of this work is to integrate the Eigen SVE backend with TensorFlow. So far, due to the caveats stated above, we have not been able to integrate TensorFlow with Eigen SVE.
> However, the recent release of GCC 10.1 brings a new feature to enable fixed vector sizes at compile time, which we believe will allow building TensorFlow with the proof-of-concept fixed-VL SVE implementation of Eigen.
>
> Below is the formal RFC document, where we detail the design choices and discuss drawbacks and potential solutions to enable a complete implementation of an SVE backend for Eigen.
>
> Regards,
> Miguel
>
> --------
>
> *Eigen Arm SVE backend RFC*
>
> - Authors: Miguel Tairum ([email protected])
> - Updated: 2020-05-15
>
> *Summary*
>
> The purpose of this RFC is to share an experimental proof-of-concept Arm Scalable Vector Extension (SVE) backend for Eigen and to engage with the Eigen development community for feedback and ideas on how to properly implement scalable vectors in the Eigen library codebase.
>
> More information on how to apply the RFC patch can be found in the README file.
>
> *Motivation*
>
> SVE <https://developer.arm.com/docs/101726/latest/explore-the-scalable-vector-extension-sve/what-is-the-scalable-vector-extension> is the next-generation SIMD architectural extension to the Armv8 architecture, introducing scalable vector length, per-lane predication, gather-loads and scatter-stores, amongst other features.
>
> Eigen is a mature linear algebra library, supporting many vector architectures, including Arm NEON. Because Eigen is used in multiple projects, including TensorFlow, we believe that supporting SVE could not only improve compatibility with future micro-architectures, but also enable better performance.
>
> *Guide-level explanation*
>
> In this initial assessment, we present a proof-of-concept SVE port of the *PacketMath* backend in Eigen, using the Arm C Language Extensions (ACLE). Like the existing vector backends, SVE intrinsics are implemented in Eigen's *PacketMath*, *MathFunctions* and *TypeCasting* source files.
> In this initial release, complex math is not available (due to time constraints).
>
> This proof-of-concept release provides a "fixed-size" SVE backend, with vector lengths of 128 and 512 bits. This means that the implemented functions are validated only when executed on those specific SVE lengths, as optimizations were only made for them. To facilitate this, we provide a patch file for each VL. All currently implemented NEON functions except for the complex math (Complex.h) are included in the SVE backend. This is up to date with commit 312c8e77 <https://gitlab.com/libeigen/eigen/-/commit/312c8e77ff653d718cf4b318c9633d4b45bb725f> from December 2019, plus the changes introduced to the NEON backend up until commit da5a7afe <https://gitlab.com/libeigen/eigen/-/commit/da5a7afed056596b089a4241b62a7e17f2c43119> from 10 January 2020 (these are included in the patch files). This commit was chosen to be compatible with TensorFlow 1.x, which uses a similar version of Eigen, plus any NEON updates at the time of this work.
> This initial release also contains an updated *PacketMath* test, with SVE validation.
>
> *Reference-level explanation*
>
> The changes presented in this RFC are based on commit 312c8e77 <https://gitlab.com/libeigen/eigen/-/commit/312c8e77ff653d718cf4b318c9633d4b45bb725f> in the master branch.
>
> The Eigen SVE backend can be found at *Eigen/src/Core/arch/SVE*.
> SVE intrinsics are implemented for float, int and double sized elements. Similar to the NEON backend at this time, half packets are not implemented. Therefore, the available packet sizes for 512-bit VL are: 16 elements for int/float, 8 elements for double; and for 128-bit VL: 4 elements for int/float, 2 elements for double.
>
> For most functions, SVE intrinsics are analogous to the ones used in the NEON backend. More complex functions have comments that explain the logic behind their implementation.
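
The packet sizes quoted above follow from dividing the vector length by the element width. Below is a minimal C++ sketch of that arithmetic. `lanes_per_packet` and `check_rfc_packet_sizes` are hypothetical helpers written for illustration and are not part of Eigen or the RFC patches; `int` is assumed to be 32 bits wide, so `std::int32_t` stands in for it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not an Eigen API): number of Scalar lanes that fit
// in one vector register of `vector_bits` bits.
template <typename Scalar>
constexpr std::size_t lanes_per_packet(std::size_t vector_bits) {
  return vector_bits / (8 * sizeof(Scalar));
}

// Checks the packet sizes listed in the RFC for the two supported VLs.
inline void check_rfc_packet_sizes() {
  // 512-bit VL: 16 elements for int/float, 8 for double.
  assert(lanes_per_packet<float>(512) == 16);
  assert(lanes_per_packet<std::int32_t>(512) == 16);
  assert(lanes_per_packet<double>(512) == 8);
  // 128-bit VL: 4 elements for int/float, 2 for double.
  assert(lanes_per_packet<float>(128) == 4);
  assert(lanes_per_packet<std::int32_t>(128) == 4);
  assert(lanes_per_packet<double>(128) == 2);
}
```

Because the lane count depends on the vector length, a backend that hard-codes these numbers can only be expected to validate on the VL it was built for, which matches the one-patch-per-VL approach described in the RFC.
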
>
> Regarding the *ptranspose* function, the *PacketBlock* structure was duplicated and modified into *PacketBlockSVE*, a new structure of SVE vector pointers. This structure is in *Eigen/src/Core/GenericPacketMath.h*. This is required to support the vector length agnostic data types introduced in SVE. Since these data types do not have a fixed size at compile time, they cannot be addressed inside vectors, and thus pointers are needed.
> The included SVE PacketMath tests (available in /test/packetmath.cc and /test/packetmath_sve_resnet.c) make use of this new structure to validate the transpose function.
>
> Outside of *PacketMath* and the previously mentioned locations, other small SVE modifications were made wherever a NEON implementation was present in the code. Additionally, the cmake files were modified to accommodate the new backend.
>
> *Drawbacks and future possibilities*
>
> The initial release demonstrates a proof of concept for an SVE backend with 128-bit and 512-bit vector lengths. Although it can be compiled for SVE architectures with different vector lengths, some functions will not validate, as they were tuned for these specific VLs.
>
> One of the main features of SVE, Vector Length Agnosticism (VLA), is not fully supported by Eigen, which relies on fixed vector sizes to better exploit vector performance. SVE vectors have sizeless types, identified by the size of their elements, independently of the maximum vector length set. As such, some structures in Eigen's backend are not compatible with these types, like *PacketBlock*, a structure containing an array of *Packets*. This structure is then used in other parts of the project (e.g. the transpose function), which require a workaround to support these data types.
>
> Work still needs to be done to either abstract the vector length in function optimization, or to consider all possible SVE vector lengths and to optimize accordingly.
> In order to fully integrate a vector length agnostic SVE backend with Eigen, changes to Eigen's core are also required. The aforementioned *PacketBlock* is one of them, but the code needs to be revised in order to seamlessly support sizeless vectors without breaking support for all existing fixed-size vector architectures. Ultimately, this would ensure compatibility with other projects such as TensorFlow, which currently cannot be built with Eigen SVE. As it stands in the proof-of-concept, benchmarks need to be carefully written to use the SVE backend.
>
> As of mid-May, the GCC 10.1 stable build has been released, bringing the ability to create fixed-length SVE types. This enables the substitution of sizeless data types with fixed-size ones, solving the above incompatibility with the PacketBlock structure. However, this is not a complete solution, as it does not bring support for the desired SVE VLA.
> We are currently performing some tests and evaluating this GCC feature with a TensorFlow build. The goal is to be able to build TensorFlow and run some benchmarks using the proof-of-concept Eigen with the SVE backend and a fixed VL.
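
To make the PacketBlock point concrete, here is a hedged C++ sketch of how a fixed-length SVE type can sit inside an array-of-packets structure. It assumes a compiler that implements the ACLE `arm_sve_vector_bits` attribute and defines `__ARM_FEATURE_SVE_BITS` (for example GCC 10 invoked with `-msve-vector-bits=512`). On any other target it falls back to a plain array so the layout check still compiles. `FixedPacketf`, `PacketBlockSketch` and `block_bytes` are hypothetical names, modeled loosely on Eigen's PacketBlock and not taken from the RFC patches.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__ARM_FEATURE_SVE_BITS) && __ARM_FEATURE_SVE_BITS > 0
#include <arm_sve.h>
// Vector length fixed at compile time, e.g. by -msve-vector-bits=512.
constexpr std::size_t kVectorBits = __ARM_FEATURE_SVE_BITS;
// Fixed-length alias of the sizeless svfloat32_t. Unlike the sizeless type,
// it has a known sizeof and may be used as an array element.
typedef svfloat32_t FixedPacketf
    __attribute__((arm_sve_vector_bits(__ARM_FEATURE_SVE_BITS)));
#else
// Fallback for non-SVE builds (assumption: emulate a 512-bit packet).
constexpr std::size_t kVectorBits = 512;
struct FixedPacketf {
  float lanes[kVectorBits / (8 * sizeof(float))];
};
#endif

// Hypothetical stand-in for a PacketBlock-like structure: an array of N
// packets. This only compiles when Packet has a size known at compile time.
template <typename Packet, int N>
struct PacketBlockSketch {
  Packet packet[N];
};

// Size in bytes of a block of four float packets.
constexpr std::size_t block_bytes() {
  return sizeof(PacketBlockSketch<FixedPacketf, 4>);
}

// Four packets of kVectorBits bits each.
static_assert(block_bytes() == 4 * kVectorBits / 8,
              "block holds four fixed-length packets");
```

Replacing `FixedPacketf` with the sizeless `svfloat32_t` would be rejected by the compiler, since sizeless types cannot be array elements. That is the incompatibility the pointer-based *PacketBlockSVE* works around.
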
