Despite the support for SVE intrinsics in LLVM 11, it looks like neither __ARM_FEATURE_SVE_BITS nor __attribute__((arm_sve_vector_bits(...))) is implemented yet.
David

> On 24. Jun 2020, at 12:51, Miguel Tairum-Cruz <[email protected]> wrote:
>
> SVE ACLE (intrinsics) are supported on LLVM/Clang 11 onwards.
>
> -Miguel
> From: Rasmus Munk Larsen <[email protected]>
> Sent: Tuesday, June 23, 2020 11:49 PM
> To: eigen <[email protected]>
> Cc: Miguel Tairum-Cruz <[email protected]>
> Subject: Re: [eigen] Eigen Arm SVE backend RFC
>
> Yes, clang in particular is important. Are SVE intrinsics supported?
>
> Rasmus
>
> On Tue, Jun 23, 2020 at 10:32 AM David Tellenbach <[email protected]> wrote:
> Hi Rasmus,
>
>> The naming should be OK, but could a fixed-length version of this be made to
>> work with older compilers? Eigen is deployed on a large number of platforms,
>> and depending on GCC 10 would mean missing out on support on many of them. I
>> could be wrong, but I suspect that for Eigen the main benefit is not so much
>> the variable length aspect, but rather having _some_ long vector extension
>> on newer Arm CPUs.
>
> Old compilers do not support SVE intrinsics, so they won't be able to
> compile the proposed backend anyway. I agree that we should try to find a
> solution that works for all compilers with SVE support.
>
> Cheers,
> David
>
>> On 23. Jun 2020, at 19:22, Rasmus Munk Larsen <[email protected]> wrote:
>>
>> On Tue, Jun 23, 2020 at 9:09 AM Miguel Tairum-Cruz <[email protected]> wrote:
>> Hi Rasmus,
>>
>> Thank you for your feedback.
>>
>> Could we make the vector length a build config macro without a lot of code
>> duplication for different lengths?
>> GCC 10 support for fixed SVE sizes could be used in this situation, by
>> checking the SVE size in the SVE PacketMath code (e.g. #if
>> __ARM_FEATURE_SVE_BITS == 512 …).
>> However, the Packet names would be less descriptive, e.g.: 'PacketSVE' for
>> any vector length instead of 'Packet16' for 512b vectors or 'Packet4' for
>> 128b vectors.
>> This should not be an issue, as far as I can tell, as the
>> packets would still have the correct size.
>>
>> The naming should be OK, but could a fixed-length version of this be made to
>> work with older compilers? Eigen is deployed on a large number of platforms,
>> and depending on GCC 10 would mean missing out on support on many of them. I
>> could be wrong, but I suspect that for Eigen the main benefit is not so much
>> the variable length aspect, but rather having _some_ long vector extension
>> on newer Arm CPUs.
>>
>> We will work on a merge request with these changes in mind. Any
>> implementation suggestions or recommendations on this are welcome.
>>
>> Best regards,
>> Miguel
>>
>> From: Rasmus Munk Larsen <[email protected]>
>> Sent: Monday, June 22, 2020 11:20 PM
>> To: eigen <[email protected]>;
>> Miguel Tairum-Cruz <[email protected]>
>> Subject: Re: [eigen] Eigen Arm SVE backend RFC
>>
>> +Miguel directly.
>>
>> On Mon, Jun 22, 2020 at 3:15 PM Rasmus Munk Larsen <[email protected]> wrote:
>> Miguel,
>>
>> Thank you very much for the RFC. I think that support for Arm SVE would be a
>> useful addition to Eigen. As you mention, doing it with fixed-sized vectors
>> will probably be necessary to match the existing Eigen architecture. Could
>> we make the vector length a build config macro without a lot of code
>> duplication for different lengths?
>>
>> Could I ask your team to submit this as a merge request against head on the
>> main branch for easier review and testing?
>>
>> Best regards,
>> Rasmus
>>
>> On Wed, Jun 17, 2020 at 2:48 AM Miguel Tairum-Cruz <[email protected]> wrote:
>> Hi all,
>>
>> I would like to present to the Eigen community a Request for Comments (RFC)
>> for a new proof-of-concept vector backend based on the Arm Scalable Vector
>> Extension (SVE) architecture.
>>
>> With Eigen being widely used across multiple projects such as TensorFlow, we
>> believe that adding support for this new vector length (VL) agnostic
>> architecture will benefit performance on upcoming Arm micro-architectures
>> and systems.
>>
>> This proof-of-concept SVE backend keeps in line with the existing vector
>> backends, using the Arm C Language Extensions (ACLE) for SVE to optimize
>> Eigen’s functions.
>> Using the NEON backend as a starting point, we have ported most of the NEON
>> functions to SVE. Please be aware that this work is built upon a version of
>> Eigen from December 2019 / January 2020. All the upstream commits made to
>> the NEON backend since then are not yet considered in this version.
>>
>> The introduced changes are provided in the form of patch files, specifically
>> for two SVE vector lengths: 128-bit and 512-bit. You can find more
>> information on how to apply them in the provided README file.
>>
>> One caveat of this initial version is the requirement for fixed SVE vector
>> lengths. The Eigen codebase and vector optimizations are not fully compatible
>> with the vector-length agnostic data types that SVE introduces, which is a
>> barrier for its full support upstream. Optimizing the SVE backend for
>> specific VLs (in this case 128-bit and 512-bit) is a necessary workaround
>> for this initial proof-of-concept.
>>
>> An additional goal of this work is to integrate the Eigen SVE backend with
>> TensorFlow. So far, due to the caveats stated above, we have not been able
>> to integrate TensorFlow with Eigen SVE. However, the recent release of GCC
>> 10.1 brings a new feature to enable fixed vector sizes at compile time,
>> which we believe will allow building TensorFlow with the proof-of-concept
>> fixed-VL SVE implementation of Eigen.
>>
>> Below is the formal RFC document, where we detail the design choices and
>> discuss drawbacks and potential solutions to enable a complete
>> implementation of an SVE backend for Eigen.
>>
>> Regards,
>>
>> Miguel
>>
>> --------
>>
>> Eigen Arm SVE backend RFC
>>
>> - Authors: Miguel Tairum ([email protected])
>> - Updated: 2020-05-15
>>
>> Summary
>>
>> The purpose of this RFC is to share an experimental proof-of-concept Arm
>> Scalable Vector Extension (SVE) backend to Eigen and engage with the Eigen
>> development community on feedback and ideas on how to properly implement
>> scalable vectors into the Eigen library codebase.
>>
>> More information on how to apply the RFC patch can be found in the README
>> file.
>>
>> Motivation
>>
>> SVE
>> <https://developer.arm.com/docs/101726/latest/explore-the-scalable-vector-extension-sve/what-is-the-scalable-vector-extension>
>> is the next-generation SIMD architectural extension to the Armv8
>> architecture, introducing scalable vector length, per-lane predication,
>> gather-loads, scatter-stores amongst other features.
>>
>> Eigen is a mature linear algebra library, supporting many vector
>> architectures, including Arm NEON. As Eigen is used in multiple projects,
>> including TensorFlow, we believe that supporting SVE could not only improve
>> compatibility with future micro-architectures, but also enable better
>> performance.
>>
>> Guide-level explanation
>>
>> In this initial assessment, we present a proof-of-concept SVE port of the
>> PacketMath backend in Eigen, using the Arm C Language Extensions (ACLE).
>> Like the existing vector backends, SVE intrinsics are implemented in Eigen's
>> PacketMath, MathFunctions and TypeCasting source files. In this initial
>> release, complex math is not available (due to time constraints).
>>
>> This proof-of-concept release provides a "fixed-sized" SVE backend, with
>> vector lengths of 128 and 512 bits. This means that the implemented
>> functions are validated only when executed on those specific SVE lengths, as
>> optimizations were only made for them. To facilitate this, we provide a
>> patch file for each VL.
>> All currently implemented NEON functions except for
>> the Complex math (Complex.h) are included in the SVE backend. This is up to
>> date with commit 312c8e77
>> <https://gitlab.com/libeigen/eigen/-/commit/312c8e77ff653d718cf4b318c9633d4b45bb725f>
>> from December 2019, plus the changes introduced to the NEON backend up
>> until commit da5a7afe
>> <https://gitlab.com/libeigen/eigen/-/commit/da5a7afed056596b089a4241b62a7e17f2c43119>
>> from 10 January 2020 (these are included in the patch files). This commit
>> was chosen to be compatible with TensorFlow 1.x, which uses a similar
>> version of Eigen, plus any NEON updates at the time of this work. This
>> initial release also contains an updated PacketMath test, with SVE
>> validation.
>>
>> Reference-level explanation
>>
>> The changes presented in this RFC are based on commit 312c8e77
>> <https://gitlab.com/libeigen/eigen/-/commit/312c8e77ff653d718cf4b318c9633d4b45bb725f>
>> in the master branch.
>>
>> The Eigen SVE backend can be found at Eigen/src/Core/arch/SVE.
>> SVE intrinsics are implemented for float, int and double sized elements.
>> Similar to the NEON backend at this time, half packets are not implemented.
>> Therefore, the available packet sizes for 512-bit VL are: 16 elements for
>> int/float, 8 elements for double; and for 128-bit VL are: 4 elements for
>> int/float, 2 elements for double.
>>
>> For most functions, SVE intrinsics are analogous to the ones used in the
>> NEON backend. More complex functions have comments that explain the logic
>> behind their implementation.
>>
>> Regarding the ptranspose function, the PacketBlock structure was duplicated
>> and modified into PacketBlockSVE, a new structure of SVE vector pointers.
>> This structure is in Eigen/src/Core/GenericPacketMath.h. This is required to
>> support vector length agnostic data types, introduced in SVE.
>> Since these
>> data types do not have a fixed size at compile time, they cannot be
>> stored inside arrays and thus pointers are needed.
>> The included SVE PacketMath tests (available in /test/packetmath.cc
>> and /test/packetmath_sve_resnet.c) make use of this
>> new structure to validate the transpose function.
>>
>> Outside of PacketMath and the previously mentioned locations, other small
>> SVE modifications were done whenever a NEON implementation was present in
>> the code. Additionally, the cmake files were also modified to accommodate
>> the new backend.
>>
>> Drawbacks and future possibilities
>>
>> The initial release demonstrates a proof of concept for an SVE backend with
>> 128 and 512-bit vector lengths. Although it can be compiled for SVE
>> architectures with different vector lengths, some functions will not
>> validate, as they were tuned for these specific VLs.
>>
>> One of the main features of SVE, Vector Length Agnosticism (VLA), is not fully
>> supported by Eigen, which relies on fixed vector sizes to better exploit
>> vector performance. SVE vectors have sizeless types, identified by the size
>> of their elements, independently of the maximum vector length set. As such,
>> some structures in Eigen's backend are not compatible with these types, like
>> PacketBlock, a structure containing an array of Packets. This structure is
>> then used in other parts of the project (e.g. the transpose function), which
>> require a workaround to support these data types.
>>
>> Work still needs to be done to either abstract the vector length in function
>> optimization, or to consider all possible SVE vector lengths and to optimize
>> accordingly. In order to fully integrate a vector length agnostic SVE
>> backend with Eigen, changes to Eigen's core are also required.
>> The
>> aforementioned PacketBlock is one of them, but the code needs to be revised
>> in order to seamlessly support sizeless vectors without breaking support for
>> all existing fixed-sized vector architectures. Ultimately, this would ensure
>> compatibility with other projects such as TensorFlow, which currently cannot
>> be built with Eigen SVE. As it stands in the proof-of-concept, benchmarks
>> need to be carefully written to use the SVE backend.
>>
>> As of mid-May, the GCC 10.1 stable build has been released, bringing the
>> ability to create fixed-length SVE types. This enables the substitution of
>> sizeless data types for fixed size ones, solving the above incompatibility
>> with the PacketBlock structure. However, this is not a complete solution, as
>> it does not bring support for the desired SVE VLA.
>> We are currently performing some tests and evaluating this GCC feature with
>> a TensorFlow build. The goal is to be able to build TensorFlow and run some
>> benchmarks using the proof-of-concept Eigen with the SVE backend and a fixed
>> VL.
>>
>> IMPORTANT NOTICE: The contents of this email and any attachments are
>> confidential and may also be privileged. If you are not the intended
>> recipient, please notify the sender immediately and do not disclose the
>> contents to any other person, use it for any purpose, or store or copy the
>> information in any medium. Thank you.
