Despite the support for SVE intrinsics in LLVM 11, it looks like neither 
__ARM_FEATURE_SVE_BITS nor __attribute__((arm_sve_vector_bits(...))) is 
implemented yet. 
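
For reference, a minimal sketch of how code might probe for these two features at compile time. The sketch_* names are hypothetical and not part of Eigen or the patch. The macro and attribute are the ones described by the SVE ACLE, which GCC 10 enables through -msve-vector-bits=<N>. On a compiler that lacks them, the sketch simply reports a length of 0.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>  // ACLE header for SVE intrinsics and types
#endif

// The macro is nonzero only when the compiler was given a fixed SVE vector
// length and implements the fixed-length extension.
#if defined(__ARM_FEATURE_SVE_BITS) && (__ARM_FEATURE_SVE_BITS > 0)
#define SKETCH_SVE_FIXED_BITS __ARM_FEATURE_SVE_BITS
// A sized alias of the otherwise sizeless ACLE type (hypothetical name).
typedef svfloat32_t sketch_fixed_f32
    __attribute__((arm_sve_vector_bits(__ARM_FEATURE_SVE_BITS)));
static_assert(sizeof(sketch_fixed_f32) * 8 == __ARM_FEATURE_SVE_BITS,
              "fixed-length type should span exactly one vector");
#else
// No fixed-length support detected: fall back to "unknown length".
#define SKETCH_SVE_FIXED_BITS 0
#endif

constexpr int sketch_sve_fixed_bits() { return SKETCH_SVE_FIXED_BITS; }
constexpr bool sketch_has_fixed_length_sve() { return SKETCH_SVE_FIXED_BITS > 0; }
```

A backend could use such a check to choose between a fixed-length code path and a fallback, rather than failing to compile on toolchains without the attribute.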

David

> On 24. Jun 2020, at 12:51, Miguel Tairum-Cruz <[email protected]> 
> wrote:
> 
> SVE ACLE (intrinsics) are supported on LLVM/Clang 11 onwards.
> 
> -Miguel
> From: Rasmus Munk Larsen <[email protected]>
> Sent: Tuesday, June 23, 2020 11:49 PM
> To: eigen <[email protected]>
> Cc: Miguel Tairum-Cruz <[email protected]>
> Subject: Re: [eigen] Eigen Arm SVE backend RFC
>  
> Yes, clang in particular is important. Are SVE intrinsics supported?
> 
> Rasmus
> 
> On Tue, Jun 23, 2020 at 10:32 AM David Tellenbach <[email protected] 
> <mailto:[email protected]>> wrote:
> Hi Rasmus,
> 
>> The naming should be OK, but could a fixed-length version of this be made to 
>> work with older compilers? Eigen is deployed on a large number of platforms, 
>> and depending on GCC 10 would mean missing out on support on many of them. I 
>> could be wrong, but I suspect that for Eigen the main benefit is not so much 
>> the variable length aspect, but rather having _some_ long vector extension 
>> on newer Arm CPUs.
> 
> 
> Old compilers do not support SVE intrinsics, so they won't be able to 
> compile the proposed backend anyway. I agree that we should try to find a 
> solution that works for all compilers with SVE support.
> 
> Cheers,
> David
> 
>> On 23. Jun 2020, at 19:22, Rasmus Munk Larsen <[email protected] 
>> <mailto:[email protected]>> wrote:
>> 
>> 
>> 
>> On Tue, Jun 23, 2020 at 9:09 AM Miguel Tairum-Cruz 
>> <[email protected] <mailto:[email protected]>> wrote:
>> Hi Rasmus,
>>  
>> Thank you for your feedback.
>>  
>>  Could we make the vector length a build config macro without a lot of code 
>> duplication for different lengths? 
>> GCC 10 support for fixed SVE sizes could be used in this situation, by 
>> checking the SVE size in the SVE PacketMath code (e.g. #if 
>> __ARM_FEATURE_SVE_BITS == 512 …).
>> However, the Packet names would be less descriptive, e.g.: 'PacketSVE' for 
>> any vector length instead of 'Packet16' for 512b vectors or 'Packet4' for 
>> 128b vectors. This should not be an issue, as far as I can tell, as the 
>> packets would still have the correct size.
>> 
>> The naming should be OK, but could a fixed-length version of this be made to 
>> work with older compilers? Eigen is deployed on a large number of platforms, 
>> and depending on GCC 10 would mean missing out on support on many of them. I 
>> could be wrong, but I suspect that for Eigen the main benefit is not so much 
>> the variable length aspect, but rather having _some_ long vector extension 
>> on newer Arm CPUs.
>>  
>>  
>> We will work on a merge request with these changes in mind. Any 
>> implementation suggestions or recommendations on this are welcome.
>>  
>> Best regards,
>> Miguel 
>> 
>> From: Rasmus Munk Larsen <[email protected] <mailto:[email protected]>>
>> Sent: Monday, June 22, 2020 11:20 PM
>> To: eigen <[email protected] <mailto:[email protected]>>; 
>> Miguel Tairum-Cruz <[email protected] 
>> <mailto:[email protected]>>
>> Subject: Re: [eigen] Eigen Arm SVE backend RFC
>>  
>> +Miguel directly.
>> 
>> On Mon, Jun 22, 2020 at 3:15 PM Rasmus Munk Larsen <[email protected] 
>> <mailto:[email protected]>> wrote:
>> Miguel,
>> 
>> Thank you very much for the RFC. I think that support for Arm SVE would be a 
>> useful addition to Eigen. As you mention, doing it with fixed-sized vectors 
>> will probably be necessary to match the existing Eigen architecture. Could 
>> we make the vector length a build config macro without a lot of code 
>> duplication for different lengths?
>> 
>>  Could I ask your team to submit this as a merge request against head on the 
>> main branch for easier review and testing?
>> 
>> Best regards,
>>    Rasmus
>> 
>> On Wed, Jun 17, 2020 at 2:48 AM Miguel Tairum-Cruz 
>> <[email protected] <mailto:[email protected]>> wrote:
>> Hi all,
>>  
>> I would like to present to the Eigen community a Request for Comments (RFC) 
>> for a new proof-of-concept vector backend based on the Arm Scalable Vector 
>> Extension (SVE) architecture.
>> 
>> With Eigen being widely used across multiple projects such as TensorFlow, we 
>> believe that adding support for this new vector length (VL) agnostic 
>> architecture will benefit performance on upcoming Arm micro-architectures 
>> and systems.
>> 
>> This proof-of-concept SVE backend keeps in line with the existing vector 
>> backends, using the Arm C Language Extensions (ACLE) for SVE to optimize 
>> Eigen’s functions.
>> Using the NEON backend as a starting point, we have ported most of the NEON 
>> functions to SVE. Please be aware that this work is built upon a version of 
>> Eigen from December 2019 / January 2020. None of the upstream commits made 
>> to the NEON backend since then are reflected in this version.
>> 
>> The introduced changes are provided in the form of patch files, specifically 
>> for two SVE vector lengths: 128-bit and 512-bit. You can find more 
>> information on how to apply them in the provided README file.
>> 
>> One caveat of this initial version is the requirement for fixed SVE vector 
>> lengths. The Eigen codebase and its vector optimizations are not fully compatible 
>> with the vector-length agnostic data types that SVE introduces, which is a 
>> barrier for its full support upstream. Optimizing the SVE backend for 
>> specific VLs (in this case 128-bit and 512-bit) is a necessary workaround 
>> for this initial proof-of-concept.
>> 
>> An additional goal of this work is to integrate the Eigen SVE backend with 
>> TensorFlow. So far, due to the caveats stated above, we have not been able 
>> to integrate TensorFlow with Eigen SVE. However, the recent release of GCC 
>> 10.1 brings a new feature to enable fixed vector sizes at compile time, 
>> which we believe will allow building TensorFlow with the proof-of-concept 
>> fixed-VL SVE implementation of Eigen.
>> 
>> Below is the formal RFC document, where we detail the design choices and 
>> discuss drawbacks and potential solutions to enable a complete 
>> implementation of an SVE backend for Eigen.
>> 
>>  
>> Regards,
>> 
>> Miguel
>> 
>> 
>> 
>> --------
>> 
>> 
>> 
>> Eigen Arm SVE backend RFC
>> 
>> - Authors: Miguel Tairum ([email protected] 
>> <mailto:[email protected]>)
>> - Updated: 2020-05-15
>> 
>> Summary
>> 
>> The purpose of this RFC is to share an experimental proof-of-concept Arm 
>> Scalable Vector Extension (SVE) backend to Eigen and engage with the Eigen 
>> development community on feedback and ideas on how to properly implement 
>> scalable vectors into the Eigen library codebase.
>> 
>> More information on how to apply the RFC patch can be found in the README 
>> file.
>> 
>> Motivation
>> 
>> SVE 
>> <https://developer.arm.com/docs/101726/latest/explore-the-scalable-vector-extension-sve/what-is-the-scalable-vector-extension>
>>  is the next-generation SIMD architectural extension to the Armv8 
>> architecture, introducing scalable vector length, per-lane predication, 
>> gather loads, and scatter stores, among other features.
>> 
>> Eigen is a mature linear algebra library, supporting many vector 
>> architectures, including Arm NEON. Because Eigen is used in multiple 
>> projects, including TensorFlow, we believe that supporting SVE could not 
>> only improve compatibility with future micro-architectures, but also enable 
>> better performance.
>> 
>> Guide-level explanation
>> 
>> In this initial assessment, we present a proof-of-concept SVE port of the 
>> PacketMath backend in Eigen, using the Arm C Language Extensions (ACLE). 
>> As in the existing vector backends, SVE intrinsics are implemented in Eigen's 
>> PacketMath, MathFunctions and TypeCasting source files. In this initial 
>> release, complex math is not available (due to time constraints).
>> 
>> This proof-of-concept release provides a "fixed-sized" SVE backend, with 
>> vector lengths of 128 and 512 bits. This means that the implemented 
>> functions are validated only when executed on those specific SVE lengths, as 
>> optimizations were only made for them. To facilitate this, we provide a 
>> patch file for each VL. All currently implemented NEON functions except for 
>> the Complex math (Complex.h) are included in the SVE backend. This is up to 
>> date with commit 312c8e77 
>> <https://gitlab.com/libeigen/eigen/-/commit/312c8e77ff653d718cf4b318c9633d4b45bb725f>
>>  from December 2019, plus the changes introduced to the NEON backend up 
>> until commit da5a7afe 
>> <https://gitlab.com/libeigen/eigen/-/commit/da5a7afed056596b089a4241b62a7e17f2c43119>
>>  from 10 January 2020 (these are included in the patch files). This commit 
>> was chosen to be compatible with TensorFlow 1.x, which uses a similar 
>> version of Eigen, plus any NEON updates at the time of this work. This 
>> initial release also contains an updated PacketMath test, with SVE 
>> validation.
>> 
>> Reference-level explanation
>> 
>>  
>> 
>> The changes presented in this RFC are based on commit 312c8e77 
>> <https://gitlab.com/libeigen/eigen/-/commit/312c8e77ff653d718cf4b318c9633d4b45bb725f>
>>  in the master branch.
>> 
>> The Eigen SVE backend can be found at Eigen/src/Core/arch/SVE. 
>> SVE intrinsics are implemented for float, int and double sized elements. 
>> Similar to the NEON backend at this time, half packets are not implemented. 
>> Therefore, the available packet sizes for 512-bit VL are: 16 elements for 
>> int/float, 8 elements for double; and for 128-bit VL are: 4 elements for 
>> int/float, 2 elements for double.
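
The packet sizes above follow from dividing the vector length by the element width. A minimal sketch of that arithmetic, assuming 4-byte int/float and 8-byte double; packet_size_sketch is a hypothetical helper, not a name from the patch:

```cpp
#include <cassert>
#include <cstddef>

// Number of elements of type Scalar that fit in a vector of VectorBits bits.
// Hypothetical helper for illustration only.
template <typename Scalar, int VectorBits>
constexpr int packet_size_sketch() {
  static_assert(VectorBits % 128 == 0,
                "SVE vector lengths are multiples of 128 bits");
  return VectorBits / (8 * static_cast<int>(sizeof(Scalar)));
}

// The four cases listed in the RFC, checked at compile time.
static_assert(packet_size_sketch<float, 512>() == 16, "512-bit float packet");
static_assert(packet_size_sketch<double, 512>() == 8, "512-bit double packet");
static_assert(packet_size_sketch<int, 128>() == 4, "128-bit int packet");
static_assert(packet_size_sketch<double, 128>() == 2, "128-bit double packet");
```

Deriving the size this way, rather than hard-coding it per patch, is one possible route to a single backend parameterized by vector length.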
>> 
>> For most functions, SVE intrinsics are analogous to the ones used in the 
>> NEON backend. More complex functions have comments that explain the logic 
>> behind their implementation.
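
To illustrate the style of the SVE ACLE intrinsics, here is a minimal sketch that is not taken from the patch: an element-wise add written with per-lane predication, so the same loop works for any vector length. The function name add_arrays_sketch is hypothetical, and a plain scalar loop stands in when the compiler does not target SVE.

```cpp
#include <cassert>
#include <cstdint>

#if defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>
#endif

// out[i] = a[i] + b[i] for i in [0, n). Hypothetical helper for illustration.
inline void add_arrays_sketch(const float* a, const float* b, float* out,
                              int64_t n) {
#if defined(__ARM_FEATURE_SVE)
  // svcntw() is the number of 32-bit lanes in one vector at run time.
  for (int64_t i = 0; i < n; i += svcntw()) {
    // Predicate with lanes active only while i + lane < n (handles the tail).
    svbool_t pg = svwhilelt_b32(i, n);
    svfloat32_t va = svld1_f32(pg, a + i);
    svfloat32_t vb = svld1_f32(pg, b + i);
    svst1_f32(pg, out + i, svadd_f32_x(pg, va, vb));
  }
#else
  // Scalar fallback so the sketch also builds without SVE support.
  for (int64_t i = 0; i < n; ++i) out[i] = a[i] + b[i];
#endif
}
```

The NEON counterpart would use a fixed four-lane vaddq_f32 plus a separate scalar tail loop, which is where the two backends differ most.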
>> 
>> Regarding the ptranspose function, the PacketBlock structure was duplicated 
>> and modified into PacketBlockSVE, a new structure of SVE vector pointers. 
>> This structure is in Eigen/src/Core/GenericPacketMath.h. This is required to 
>> support vector length agnostic data types, introduced in SVE. Since these 
>> data types do not have a fixed size at compile time, they cannot be 
>> addressed inside vectors and thus pointers are needed.
>> The included SVE PacketMath tests (available in /test/packetmath.cc and 
>> /test/packetmath_sve_resnet.c) make use of this new structure to validate 
>> the transpose function.
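
The pointer-based block described above can be sketched as follows. This is an assumption about the general shape, not the code from the patch: PacketBlockPtrSketch and transpose4x4_sketch are hypothetical names, and Float4 is a stand-in for a 128-bit packet so that the example builds on any compiler.

```cpp
#include <cassert>
#include <utility>

// Stand-in for a 128-bit packet of four floats (not an SVE type).
struct Float4 {
  float v[4];
};

// A block that refers to N packets through pointers instead of storing them
// in an array, since sizeless types cannot be array elements.
template <typename Packet, int N>
struct PacketBlockPtrSketch {
  Packet* packet[N];
};

// In-place 4x4 transpose across the four packets referenced by the block.
inline void transpose4x4_sketch(PacketBlockPtrSketch<Float4, 4>& block) {
  for (int i = 0; i < 4; ++i) {
    for (int j = i + 1; j < 4; ++j) {
      std::swap(block.packet[i]->v[j], block.packet[j]->v[i]);
    }
  }
}
```

The cost of this design is that callers must keep the pointed-to packets alive and pass the block by reference, which is why every user of PacketBlock needs a matching change.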
>> 
>> Outside of PacketMath and the previously mentioned locations, other small 
>> SVE modifications were done whenever a NEON implementation was present in 
>> the code. Additionally, the cmake files were also modified to accommodate 
>> the new backend.
>> 
>> Drawbacks and future possibilities
>> 
>> The initial release demonstrates a proof of concept for an SVE backend with 
>> 128 and 512-bit vector lengths. Although it can be compiled for SVE 
>> architectures with different vector lengths, some functions will not 
>> validate, as they were tuned for these specific VLs.
>> 
>> One of the main features of SVE, Vector Length Agnosticism (VLA), is not fully 
>> supported by Eigen, which relies on fixed-vector sizes to better exploit 
>> vector performance. SVE vectors have sizeless types, identified by the size 
>> of their elements, independently of the maximum vector length set. As such, 
>> some structures in Eigen's backend are not compatible with these types, like 
>> PacketBlock, a structure containing an array of Packets. This structure is 
>> then used in other parts of the project (e.g. the transpose function), which 
>> require a workaround to support these data types.
>> 
>> Work still needs to be done to either abstract the vector length in function 
>> optimization, or to consider all possible SVE vector lengths and to optimize 
>> accordingly. In order to fully integrate a vector length agnostic SVE 
>> backend with Eigen, changes to Eigen's core are also required. The 
>> aforementioned PacketBlock is one of them, but the code needs to be revised 
>> in order to seamlessly support sizeless vectors without breaking support for 
>> all existing fixed-size vector architectures. Ultimately, this would ensure 
>> compatibility with other projects such as TensorFlow, which currently cannot 
>> be built with Eigen SVE. As it stands in the proof-of-concept, benchmarks 
>> need to be carefully written to use the SVE backend.
>> 
>> As of mid-May, the GCC 10.1 stable build has been released, bringing the 
>> ability to create fixed-length SVE types. This enables the substitution of sizeless 
>> data types for fixed size ones, solving the above incompatibility with the 
>> PacketBlock structure. However, this is not a complete solution, as it does 
>> not bring support for the desired SVE VLA. 
>> We are currently performing some tests and evaluating this GCC feature with 
>> a TensorFlow build. The goal is to be able to build TensorFlow and run some 
>> benchmarks using the proof-of-concept Eigen with the SVE backend and a fixed 
>> VL.
>> 
>> 
>> IMPORTANT NOTICE: The contents of this email and any attachments are 
>> confidential and may also be privileged. If you are not the intended 
>> recipient, please notify the sender immediately and do not disclose the 
>> contents to any other person, use it for any purpose, or store or copy the 
>> information in any medium. Thank you. 
> 
