+Miguel directly.

On Mon, Jun 22, 2020 at 3:15 PM Rasmus Munk Larsen <[email protected]>
wrote:

> Miguel,
>
> Thank you very much for the RFC. I think that support for Arm SVE would be
> a useful addition to Eigen. As you mention, doing it with fixed-sized
> vectors will probably be necessary to match the existing Eigen
> architecture. Could we make the vector length a build config macro without
> a lot of code duplication for different lengths?
>
> Could I ask your team to submit this as a merge request against head on
> the main branch for easier review and testing?
>
> Best regards,
>    Rasmus
>
> On Wed, Jun 17, 2020 at 2:48 AM Miguel Tairum-Cruz <
> [email protected]> wrote:
>
>> Hi all,
>>
>>
>>
>> I would like to present to the Eigen community a Request for Comments
>> (RFC) for a new proof-of-concept vector backend based on the Arm Scalable
>> Vector Extension (SVE) architecture.
>>
>> With Eigen being widely used across multiple projects such as TensorFlow,
>> we believe that adding support for this new vector-length (VL) agnostic
>> architecture will benefit performance on upcoming Arm micro-architectures
>> and systems.
>>
>> This proof-of-concept SVE backend follows the existing vector backends,
>> using the Arm C Language Extensions (ACLE) for SVE to optimize Eigen’s
>> functions.
>> Using the NEON backend as a starting point, we have ported most of the
>> NEON functions to SVE. Please be aware that this work is built upon a
>> version of Eigen from December 2019 / January 2020. Upstream commits made
>> to the NEON backend since then are not yet included in this version.
>>
>> The introduced changes are provided in the form of patch files,
>> specifically for two SVE vector lengths: 128-bit and 512-bit. You can find
>> more information on how to apply them in the provided README file.
>>
>> One caveat of this initial version is the requirement for fixed SVE
>> vector lengths. The Eigen codebase and its vector optimizations are not
>> fully compatible with the vector-length agnostic data types that SVE
>> introduces, which is a barrier to full upstream support. Optimizing the
>> SVE backend for specific VLs (in this case 128-bit and 512-bit) is a
>> necessary workaround for this initial proof-of-concept.
>>
>> An additional goal of this work is to integrate the Eigen SVE backend
>> with TensorFlow. So far, due to the caveats stated above, we have not been
>> able to integrate TensorFlow with Eigen SVE. However, the recent release of
>> GCC 10.1 brings a new feature to enable fixed vector sizes at compile time,
>> which we believe will allow building TensorFlow with the proof-of-concept
>> fixed-VL SVE implementation of Eigen.
>>
>> Below is the formal RFC document, where we detail the design choices and
>> discuss drawbacks and potential solutions to enable a complete
>> implementation of an SVE backend for Eigen.
>>
>>
>>
>> Regards,
>>
>> Miguel
>>
>>
>> --------
>>
>>
>> *Eigen Arm SVE backend RFC*
>>
>> - Authors: Miguel Tairum ([email protected])
>> - Updated: 2020-05-15
>>
>> *Summary*
>>
>> The purpose of this RFC is to share an experimental proof-of-concept Arm
>> Scalable Vector Extension (SVE) backend for Eigen and to gather feedback
>> and ideas from the Eigen development community on how to properly
>> implement scalable vectors in the Eigen library codebase.
>>
>> More information on how to apply the RFC patch can be found in the README
>> file.
>>
>> *Motivation*
>>
>> SVE
>> <https://developer.arm.com/docs/101726/latest/explore-the-scalable-vector-extension-sve/what-is-the-scalable-vector-extension>
>> is the next-generation SIMD architectural extension to the Armv8
>> architecture, introducing scalable vector lengths, per-lane predication,
>> gather loads and scatter stores, among other features.
>>
>> Eigen is a mature linear algebra library, supporting many vector
>> architectures, including Arm NEON. Since Eigen is used in multiple
>> projects, including TensorFlow, we believe that supporting SVE could not
>> only improve compatibility with future micro-architectures but also enable
>> better performance.
>>
>> *Guide-level explanation*
>>
>> In this initial assessment, we present a proof-of-concept SVE port of the
>> *PacketMath* backend in Eigen, using the Arm C Language Extensions
>> (ACLE). Like the existing vector backends, SVE intrinsics are implemented
>> in Eigen's *PacketMath*, *MathFunctions* and *TypeCasting* source files.
>> In this initial release, complex math is not available (due to time
>> constraints).
>>
>> This proof-of-concept release provides a "fixed-sized" SVE backend, with
>> vector lengths of 128 and 512 bits. This means that the implemented
>> functions are validated only when executed on those specific SVE lengths,
>> as optimizations were only made for them. To facilitate this, we provide a
>> patch file for each VL. All currently implemented NEON functions except
>> for complex math (Complex.h) are included in the SVE backend. This is up
>> to date with commit 312c8e77
>> <https://gitlab.com/libeigen/eigen/-/commit/312c8e77ff653d718cf4b318c9633d4b45bb725f>
>> from December 2019, plus the changes introduced to the NEON backend up
>> until commit da5a7afe
>> <https://gitlab.com/libeigen/eigen/-/commit/da5a7afed056596b089a4241b62a7e17f2c43119>
>> from 10 January 2020 (these are included in the patch files). This base
>> commit was chosen for compatibility with TensorFlow 1.x, which uses a
>> similar version of Eigen, and the NEON updates available at the time of
>> this work were added on top.
>> This initial release also contains an updated *PacketMath* test, with
>> SVE validation.
>> *Reference-level explanation*
>>
>>
>>
>> The changes presented in this RFC are based on commit 312c8e77
>> <https://gitlab.com/libeigen/eigen/-/commit/312c8e77ff653d718cf4b318c9633d4b45bb725f>
>> in the master branch.
>>
>> The Eigen SVE backend can be found at *Eigen/src/Core/arch/SVE*.
>> SVE intrinsics are implemented for float, int and double element types.
>> Similar to the NEON backend at this time, half packets are not implemented.
>> Therefore, the available packet sizes for 512-bit VL are: 16 elements for
>> int/float, 8 elements for double; and for 128-bit VL are: 4 elements for
>> int/float, 2 elements for double.
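The packet sizes above are just the vector length divided by the element width (512 / 32 = 16, 512 / 64 = 8, 128 / 32 = 4, 128 / 64 = 2). Along the lines of the build-config macro suggested in the reply above, one code path could derive them from a single configuration value instead of duplicating code per length. This is a sketch only: `EIGEN_SVE_VL_BITS` is a hypothetical macro (not an existing Eigen option) that falls back to `__ARM_FEATURE_SVE_BITS`, which ACLE compilers define when building with `-msve-vector-bits=N`.

```cpp
#include <cstddef>

// EIGEN_SVE_VL_BITS is a hypothetical build-configuration macro for this
// sketch only. If the compiler was given -msve-vector-bits=N, ACLE defines
// __ARM_FEATURE_SVE_BITS to N, which is used as the default.
#ifndef EIGEN_SVE_VL_BITS
#  if defined(__ARM_FEATURE_SVE_BITS) && __ARM_FEATURE_SVE_BITS > 0
#    define EIGEN_SVE_VL_BITS __ARM_FEATURE_SVE_BITS
#  else
#    define EIGEN_SVE_VL_BITS 512  // assumed default
#  endif
#endif

// SVE vector lengths are multiples of 128 bits, from 128 up to 2048 bits.
constexpr bool is_valid_sve_vl(int bits) {
  return bits >= 128 && bits <= 2048 && bits % 128 == 0;
}
static_assert(is_valid_sve_vl(EIGEN_SVE_VL_BITS), "invalid SVE vector length");

// Number of Scalar lanes in one full packet of VLBits bits. Written once and
// instantiated for any VL, so no per-length code duplication is needed.
template <typename Scalar, int VLBits = EIGEN_SVE_VL_BITS>
constexpr int sve_packet_size() {
  static_assert(is_valid_sve_vl(VLBits), "invalid SVE vector length");
  return VLBits / static_cast<int>(8 * sizeof(Scalar));
}

// The sizes listed for the two proof-of-concept vector lengths
// (assumes 32-bit int, as on AArch64).
static_assert(sve_packet_size<float, 512>() == 16, "float @ 512-bit");
static_assert(sve_packet_size<int, 512>() == 16, "int @ 512-bit");
static_assert(sve_packet_size<double, 512>() == 8, "double @ 512-bit");
static_assert(sve_packet_size<float, 128>() == 4, "float @ 128-bit");
static_assert(sve_packet_size<double, 128>() == 2, "double @ 128-bit");
```

Building with, say, `-DEIGEN_SVE_VL_BITS=128` would then switch every packet size at once.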
>>
>> For most functions, SVE intrinsics are analogous to the ones used in the
>> NEON backend. More complex functions have comments that explain the logic
>> behind their implementation.
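To illustrate the NEON-to-SVE correspondence, here is a sketch of a 4-lane horizontal sum, the kind of operation behind Eigen's `predux`. The function name `predux4` is hypothetical. `vaddvq_f32` (AArch64 NEON) and `svwhilelt_b32`, `svld1` and `svaddv` (SVE ACLE) are existing intrinsics. A scalar fallback keeps the sketch compilable elsewhere. The SVE path needs a predicate because the hardware vector may be wider than four lanes.

```cpp
#include <cassert>
#if defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>
#elif defined(__aarch64__) && defined(__ARM_NEON)
#include <arm_neon.h>
#endif

// Horizontal sum of the four floats starting at p (hypothetical helper).
inline float predux4(const float* p) {
  assert(p != nullptr);
#if defined(__ARM_FEATURE_SVE)
  svbool_t pg = svwhilelt_b32(0, 4);  // activate only the first 4 lanes
  return svaddv(pg, svld1(pg, p));    // SVE: predicated add-across-vector
#elif defined(__aarch64__) && defined(__ARM_NEON)
  return vaddvq_f32(vld1q_f32(p));    // NEON: add across a 128-bit vector
#else
  return p[0] + p[1] + p[2] + p[3];   // scalar fallback
#endif
}
```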
>>
>> Regarding the *ptranspose* function, the *PacketBlock* structure was
>> duplicated and modified into *PacketBlockSVE*, a new structure of SVE
>> vector pointers. This structure is in
>> *Eigen/src/Core/GenericPacketMath.h*. This is required to support the
>> vector length agnostic data types introduced in SVE. Since these data
>> types do not have a fixed size at compile time, they cannot be stored in
>> arrays, so pointers are needed instead.
>> The included SVE PacketMath tests (available in /test/packetmath.cc and
>> /test/packetmath_sve_resnet.c) make use of this new structure to
>> validate the transpose function.
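The pointer-based workaround can be sketched portably. In this sketch `std::array<float, 4>` stands in for an SVE register type, because the real `svfloat32_t` is sizeless and cannot be an array element. That restriction is why the RFC stores pointers. The `PacketBlockSVE` and `ptranspose` below are simplified illustrations, not the code from the patches.

```cpp
#include <array>
#include <cassert>
#include <utility>

// Stand-in for a 4-lane float packet (128-bit VL). With the sizeless
// svfloat32_t, `Packet packet[N]` below would not compile.
using Packet4f = std::array<float, 4>;

// Shape of Eigen's PacketBlock: packets stored by value in an array.
template <typename Packet, int N>
struct PacketBlock {
  Packet packet[N];
};

// Pointer-based variant in the spirit of PacketBlockSVE: only pointers are
// stored; the packets themselves live in separate variables.
template <typename Packet, int N>
struct PacketBlockSVE {
  Packet* packet[N];
};

// In-place transpose of the N x N block whose rows are the N packets.
template <typename Packet, int N>
void ptranspose(PacketBlockSVE<Packet, N>& kernel) {
  static_assert(static_cast<int>(std::tuple_size<Packet>::value) == N,
                "block must be square");
  for (int i = 0; i < N; ++i) assert(kernel.packet[i] != nullptr);
  for (int i = 0; i < N; ++i)
    for (int j = i + 1; j < N; ++j)
      std::swap((*kernel.packet[i])[j], (*kernel.packet[j])[i]);
}
```

Take four row packets `r0` to `r3`. Then `PacketBlockSVE<Packet4f, 4> kb{{&r0, &r1, &r2, &r3}}; ptranspose(kb);` leaves the columns of the original block in `r0` to `r3`.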
>>
>> Outside of *PacketMath* and the previously mentioned locations, other
>> small SVE modifications were made wherever a NEON implementation was
>> present in the code. Additionally, the CMake files were modified to
>> accommodate the new backend.
>>
>> *Drawbacks and future possibilities*
>>
>> The initial release demonstrates a proof of concept for an SVE backend
>> with 128-bit and 512-bit vector lengths. Although it can be compiled for
>> SVE architectures with other vector lengths, some functions will fail
>> validation on them, as they were tuned for these specific VLs.
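Functions tuned for 128-bit or 512-bit VLs may not validate on other lengths, so a build could guard against running on mismatched hardware. This is a minimal sketch that assumes the caller passes in the tuned length. `svcntb()` is the ACLE intrinsic that returns the number of bytes in an SVE vector. The helper names are hypothetical. On non-SVE targets the runtime query returns 0 and the guard does nothing.

```cpp
#include <cassert>
#include <cstdint>
#if defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>
#endif

// Hardware SVE vector length in bits, or 0 when not compiled for SVE.
inline std::uint64_t runtime_sve_vl_bits() {
#if defined(__ARM_FEATURE_SVE)
  return svcntb() * 8;  // svcntb(): bytes per SVE vector on this CPU
#else
  return 0;
#endif
}

// True when code tuned for tuned_bits runs on a CPU with the same VL.
inline bool sve_vl_matches(std::uint64_t tuned_bits,
                           std::uint64_t runtime_bits) {
  return tuned_bits == runtime_bits;
}

// Debug-build guard for a binary whose SVE kernels assume tuned_bits.
inline void assert_sve_vl(std::uint64_t tuned_bits) {
#if defined(__ARM_FEATURE_SVE)
  assert(sve_vl_matches(tuned_bits, runtime_sve_vl_bits()) &&
         "SVE kernels were tuned for a different vector length");
#else
  (void)tuned_bits;  // nothing to check on non-SVE targets
#endif
}
```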
>>
>> One of the main features of SVE, Vector Length Agnosticism (VLA), is not
>> fully supported by Eigen, which relies on fixed vector sizes to better
>> exploit vector performance. SVE vectors have sizeless types, identified
>> only by the size of their elements, independently of the vector length
>> implemented in hardware. As such, some structures in Eigen's backend are
>> not compatible with these types, such as *PacketBlock*, a structure
>> containing an array of *Packets*. This structure is used in other parts of
>> the project (e.g. the transpose function), which therefore require a
>> workaround to support these data types.
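For contrast with Eigen's fixed packet sizes, this sketch shows what vector-length agnostic code looks like with ACLE. The loop step comes from `svcntw()` at run time. A `svwhilelt_b32` predicate masks off the tail, so no scalar epilogue is needed. The function `vla_add` is hypothetical. On non-SVE targets, a scalar emulation with an assumed 4-lane width is compiled instead so the sketch stays runnable.

```cpp
#include <cassert>
#include <cstdint>
#if defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>
#endif

// out[i] = a[i] + b[i] for i in [0, n), in vector-length agnostic style.
void vla_add(const float* a, const float* b, float* out, std::int64_t n) {
  assert(n >= 0);
#if defined(__ARM_FEATURE_SVE)
  // svcntw(): number of 32-bit lanes per vector, known only at run time.
  for (std::int64_t i = 0; i < n; i += static_cast<std::int64_t>(svcntw())) {
    svbool_t pg = svwhilelt_b32(i, n);        // lane k active iff i + k < n
    svfloat32_t va = svld1(pg, a + i);        // inactive lanes read no memory
    svfloat32_t vb = svld1(pg, b + i);
    svst1(pg, out + i, svadd_x(pg, va, vb));  // inactive lanes not stored
  }
#else
  // Scalar emulation of the same predicated loop, assuming 4 lanes.
  const std::int64_t lanes = 4;
  for (std::int64_t i = 0; i < n; i += lanes)
    for (std::int64_t k = 0; k < lanes; ++k)
      if (i + k < n) out[i + k] = a[i + k] + b[i + k];  // per-lane predicate
#endif
}
```

The same binary works for any hardware VL, which is the property Eigen's compile-time packet sizes cannot express directly.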
>>
>> Work still needs to be done to either abstract the vector length away from
>> the function optimizations, or to consider all possible SVE vector lengths
>> and optimize for each of them. In order to fully integrate a vector length
>> agnostic SVE backend with Eigen, changes to Eigen's core are also required.
>> The aforementioned *PacketBlock* is one of them, but the code needs to
>> be revised in order to seamlessly support sizeless vectors without
>> breaking support for all existing fixed-size vector architectures.
>> Ultimately, this would ensure compatibility with other projects such as
>> TensorFlow, which currently cannot be built with Eigen SVE. In the current
>> proof-of-concept, benchmarks need to be carefully written to use the SVE
>> backend.
>>
>> As of mid-May, the stable GCC 10.1 release is available, bringing the
>> ability to create fixed-length SVE types. This enables sizeless data types
>> to be replaced with fixed-size ones, solving the above incompatibility
>> with the *PacketBlock* structure. However, this is not a complete
>> solution, as it does not bring support for the desired SVE VLA.
>> We are currently performing some tests and evaluating this GCC feature
>> with a TensorFlow build. The goal is to be able to build TensorFlow and
>> run some benchmarks using the proof-of-concept Eigen with the SVE backend
>> and a fixed VL.
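The GCC feature mentioned here is assumed to be the ACLE `arm_sve_vector_bits` attribute. When compiling with `-msve-vector-bits=N`, it turns a sizeless type such as `svfloat32_t` into a sized one, so it can again be an array element. The sketch below uses a simplified `PacketBlock` and a hypothetical `PacketXf` name. On targets without SVE, a plain struct of the assumed 512-bit size stands in so the code still compiles.

```cpp
#include <cstddef>

#if defined(__ARM_FEATURE_SVE) && defined(__ARM_FEATURE_SVE_BITS) && \
    __ARM_FEATURE_SVE_BITS > 0
#include <arm_sve.h>
constexpr int kVLBits = __ARM_FEATURE_SVE_BITS;
// Fixed-length version of the sizeless svfloat32_t (needs -msve-vector-bits=N).
typedef svfloat32_t PacketXf
    __attribute__((arm_sve_vector_bits(__ARM_FEATURE_SVE_BITS)));
#else
constexpr int kVLBits = 512;  // assumed VL for the portable stand-in
struct PacketXf {
  float lane[kVLBits / 32];
};
#endif

constexpr int kLanes = kVLBits / 32;  // 32-bit float lanes per packet

// Simplified PacketBlock: packets stored by value in an array. This is
// ill-formed for a sizeless packet type but compiles with PacketXf.
template <typename Packet, int N>
struct PacketBlock {
  Packet packet[N];
};

using KernelXf = PacketBlock<PacketXf, kLanes>;

static_assert(sizeof(PacketXf) * 8 == static_cast<std::size_t>(kVLBits),
              "one packet is exactly one vector");
static_assert(sizeof(KernelXf) == kLanes * sizeof(PacketXf),
              "packets are stored contiguously");
```

As noted above, this only fixes the VL at build time. It does not provide VLA support.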
>>
>>
>
