Hello Michael,

> > I have a write-up ready but wanted to wait until we have a public
> > implementation.  Is that the right order?  Or would you rather review
> > proposals right away?
> 
> I'm not sure what a SIMD lane is.  There are a number of architectures
> which support SIMD, such as the AVX extension in x86.  We try to
> describe functionality in generic terms as much as possible so that it
> can be used with a variety of architectures.
> 
> We'd be happy to see your proposal.  Often an implementation is a good
> proof of concept, but getting feedback on a design early in the process
> can be a guide to that implementation and avoids changes later.

I tried submitting the proposal via the public comment function, but I'm not
sure whether I succeeded.  When I clicked on "Submit Comment", nothing
happened, and I did not get any error message.  I left the Section and Page
fields empty because the proposal covers multiple sections on multiple pages.
I'm attaching the proposal.

The proposal covers SIMD in general.  I have patches for GDB with a test case
using AVX.


> > We'd also want an unbounded piece operator to describe partially
> > registerized unbounded arrays, but I have not worked that out in
> > detail, yet, and we're a bit farther away from an implementation.
> 
> Can you describe this more?

Consider a large array kept in memory and a for loop iterating over the
array.  If that loop gets vectorized, compilers would load a portion of the
array into registers at the beginning of the loop body, operate on the
registers, and write them back at the end of the loop body.

The entire array can be split into three pieces:
- elements that have already been processed: in memory
- elements that are currently being processed: in registers
- elements that will be processed in future iterations: in memory

For unbounded arrays, the size of the third piece is not known.
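
As a concrete sketch (the function is made up for illustration; the actual
patches use an AVX test case), consider:

    /* Illustrative loop a compiler may vectorize, e.g. processing
       eight floats per vector iteration with AVX.  */
    void
    scale (float *a, long n)
    {
      for (long i = 0; i < n; ++i)
        a[i] *= 2.0f;
    }

While the vector loop is running with a[i] .. a[i+7] registerized, the
elements a[0] .. a[i-1] are already back in memory and a[i+8] .. a[n-1]
are still in memory; if the array is unbounded, the size of that last
piece is unknown.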

Regards,
Markus.

===
Implicitly vectorized code executes multiple instances of a source-code loop
or kernel function simultaneously, in a single sequence of instructions that
operates on a vector of data elements (cf. SIMD: Single Instruction Multiple
Data).

The size of this vector is typically referred to as SIMD width or SIMD size.
Individual elements and their control flow are typically referred to as SIMD
lanes or SIMD channels.

The user writes the source code from the point of view of a single SIMD
lane; the compiler later vectorizes it to execute multiple SIMD lanes
simultaneously.
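
For example (illustrative source; not taken from the patches), each instance
of the following per-lane kernel body may become one SIMD lane after
vectorization:

    /* Written for a single lane; the compiler may execute eight
       instances of the body at once, e.g. with AVX.  */
    void
    add_one (const float *a, float *b, long i)
    {
      b[i] = a[i] + 1.0f;
    }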

Although SIMD code typically works on large vectors or matrices, the compiler
is free to temporarily reorganize the data, e.g. by registerizing some vector
elements while leaving the rest of the vector in memory, or by gathering a
particular structure field of a vector of structures into a register.

Further, the assignment of loop iterations or work items to SIMD lanes may be
done dynamically.

We thus cannot infer the relative location of data objects in SIMD code.

To be able to describe this, we propose the following DWARF extension to
describe the location of a variable as a function of the SIMD lane.

===

Section 2.2, pg. 17-22.

Add the following entry to Table 2.2:

  ----------------    ---------------------------
  DW_AT_simd_width    SIMD width of subroutine or
                      lexical block
  ----------------    ---------------------------


Section 3.3.5, pg. 79-80.

Add

    A subprogram or inlined subroutine may have a `DW_AT_simd_width` attribute
    whose value is the SIMD width of the code it contains.  A value of zero
    means that the subroutine does not contain SIMD code.

    If the attribute is not present, the SIMD width is inherited from the parent
    DIE.

    The SIMD width may be overridden for nested subroutines or for lexical
    blocks contained within that subroutine.


Section 3.5, pg. 92.

Add

    A lexical block that contains SIMD code may have a `DW_AT_simd_width`
    attribute whose value is the SIMD width of the code it contains.  A value of
    zero means that the lexical block does not contain SIMD code.  This can be
    used to mark non-SIMD blocks inside a SIMD subroutine.

    If the attribute is not present, the SIMD width is inherited from the parent
    DIE.

    The SIMD width may be overridden for lexical blocks nested within that
    block.
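
    As an illustration (not proposed normative text; the names and values
    are made up), the attribute would nest as follows:

        DW_TAG_subprogram
          DW_AT_name        "kernel"
          DW_AT_simd_width  8        ; SIMD code, eight lanes

          DW_TAG_lexical_block       ; inherits SIMD width 8 from the parent

          DW_TAG_lexical_block
            DW_AT_simd_width  0      ; scalar block within the SIMD subroutine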


Section 2.5.1.3, pg. 29-33.

Add

    16. DW_OP_push_simd_lane

        The DW_OP_push_simd_lane operation pushes the index of the SIMD lane
        for which the expression is to be evaluated.

        The operation is only valid in the context of a lexical block for which
        the SIMD width is known (see DW_AT_simd_width).

        The SIMD lane must be greater than or equal to zero and less than
        the lexical block's SIMD width.
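
        A sketch of a use (the frame-base offset and element size are
        illustrative): if each lane's copy of a 4-byte variable is laid
        out contiguously starting at some frame offset, its location
        could be described as

            DW_OP_fbreg -32          ; base of the per-lane storage
            DW_OP_push_simd_lane     ; lane for which the expression is evaluated
            DW_OP_const1u 4          ; element size in bytes
            DW_OP_mul
            DW_OP_plus               ; address of this lane's element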


Section 2.6.1.2, pg. 42.

Add

    3. DW_OP_piece_stack

       The DW_OP_piece_stack operation works like DW_OP_piece except that it
       takes its argument, the size in bytes of the piece, from the top of
       the DWARF stack.  The argument is popped from the DWARF stack.


    4. DW_OP_bit_piece_stack

       The DW_OP_bit_piece_stack operation works like DW_OP_bit_piece except
       that it takes its arguments from the DWARF stack.

       The first argument, the size in bits of the piece, is taken from the top
       of the stack.  The second argument, the offset in bits from the location
       defined by the preceding DWARF location description, is taken from the
       second stack location.

       Both arguments are popped from the DWARF stack.
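
       As a sketch of how these operations compose with DW_OP_push_simd_lane
       (the vector register number is illustrative), a 4-byte variable whose
       lanes occupy consecutive 32-bit elements of a single vector register
       could be described as

           DW_OP_push_simd_lane     ; lane index
           DW_OP_const1u 32
           DW_OP_mul                ; second stack entry: bit offset = lane * 32
           DW_OP_const1u 32         ; top of stack: piece size in bits
           DW_OP_regx 17            ; vector register holding the lanes
           DW_OP_bit_piece_stack    ; select this lane's 32-bit element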