Hello Michael, > > I have a write-up ready but wanted to wait until we have a public > implementation. > > Is that the right order? Or would you rather want to review proposals right > away. > > I'm not sure what a SIMD lane is. There are a number of architectures > which support SIMD, such as the AVX extension in x86. We try to > describe functionality in generic terms as much as possible so that it > can be used with a variety of architectures. > > We'd be happy to see your proposal. Often an implementation is a good > proof of concept, but getting feedback on a design early in the process > can be a guide to that implementation and avoids changes later.
I tried submitting the proposal via the public comment function but I'm not sure whether I succeeded. When I clicked on "Submit Comment" nothing happened. I have not filled out the Section and Page fields as the proposal covers multiple sections on multiple pages. I did not get any error. I'm attaching the proposal. The proposal covers SIMD in general. I have patches for GDB with a test case using AVX. > > We'd also want an unbounded piece operator to describe partially > > registerized > > unbounded arrays, but I have not worked that out in detail, yet, and we're > > a bit > > farther away from an implementation. > > Can you describe this more? Consider a large array kept in memory and a for loop iterating over the array. If that loop gets vectorized, compilers would load a portion of the array into registers at the beginning of the loop body, operate on the registers, and write them back at the end of the loop body. The entire array can be split into three pieces: - elements that have already been processed: in memory - elements that are currently being processed: in registers - elements that will be processed in future iterations: in memory For unbounded arrays, the size of the third piece is not known. Regards, Markus. Intel Deutschland GmbH Registered Address: Am Campeon 10-12, 85579 Neubiberg, Germany Tel: +49 89 99 8853-0, www.intel.de Managing Directors: Christin Eisenschmid, Gary Kershaw Chairperson of the Supervisory Board: Nicole Lau Registered Office: Munich Commercial Register: Amtsgericht Muenchen HRB 186928
Implicitly vectorized code executes multiple instances of a source code loop or of a source code kernel function simultaneously in a single sequence of instructions operating on a vector of data elements (cf. SIMD: Single Instruction Multiple Data). The size of this vector is typically referred to as SIMD width or SIMD size. Individual elements and their control flow are typically referred to as SIMD lanes or SIMD channels. The user has written the source code from the point of view of a single SIMD lane. The code was later vectorized by the compiler to execute multiple SIMD lanes simultaneously. Although SIMD code typically works on large vectors or matrices, the compiler is free to temporarily reorganize the data, e.g. by registerizing some vector elements, while leaving the rest of the vector in memory or by gathering a particular structure field of a vector of structures in a register. Further, the assignment of loop iterations or work items to SIMD lanes may be done dynamically. We thus cannot infer the relative location of data objects in SIMD code. To be able to describe this, we propose the following DWARF extension to describe the location of a variable as function of the SIMD lane. === Section 2.2, pg. 17-22. Add the following entry to Table 2.2: ---------------- --------------------------- DW_AT_simd_width SIMD width of subroutine or lexical block ---------------- --------------------------- Section 3.3.5, pg. 79-80. Add A subprogram or inlined subroutine may have a `DW_AT_simd_width` attribute whose value is the SIMD width of the code it contains. A value of zero means that the subroutine does not contain SIMD code. If the attribute is not present, the SIMD width is inherited from the parent DIE. The SIMD width may be overwritten for nested subroutines or for lexical blocks contained within that subroutine. Section 3.5, pg. 92. Add A lexical block that contains SIMD code may have a `DW_AT_simd_width` attribute whose value is the SIMD width of the code it contains. A value of zero means that the lexical block does not contain SIMD code. This can be used to mark non-SIMD blocks inside a SIMD subroutine. If the attribute is not present, the SIMD width is inherited from the parent DIE. The SIMD width may be overwritten for lexical blocks nested within that block. Section 2.5.1.3, pg. 29-33. Add 16. DW_OP_push_simd_lane The DW_OP_push_simd_lane operation pushes the SIMD lane for which the expression shall be evaluated. The operation is only valid in the context of a lexical block for which the SIMD width is known (see DW_AT_simd_width). The SIMD lane must be between zero and the lexical block's SIMD width. Section 2.6.1.2, pg. 42. Add 3. DW_OP_piece_stack The DW_OP_piece_stack operation works similar to DW_OP_piece except that it takes its argument from the DWARF stack. 4. DW_OP_bit_piece_stack The DW_OP_bit_piece_stack operation works similar to DW_OP_bit_piece except that it takes its arguments from the DWARF stack. The first argument, the size in bits of the piece, is taken from the top of the stack. The second argument, the offset in bits from the location defined by the preceding DWARF location description, is taken from the second stack location. Both arguments are popped from the DWARF stack.
_______________________________________________ Dwarf-Discuss mailing list Dwarf-Discuss@lists.dwarfstd.org http://lists.dwarfstd.org/listinfo.cgi/dwarf-discuss-dwarfstd.org