On Thu, Jun 10, 2010 at 11:05 AM, Quentin Neill <quentin.neill....@gmail.com> wrote: > Cross-posting Reza's call for feedback to the binutils list since it > is relevant - > see the last few paragraphs regarding how to "solve the alignment problem". > > Original thread: http://gcc.gnu.org/ml/gcc/2010-06/threads.html#00402 > > Not sure if followups should occur on one list or both. > -- > Quentin Neill > > > On Thu, Jun 10, 2010 at 12:20 PM, reza yazdani <yazdani_r...@yahoo.com> wrote: >> Hi, >> >> We are in the process of adding a feature to GCC to take advantage >> of a new hardware feature in the latest AMD micro processor. This >> feature requires a certain mix, ordering and alignments in >> instruction sequences to obtain the expected hardware performance. >> >> I am asking the community to review this high level implementation >> design and give me direction or advice. >> >> The new hardware issues two windows of the size N bytes of >> instructions in every cycle. It goes into accelerate mode if the >> windows have the right combination of instructions or alignments. Our >> goal is to maximize the IPC by proper instruction scheduling and >> alignments. >> >> Here is a summary of the most important requirements: >> >> a) Maximum of N instructions per window. >> b) An instruction may cross the first window. >> c) Each window can have maximum of x memory loads and y memory >> stores . >> d) The total number of immediate constants in the instructions >> of a window should not exceed k. >> e) The first window must be aligned on 16 byte boundary. >> f) A Window set terminates when a branch exists in a window. >> g) The number of allowed prefixes varies for instructions. >> h) A window set needs to be padded by prefixes in instructions >> or terminated by nops to ensure adherence to the rules. >> >> We have the following implementation plan for GCC: >> >> 1) Modify the Haifa scheduler to make the desired arrangement of >> instructions for the two dispatch windows. The scheduler is called >> once before and once after register allocation as usual. In both >> cases it performs dispatch scheduling along with its normal job of >> instruction scheduling. >> >> The advantage of doing it before register allocation is avoiding >> extra dependencies caused by register allocation which may become >> an obstacle to movement of instructions. The advantage of doing >> it after register allocation is a consideration for spilling code >> which may be generated by the register allocator. >> >> The algorithm we use is: >> >> a) Considering the current dispatch window set, choose the first >> instruction from ready queue that does not violate dispatch rules. >> b) When an instruction is selected and scheduled, inform the >> dispatcher code about the instruction. This step keeps track of the >> instruction content of windows for future evaluation. It also manages >> the window set by closing and opening new virtual dispatch windows. >> >> 2) Insertion of alignment code. >> >> In x86 alignment is done by inserting prefixes or by generating >> nops. As the object code is generated by the assembler in GCC, some >> information such as sizes of branches are unknown until assembly or >> link time. To do alignments related to dispatch correctly in GCC, >> we need to iteratively compute prefixes and branch sizes until >> its convergence. This pass currently does not exist in GCC, but it >> exists in the assembler. >> >> There are two possible approaches to solve alignment problem. >> >> a) Let the assembler performs the alignments and padding needed >> to adhere with the new machine dispatching rules and avoid an extra >> pass in GCC. >> b) Add a new pass to mimic what assembler does before generating >> the assembly listing in GCC and insert the required alignments. >> >> I appreciate your comments on the proposed implementation procedure >> and the choices a or b above.
I don't this should be done in assembler. Assembler should just assemble the assembly input. -- H.J.