Hello All,

Since prefetch instructions have no direct consumers in the code stream, they give the instruction scheduler considerable freedom. They are typically assigned lower priorities than most other instructions in the code stream, which tends to cause all the prefetch instructions to be placed together in the final schedule. Placing them in clumps rather than spreading them evenly degrades performance.
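To make the situation concrete, here is a minimal sketch in C using GCC's `__builtin_prefetch`. The function name `sum_with_prefetch`, the prefetch distance of 16 elements, and the locality hint are assumptions chosen only for illustration; they are not taken from any particular target or from GCC's own heuristics. The point is that the prefetch produces no value, so nothing later in the loop body depends on it.

```c
#include <stddef.h>

/* Hypothetical prefetch distance in elements; real values are tuned per target. */
#define PREFETCH_AHEAD 16

/* Sums an array while prefetching elements a fixed distance ahead. */
long sum_with_prefetch(const long *a, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        /* The prefetch defines no register, so no later instruction
           consumes its result. Within the block, the scheduler is
           constrained only by the address computation that feeds it. */
        if (i + PREFETCH_AHEAD < n)
            __builtin_prefetch(&a[i + PREFETCH_AHEAD], 0 /* read */, 3 /* high locality */);
        sum += a[i];
    }
    return sum;
}
```

If such a loop is unrolled, each copy of the body contributes one prefetch, and a scheduler that ranks them all below the loads and adds could end up emitting them back to back.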
Spreading the prefetch instructions evenly gives better speedup ratios than placing them in clumps in the case of dirty misses. I am curious to know how the schedulers in GCC handle prefetch instructions, and how priorities are assigned to them so that they are spread evenly. Please let me know what you think.

Thanks & Regards,
Ajit