http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31640
--- Comment #3 from Oleg Endo <oleg.e...@t-online.de> 2011-12-31 17:24:47 UTC ---
Created attachment 26208
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=26208
Proposed patch

(In reply to comment #0)
> The sh4 port aligns blocks that have no fallthrus and that are either
> frequently executed (JUMP_ALIGN) or preceded by a barrier
> (LABEL_ALIGN_AFTER_BARRIER) on a cache line.
>
> While in theory this helps to avoid cache misses if the block splits over 2
> cache lines, in practice this reduces cache locality and lengthens the
> distance between blocks.
> The number of issued instructions is also impacted. For example the relative
> indirect address in jump tables needs a byte zero extend instruction if the
> distance occupies 8 bits instead of 7 bits.
>
> I ran some experiments and benchmarked (eembc) with 2 strategies:
> 1) -falign-jumps=1
> 2) Align the block if the size is bigger than a given threshold (empirically
> set to 16 bytes, half of the cache line size). See illustrating attached
> patch.
>
> My conclusion is that in -O3 the performance never degrades (option 2 is a
> little bit better, even improving dhrystone by 3%) when removing this padding.
> And the text size improves by ~15%.

Because of this I would like to propose the following alignment strategies
(unless they are changed by the user with -falign-??? options).

-Os:
  Align everything to 2 bytes to get compact code.

-O2,-O3:
  Align functions to 4 bytes.
  Align labels and jumps to 2 bytes (to avoid potential code bloat).
  Align loops to 4 bytes.

The attached patch should do that, although it is not fully tested yet.
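
As a rough illustration of the defaults proposed above, here is a minimal
standalone C sketch. It is not the attached patch and does not use GCC's
internal interfaces; the names opt_level, align_kind, proposed_align and
padding_for are hypothetical and exist only for this example.

```c
#include <assert.h>

/* Hypothetical names for illustration only; none of this is taken from
   the attached patch or from GCC's sources.  */
enum opt_level { OPT_SIZE, OPT_SPEED };   /* -Os vs. -O2/-O3 */
enum align_kind { ALIGN_FUNCTION, ALIGN_LABEL, ALIGN_JUMP, ALIGN_LOOP };

/* Return the proposed default alignment in bytes, assuming the user
   gave no -falign-* option.  */
static unsigned proposed_align(enum opt_level opt, enum align_kind kind)
{
    if (opt == OPT_SIZE)
        return 2;               /* -Os: everything on 2 bytes */

    switch (kind) {
    case ALIGN_FUNCTION:
    case ALIGN_LOOP:
        return 4;               /* -O2/-O3: functions and loops */
    default:
        return 2;               /* -O2/-O3: labels and jumps */
    }
}

/* Number of padding bytes needed to round ADDR up to the next multiple
   of ALIGN, which must be a power of two.  */
static unsigned padding_for(unsigned addr, unsigned align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (align - (addr & (align - 1))) & (align - 1);
}
```

With the 32-byte cache line implied by comment #0, padding_for(0x1002, 32)
is 30 bytes, whereas padding_for(0x1002, 4) is only 2 bytes. That difference
is the kind of padding the proposed defaults are meant to avoid.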