On Fri, Jan 03, 2014 at 03:35:11PM +0100, Uros Bizjak wrote:
> On Fri, Jan 3, 2014 at 3:13 PM, Jakub Jelinek <ja...@redhat.com> wrote:
> > On Fri, Jan 03, 2014 at 03:02:51PM +0100, Uros Bizjak wrote:
> >> Please note that the previous value was based on an earlier (pre-P4)
> >> recommendation and was appropriate for older chips with 32-byte
> >> cache lines. The value should have been updated long ago, when 64-byte
> >> cache lines were introduced, but this was probably missed because it was
> >> a magic value without a comment.
> >>
> >> Ah, I see. My patch deals only with structures larger than a cache
> >> line. Following the part of the manual that says "As long as 16-byte
> >> boundaries (and cache lines) are never crossed, natural alignment is not
> >> strictly necessary (though it is an easy way to enforce this).", we
> >> should align smaller structures to 16 or 32 bytes.
> >>
> >> Yes, I agree. Can you please merge your patch with the proposed
> >> patch?
> >
> > How do we want to treat aggregates that are 33-63 resp. 17-31 bytes long, though?
> > 32-byte and 16-byte aggregates can simply be aligned to 32 resp. 16 bytes;
> > they then never cross a 64-byte boundary and don't waste space on padding
> > unnecessarily (this is still only an optimization; the ABI can override it).
> > But do we want to waste some extra bytes to ensure that 17-31 resp. 33-63
> > byte objects don't cross 64-byte boundaries, by aligning them to 32
> > resp. 64 bytes, or should we align them to 16 resp. 32 bytes instead?
> 
> Looking at the "significant performance penalties" part of the above
> recommendation, I'd say align them to 32/64-byte boundaries.
> Hopefully the linker is able to put other data in the hole?

Unless -fdata-sections is used, the linker doesn't affect this, except for the
very first or very last object of the TU in the particular section.
GCC itself would need (presumably unless -fno-toplevel-reorder is given) to
sort the varpool constants that are going to be emitted before emitting
them: compute which section each decl would be emitted to, and within each
section start with the variable with the biggest alignment and then
try to pack them nicely.  This is somewhat similar to what is done for
-fsection-anchors, except that everything wouldn't be emitted as a single
block; the decls would just be sorted in the varpool queue.

        Jakub
