On 18/11/2011 14:38, Alexandru Juncu wrote:
On Fri, Nov 18, 2011 at 1:24 PM, David Brown<da...@westcontrol.com> wrote:
On 18/11/2011 10:27, Alexandru Juncu wrote:
Hello!
I have a curiosity with something I once tested. I took a simple C
program and made an assembly file with gcc -S.
The C file looks something like this:
int main(void)
{
int a=1, b=2;
return 0;
}
The assembly instructions look like this:
subl $16, %esp
movl $1, -4(%ebp)
movl $2, -8(%ebp)
The "subl $16, %esp" is the allocation of local variables on the stack,
right? 16 bytes are enough for four 32-bit integers.
If I have 1, 2, 3 or 4 local variables declared, I get those 16 bytes.
If I have 5 variables, I get "subl $32, %esp"; 6, 7 or 8 variables give
the same. With 9, 10, 11 or 12 variables it is 48 bytes.
The observation is that gcc allocates increments of 4 variables (if
they are integers). If I allocate 8bit chars, increments of 16 chars.
So the allocation is in increments of 16 bytes no matter what.
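The arithmetic behind this observation can be sketched as a round-up to the next multiple of 16. This is only a model of the numbers reported above, not a description of how gcc computes the frame; the helper names round_up and frame_bytes are made up for illustration, and the model assumes the locals are packed with no other data in the frame.

```c
#include <assert.h>
#include <stddef.h>

/* Round `size` up to the next multiple of `alignment`.
 * `alignment` must be a non-zero power of two (16 in the observation). */
static size_t round_up(size_t size, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (size + alignment - 1) & ~(alignment - 1);
}

/* Hypothetical helper: bytes reserved for `count` locals of `elem_size`
 * bytes each, assuming the total is simply rounded up to 16 bytes. */
static size_t frame_bytes(size_t count, size_t elem_size)
{
    return round_up(count * elem_size, 16);
}
```

Under these assumptions, frame_bytes(5, 4) gives 32 and frame_bytes(9, 4) gives 48, matching the "subl" values seen for 5 and 9 ints.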
OK, that's the observation... my question is why? What's the reason
for this? Is it an optimization (does it matter which -O level is used?),
or is it architecture dependent (I ran it on x86)? And is this just in
gcc, just in a certain version of gcc, or is it universal?
Thank you!
This is the wrong mailing list for questions like this - this is the list
for development of gcc itself, rather than for using it.
Thank you for still answering. I apologize, but I looked at the lists
and this one seemed the most generic. Can you redirect me to another
list where this thread would be appropriate?
That would be gcc-h...@gcc.gnu.org.
There is a certain amount of overlap between the lists, and many (most?)
of the gcc developers follow both of them.
However, in answer to your question, the compiler will try to keep the stack
aligned in units of a suitable size for the processor architecture in use.
Typically, the processor will be most efficient if the stack is aligned
with cache lines. I don't know the details of the x86, but presumably
(level 1) cache lines are 16 bytes wide - or at least, that number fits
things like internal bus widths, prefetch buffers, etc. Thus the compiler
makes the tradeoff of using slightly more memory to improve the speed of the
program.
I tried to compile with --param l1-cache-size... nothing seemed to change.
That would only affect the cache size in total, not the cache line width
- "--param l1-cache-line-size" is the one you are looking for. And
there are other reasons why a 16-byte stack alignment might be
considered optimal for the x86, so even if you change
"l1-cache-line-size" it might not change the stack alignment.
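One way to experiment with this is to test addresses for 16-byte alignment directly. The sketch below is only an illustration; is_aligned is a made-up helper name, not part of gcc or the C library. As far as I know, the x86 option that controls the stack boundary is "-mpreferred-stack-boundary=num" (alignment of 2 to the power num bytes), which is separate from the cache parameters, but check the documentation for your gcc version before relying on it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if the address `p` is a multiple of `alignment`.
 * `alignment` must be a non-zero power of two, e.g. 16. */
static bool is_aligned(const void *p, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return ((uintptr_t)p & (alignment - 1)) == 0;
}
```

Calling is_aligned(&a, 16) on a local variable inside main shows whether that particular variable happens to sit on a 16-byte boundary. The result depends on the compiler version, target and flags, and an individual local need not be 16-byte aligned even when the stack frame as a whole is.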