Const flagged as incompatible argument

2021-10-17 Thread Zoltán Kócsi
Consider the following code segment:

void foo( const char * const m[] );

char *bar( void );

void baz( void )
{
char *m[ 2 ];

m[ 0 ] = bar();
m[ 1 ] = bar();

foo( m );
}

gcc 8.2.0 (and 7.4.1 as well) with -Wall gives a warning, for Intel or
ARM target:

test.c:12:7: warning: passing argument 1 of ‘foo’ from incompatible
pointer type [-Wincompatible-pointer-types] 
foo( m );
 ^
test.c:1:6: note: expected ‘const char * const*’ but argument is of
type ‘char **’

My understanding of the C standard (and I might be mistaken) is that
with the const-s I promised the compiler that foo() won't modify either
the array or the pointed-to strings, nothing more.

So why is the compiler complaining just because I passed a mutable
array of mutable strings? 

Also, how is it different from this case:

void foo( const char *p );
char *bar( void );
void baz( void ) { foo( bar() ); }

which is accepted by the compiler without a warning.

The warning also goes away if m[] is defined as const char *[],
but why is the warning issued in the first place?
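For reference, the rule behind the warning is the assignment-compatibility constraint (C11 6.5.16.1): assignment may add qualifiers only at the first level of indirection, because allowing the look-alike char ** to const char ** conversion would break const safety. An explicit cast at the call site is the usual fix; a hedged sketch with illustrative names:

```c
#include <string.h>

/* foo() promises not to modify the array or the strings it points to. */
static size_t foo( const char * const m[], size_t n )
{
    size_t total = 0;

    for ( size_t i = 0 ; i < n ; i++ )
        total += strlen( m[i] );
    return total;
}

static size_t baz( void )
{
    char a[] = "mutable", b[] = "strings";
    char *m[ 2 ] = { a, b };

    /* foo( m ) draws -Wincompatible-pointer-types; the cast is safe here
       and silences it. The implicit conversion is rejected because the
       standard only adds qualifiers at the first indirection level:
       permitting the similar-looking char ** -> const char ** conversion
       would let a const object be modified without any cast. */
    return foo( (const char * const *) m, 2 );
}
```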

Thanks,

Zoltan


Re: An asm constraint issue (ARM FPU)

2021-07-29 Thread Zoltán Kócsi
Dear Marc,

Sorry for the late answer, I was away for a few days.
Yes, that fixes it. THANK YOU!

Do you know which gcc source file contains the magic qualifiers for the
asm arguments? I wouldn't mind going through the code and extracting what
I can. Probably I'd find a couple of gems that are useful for inline
asm stuff. Maybe even write the info pages that describe them, so
that others can make use of them...

Thanks again,

Best Regards,

Zoltan


On Sun, 25 Jul 2021 14:19:56 +0200 (CEST)
Marc Glisse  wrote:

> On Sun, 25 Jul 2021, Zoltán Kócsi wrote:
> 
> > [...]
> > double spoof( uint64_t x )
> > {
> > double r;
> >
> >   asm volatile
> >   (
> > " vmov.64 %[d],%Q[i],%R[i] \n"  
> 
> Isn't it supposed to be %P[d] for a double?
> (the documentation is very lacking...)
> 
> > [...]
> -- 
> Marc Glisse


An asm constraint issue (ARM FPU)

2021-07-24 Thread Zoltán Kócsi
I am trying to write a one-liner inline function to create a double from
a 64-bit integer: not converting the value to a double, but treating the
integer as the bit pattern of the double (type spoofing).

The compiler is arm-eabi-gcc 8.2.0.
The target is a Cortex-A9, with NEON.

According to the info page the assembler constraint "w" denotes an FPU
double register, d0 - d31.

The code is the following:

double spoof( uint64_t x )
{
double r;

   asm volatile
   (
      " vmov.64 %[d],%Q[i],%R[i] \n"
      : [d] "=w" (r)
      : [i] "q" (x)
   );

   return r;
}

The command line:

arm-eabi-gcc -O0 -c -mcpu=cortex-a9 -mfloat-abi=hard -mfpu=neon-vfpv4 \
test.c

It compiles and the generated object code is this:

00000000 <spoof>:
   0:   e52db004    push    {fp}            ; (str fp, [sp, #-4]!)
   4:   e28db000    add     fp, sp, #0
   8:   e24dd014    sub     sp, sp, #20
   c:   e14b01f4    strd    r0, [fp, #-20]  ; 0xffffffec
  10:   e14b21d4    ldrd    r2, [fp, #-20]  ; 0xffffffec
  14:   ec432b30    vmov    d16, r2, r3
  18:   ed4b0b03    vstr    d16, [fp, #-12]
  1c:   e14b20dc    ldrd    r2, [fp, #-12]
  20:   ec432b30    vmov    d16, r2, r3
  24:   eeb00b60    vmov.f64 d0, d16
  28:   e28bd000    add     sp, fp, #0
  2c:   e49db004    pop     {fp}            ; (ldr fp, [sp], #4)
  30:   e12fff1e    bx      lr

which is not really efficient, but works.

However, if I specify -O1, -O2 or -Os then the compilation fails
because assembler complains. This is the assembly the compiler
generated, (comments and irrelevant stuff removed):

spoof:
   vmov.64 s0,r0,r1
   bx lr

where the problem is that 's0' is a single-precision float register and
it should be 'd0' instead.

Either I'm seriously missing something, in which case I would be most
obliged if someone pointed me in the right direction; or it is a compiler
or documentation bug.
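As an aside, when only the reinterpretation matters and not the exact instruction, memcpy does the same job portably; gcc compiles it to a plain register move at -O1 and above. A sketch:

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a 64-bit pattern as a double. Unlike pointer punning this
   is well defined, and gcc optimises the memcpy away entirely. */
static double spoof( uint64_t x )
{
    double r;

    memcpy( &r, &x, sizeof r );
    return r;
}
```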

Thanks,

Zoltan


libgcc maintainer

2012-02-05 Thread Zoltán Kócsi
Who'd be the best person to contact regarding libgcc for ARM 4T, 6M and 7M
targets?

Thanks,

Zoltan


Re: Assignment to volatile objects

2012-01-30 Thread Zoltán Kócsi
On Mon, 30 Jan 2012 19:51:47 -0600
Gabriel Dos Reis  wrote:

> On Mon, Jan 30, 2012 at 4:59 PM, Zoltán Kócsi  wrote:
> > David Brown  wrote:
> >
> >> Until gcc gets a feature allowing it to whack the programmer on the back
> >> of the head with Knuth's "The Art of Computer Programming" for writing
> >> such stupid code that relies on the behaviour of volatile "a = b = 0;",
> >> then a warning seems like a good idea.
> >
> > a = b = 0; might be stupid.
> >
> > Is if ( ( a = expr ) ) also stupid?
> 
> If you ask me, yes.

Beauty is in the eye of the beholder, I like

while (( *dst++ = *src++ ));

better than

_some_type_ tmp;

do {
  tmp = *src;
  *dst = tmp;
  src = src + 1;
  dst = dst + 1;
} while ( tmp != 0 );


Zoltan


Re: Assignment to volatile objects

2012-01-30 Thread Zoltán Kócsi
On Tue, 31 Jan 2012 00:38:15 +0100
Georg-Johann Lay  wrote:

> A warning would be much of a help to write unambiguous, robust code.
> So the question is rather why user refuses to write robust code in the 
> first place once there is a warning.

The user (me, in this case) does not refuse to write robust code; he has
no other choice, warning or not. The user is kindly asking for his wish to
be accommodated: concise, more elegant code that is not five times longer
just to work around a language ambiguity (i.e. code that is still robust).

> IMHO it's about weigh cluttering up compiler souces with zillions of 
> command line options like
> 
> - how to resolve a = b = c; if b is volatile.
> - how to resolve i = i++;
> - how to resolve f(i++, i++);
> - etc.
> 
> against benefit for the user. I don't really see benefit.

I think there is a rather important difference between the volatile case 
and the others. Accessing a volatile is a side effect and so is the
postincrementing of the object. In the increment case you do know that the
side effects will happen before the next sequence point, you just do not know
exactly when within the two enclosing sequence points. With the a=b=0 case the
side effect of reading b may or may not happen at all. That, I think, is a
major difference.

In fact, I think there is an even bigger ambiguity with the volatile. Consider
the case of the single statement of a = 0; where a is volatile.

a=0; is an expression statement. Such a statement is evaluated as a void
expression for its side effects, as per 6.8.3.2. A void expression is an
expression whose value is discarded, as per 6.3.2.2. Thus, the value of
a=0 should be calculated and then discarded. Since evaluating the value of
that expression when 'a' is volatile may or may not read 'a' back, as per
6.5.16.3, the compiler thus has every right to generate, or not generate,
a read after writing the 0 to a. That is, a simple assignment has an
unpredictable side effect on volatile objects. Nothing in the standard says
that you must not actually calculate the value of an expression statement
before discarding it; in fact, 5.1.2.3.4 explicitly states that you must
not omit parts of an expression evaluation which have side effects, even if
the expression's value is not used. The read-back of a volatile lhs of an
assignment is a side effect which, according to the standard, the compiler
can emit or omit at whim.

And with that writing 'robust' code becomes impossible, as long as it
matters to you whether the a=0; statement will read back the volatile 'a' 
or not.

Zoltan


Re: Assignment to volatile objects

2012-01-30 Thread Zoltán Kócsi
paul_kon...@dell.com wrote:

> I would prefer this to generate a warning.  The C language standard change
> you refer to is a classic example of a misguided change, and any code whose
> behavior depends on this deserves a warning message, NOT an option to work
> one way or the other.

Sure. However, a compiler is a tool and the best thing it can do is to serve
its user's needs. By all means generate a warning, because it is an ambiguous
construct (actually, it has always been a bit iffy, but now it is officially
an implementation choice).

When there is a possibility of helping the user, what's wrong with offering
it? If there's a switch and it is being used, then the user explicitly tells
you how (s)he wants the ambiguity to be resolved. The user, by specifying his
or her preference, clearly indicates that (s)he is aware of the ambiguity,
i.e. knows what (s)he is doing, and asks you kindly to resolve it this way or
the other. You can answer with a "piss off, idiot" or just do what the user
asked you to do. So why not help the user?

Zoltan


Re: Assignment to volatile objects

2012-01-30 Thread Zoltán Kócsi
David Brown  wrote:

> Until gcc gets a feature allowing it to whack the programmer on the back 
> of the head with Knuth's "The Art of Computer Programming" for writing 
> such stupid code that relies on the behaviour of volatile "a = b = 0;", 
> then a warning seems like a good idea.

a = b = 0; might be stupid. 

Is if ( ( a = expr ) ) also stupid?

I thought that that idiom was cited as an example for the expressiveness of C
in the C bible (the K&R book).

Zoltan


Assignment to volatile objects

2012-01-30 Thread Zoltán Kócsi
Now that the new C standard is out, is there any chance that gcc's behaviour
regarding a volatile lhs in an assignment changes?

This is what it does today:

volatile int a, b;

  a = b = 0;

translates to

  b = 0;
  a = b;

because the standard (up to and including C99) stated that the value of the
assignment operator is the value of the lhs after the assignment.

The C11 standard says the same, but then it explicitly states that the
compiler does not have to read back the value of the lhs, not even when the
lhs is volatile.

So it is actually legal now not to read back the lhs. Is there a chance of
the compiler getting a switch which would tell it explicitly not to read the
value back?
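Until such a switch exists, the only unambiguous option is to spell the chain out. Both readings below leave 0 in both variables and differ only in whether b is read back; a sketch:

```c
volatile int a, b;

/* The C99 reading: the value of (b = 0) is b, re-read after the store. */
static void chained_c99( void )
{
    b = 0;
    a = b;      /* volatile read of b, then write of a */
}

/* C11 permits this instead: no read-back of b. */
static void chained_c11( void )
{
    b = 0;
    a = 0;
}
```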

Zoltan


Re: Float point issue

2011-10-27 Thread Zoltán Kócsi
On Thu, 27 Oct 2011 23:31:14 -0400
Robert Dewar  wrote:

> > - I am missing a gcc flag
> 
> probably you should avoid extra precision and all the
> issues it brings, as well as speed up your program, by
> using SSE 64-bit arithmetic (using the appropriate gcc
> flags)

Indeed. -mpc64 fixes the issue and proper 53-bit rounding is applied.
Thanks a lot.

Zoltan


Float point issue

2011-10-27 Thread Zoltán Kócsi
I found something very strange, although it might be just a misunderstanding.

As far as I know, the IEEE-754 standard defines round-to-nearest, tie-to-even
as follows:

- For rounding purposes, the operation must be performed as if it were done
  with infinite precision
- Then, if the bit right of the LSB of the result (guard) is 0, do nothing.
- Otherwise, if *any* bit right of the guard bit is 1, add 1 to the result
- Otherwise, if the LSB of the result is 1, add 1 to the result
- Otherwise leave the result as it is

Assuming that the above is true, and that that is the default rounding
mode, then the following is most surprising. This bit of code:

#include <stdio.h>
#include <math.h>

int main( void )
{
double a, b;
int i;

  a = 1.0;

  for ( i = -54 ; i > -106 ; i-- ) {

b = ldexp( 1.0, -53 ) + ldexp( 1.0, i );
printf( "%.13a + %.13a = %.13a\n", a, b, a+b );
  }
}

generates this output on an Intel i5 core:

0x1.0000000000000p+0 + 0x1.8000000000000p-53 = 0x1.0000000000001p+0
0x1.0000000000000p+0 + 0x1.4000000000000p-53 = 0x1.0000000000001p+0
0x1.0000000000000p+0 + 0x1.2000000000000p-53 = 0x1.0000000000001p+0
[...]
0x1.0000000000000p+0 + 0x1.0100000000000p-53 = 0x1.0000000000001p+0
0x1.0000000000000p+0 + 0x1.0080000000000p-53 = 0x1.0000000000001p+0
0x1.0000000000000p+0 + 0x1.0040000000000p-53 = 0x1.0000000000001p+0
0x1.0000000000000p+0 + 0x1.0020000000000p-53 = 0x1.0000000000000p+0 <== ?
0x1.0000000000000p+0 + 0x1.0010000000000p-53 = 0x1.0000000000000p+0
0x1.0000000000000p+0 + 0x1.0008000000000p-53 = 0x1.0000000000000p+0
0x1.0000000000000p+0 + 0x1.0004000000000p-53 = 0x1.0000000000000p+0
[...]
0x1.0000000000000p+0 + 0x1.0000000000004p-53 = 0x1.0000000000000p+0
0x1.0000000000000p+0 + 0x1.0000000000002p-53 = 0x1.0000000000000p+0
0x1.0000000000000p+0 + 0x1.0000000000001p-53 = 0x1.0000000000000p+0

which seems to indicate that the sticky bit contains only the first 10 bits
right of the guard; all the rest is thrown away silently (as if internally
the operations were done on 64 bits only).

I tried to pass -fexcess-precision=standard to gcc, but got the same result.

I wonder whether

- I know the IEEE rounding rules incorrectly
- I am missing a gcc flag
- gcc is doing something funny when setting up the FPU
- Intel's FPU is not standard compiant
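Each hypothesis above is testable in a few lines. On x86-64, where gcc defaults to SSE arithmetic for doubles, the same sum is rounded with a full sticky bit; a minimal check (assumes SSE, i.e. no x87 extended precision):

```c
#include <math.h>

/* b = 2^-53 + 2^-105 is exactly representable (the 2^-105 tail sets the
   last fraction bit of b) and lies just above the halfway point between
   1.0 and the next double, so correct round-to-nearest must round
   1.0 + b up to 1 + 2^-52. A 64-bit-mantissa x87 sum loses the tail and
   double-rounds to 1.0 instead. */
static int sticky_honoured( void )
{
    double b = ldexp( 1.0, -53 ) + ldexp( 1.0, -105 );

    return ( 1.0 + b ) == ( 1.0 + ldexp( 1.0, -52 ) );
}
```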

Thanks,

Zoltan


Built-in function question

2011-10-26 Thread Zoltán Kócsi
I came across some very interesting behaviour regarding built-in functions:

int __builtin_popcount( unsigned x ); is a gcc built-in, which actually
returns the number of 1 bits in x.

int foo( unsigned x )
{
   return __builtin_popcount( x );
}

generates a call to the __popcountsi2 function in libgcc, for any target I
tried it for (well, I tried for x86, ARM and m68k).

However:

int (*bar( void ))( unsigned )
{
   return __builtin_popcount;
}

returns the address of the label "__builtin_popcount", which does not exist:

int main( int argc, char *argv[] )
{
   (void) argv;
   return (*bar())( argc );
}

fails to compile because of an undefined reference to __builtin_popcount.

The compiler does not give any warning with -Wall -Wextra -pedantic
but it spits the dummy during the linking phase. 

The next quite interesting thing is the effect of optimisation.
With -O1 or above bar() returns the address of the non-existent function
__builtin_popcount() *but* main(), which dereferences bar(), is optimised
to simply call __popcountsi2 in the library. So the linking fails because
bar() (which is not actually called by main()) refers to the nonexistent
function; but if bar() is made static, the optimisation gets rid of it
altogether and the linking succeeds.

A further point is that the compiler generates a .globl for __popcountsi2 but
it does not do that for __builtin_popcount, which is rather unusual (although
not fatal, since gas treats all undefined symbols as globals). Nevertheless,
gcc normally pedantically emits a .globl for every global symbol it generates
or refers to, but not in this case.

At least the 4.5.x compiler behaves like that. The info page does not say
that one cannot take the address of a built-in function (and the compiler
does not issue a warning on it), so a link-time failure, which depends on
whether the optimiser could eliminate the need for the actual function
pointer or not, is somewhat surprising.

I understand that there are very special built-in functions, some that work
only at compile time, some show very funky argument handling behaviour and so
on. However, many are (well, seem to be) stock standard functions, realised
either as a call to libgcc or as a few machine instructions, that is,
behaving like inline asm() wrapped in a static inline. 
Those functions, I think, should really behave like ordinary (possibly static
inline asm) functions. Or, if not, at least one should be warned.
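Until then, a workaround for the address-taking case is a one-line real function wrapping the built-in: it has a perfectly ordinary address, and the built-in still expands inside it. A sketch:

```c
/* A real function wrapping the built-in: it has an address, and the
   built-in still expands (inline or as a libgcc call) inside it. */
static int popcount( unsigned x )
{
    return __builtin_popcount( x );
}

static int (*bar( void ))( unsigned )
{
    return popcount;    /* fine: popcount is an ordinary function */
}
```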

I believe that the above is an issue, but I don't know whether it is a bug
or a feature, i.e. a compiler issue or a documentation issue.

Thanks,

Zoltan


Register constrained variable issue

2011-10-13 Thread Zoltán Kócsi
If one writes a bit of code like this:

int foo( void )
{
register int x asm( "Rn" );

  asm volatile ( "INSN_A %0 \n\t" : "=r" (x) );
  bar();
  asm volatile ( "INSN_B %0,%0 \n\t" : "=r" (x) : "0" (x) );
  return x;
}

and Rn is a register not saved over function calls, then gcc does not save it
but allows it to get clobbered by the calling of bar(). For example, if the
processor is ARM and Rn is r2 (on ARM r0-r3 and r12 can get clobbered by a
function call), then the following code is generated; if you don't know ARM
assembly, comments tell you what's going on:

foo:
   stmfd   sp!, {r3, lr} // Save the return address
   INSN_A  r2// The insn generating r2's content
   bl  bar   // Call bar(); it may destroy r2
   INSN_B  r2, r2// *** Here a possibly clobbered r2 is used!
   mov r0, r2// Copy r2 to the return value register
   ldmfd   sp!, {r3, lr} // Restore the return address
   bx  lr// Return

Note that you don't need a real function call in your code; it is enough to
do something which forces gcc to call a function in libgcc.a. On some ARM
targets a long long shift, an integer division or even just a switch {}
statement is enough to trigger a call to the support library.

Which basically means that one *must not* pin a variable to a register that
is not saved across function calls, because it can get clobbered at any time.

It is not an ARM-specific issue, either; other targets behave the same. The
compiler version is 4.5.3.

The info page regarding specifying registers for variables does not say
that the register one chooses must be a register saved across calls. On the
other hand, it does say that register content might be destroyed when the
compiler knows that the data is not live any more; a statement which has a
vibe suggesting that the register content is preserved as long as the data is
live.

For global register variables the info page does actually warn about library
routines possibly clobbering registers and says that one should use a saved
and restored register. However, global and function local variables are very
different animals; global regs are reserved and not tracked by the data flow
analysis while local var regs are part of the data flow analysis, as stated
by the info page.

So I don't know whether it is a bug (i.e. the compiler is supposed to protect
local reg vars) or just misleading/omitted information in the info page.

Thanks,

Zoltan


libgcc question

2011-02-10 Thread Zoltán Kócsi
Am I doing something wrong, or is there a problem with libgcc?

I'm compiling code for an ARM based micro. I'm using gcc 4.5.1,
configured for arm-eabi-none, C compiler only. The target is a
standalone embedded device, no OS, nothing, not even a C library, just
bare metal. The compiler (and linker, gcc is being used to start the
linker) get the -ffreestanding -static -static-libgcc -nostdlib flags.

Everything works fine until I want to do a 64-bit division.
Then the linking fails, telling me that I have undefined references
to memcpy, abort, __exidx_start and __exidx_end.

Telling the linker to create an output despite the missing references
reveals that the resulting object file contains the actual 64-bit
division from libgcc, as expected. Plus it also contains about 4KB worth
of functions related to unwinding, which are never referenced anywhere
(i.e. the libgcc division routine does not call or use *any* of the
functions there). There is all sorts of code in there to deal with the
(nonexistent) floating-point coprocessor, throwing exceptions and other
magic.

So, a function containing a single call to __aeabi_uldivmod results in
about 4 KB unused code being sucked in from libgcc.a (some of which
could not even be executed by the target processor), with 4 undefined
references, of which __exidx_start and __exidx_end are, as far as I
know, not even standard library functions.

Is this a bug in libgcc, have I massively misconfigured the
compilation of gcc itself, or am I doing something horribly wrong but
can't see the obvious?
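One common bare-metal workaround is to satisfy the stray references yourself: __exidx_start and __exidx_end are section delimiters an EABI linker script normally provides (they bracket the .ARM.exidx unwind index), while memcpy and abort can be tiny freestanding stubs. A hedged sketch of the C side; names are prefixed my_ here so it compiles hosted, whereas on the target they would be the real memcpy and abort:

```c
#include <stddef.h>

/* Freestanding stand-ins for the symbols libgcc drags in. On the real
   target these would be named memcpy and abort. */
void *my_memcpy( void *dst, const void *src, size_t n )
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while ( n-- )
        *d++ = *s++;
    return dst;
}

void my_abort( void )
{
    for ( ;; )      /* bare metal: hang here, or reset the MCU */
        ;
}
```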

Zoltan


Re: array of pointer to function support in GNU C

2010-09-16 Thread Zoltán Kócsi
On Thu, 16 Sep 2010 00:50:07 -0700
J Decker  wrote:

[...]

> > int main(void)
> > {
> >    void *(*func)(void **);
> >    func;
> strange that this does anything... since it also requires a pointer to
> a pointer...

I think the compiler is right: "func" is a pointer to a function.
Since the () operator (function call) is not used, it simply parses as
an expression without any side effect. Same as

char *x;

 x;

Zoltan


Inline assembly operand specification

2009-10-02 Thread Zoltán Kócsi
Is there any documentation of the various magic letters that you can
apply to an operand in inline assembly? What I mean is this:

asm volatile (
   " some_insn %X[operand] \n"
   : [operand] "=r" (expr)
);

What I look for is documentation of 'X'. In particular, when (expr) is
a multi-register object, such as long long or double (or even a short,
on an 8-bit chip) and you want to select a particular part of it. The
only place I found some information was going through the
gcc/config/<target>/<target>.c file and trying to find the meaning of such
letters in the xxx_print_operand() function. If that is the correct
approach, then I think there's a problem with the arm-elf (I know it is
dead, but still).

According to the comments in that function, for DI and DF arguments the
Q and R qualifiers supposed to select the least significant and most
significant 32 bits, respectively, of the 64-bit datum. Indeed that's
what they do, for a long long. However, for a double they don't seem to
take into account that on arm-elf the word order of a double is always
big-endian, regardless of the endianness of the rest. Therefore, they
select the wrong half of the datum. On arm-eabi, where the endianness
of doubles matches the rest, they work fine.
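The word order is easy to check on any host with a union dump. On a little-endian x86-64 host (assumed here) the high-order word of 1.0, 0x3ff00000, sits in w[1]; on arm-elf with its mixed-endian doubles the two words would come out swapped relative to the byte order:

```c
#include <stdint.h>

/* The 32-bit word of the double's representation at the higher address;
   on a plain little-endian host this holds the sign and exponent. */
static uint32_t upper_word( double d )
{
    union { double d; uint32_t w[2]; } u;

    u.d = d;
    return u.w[1];
}
```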

Am I completely off-track?

Zoltan


Bitfields

2009-09-20 Thread Zoltán Kócsi
I wonder if there would be at least theoretical support among the
developers for a proposal about volatile bitfields:

When a HW register (thus most likely declared as volatile) is defined as
a bitfield, as far as I know gcc treats each bitfield assignment as a
separate read-modify-write operation. That is, if I have a 32-bit
register with 3 fields

struct s_hw_reg {
   int field1 : 10,
       field2 : 10,
       field3 : 12;
};

then

reg.field1 = val1;
reg.field2 = val2;

will be turned into a fetch, mask, or with val1, store, fetch, mask, or
with val2, store sequence. I wonder if there could be a special gcc
extension, strictly only when a -f option is explicitely passed to the
compiler, where the comma operator could be used to tell the compiler
to concatenate the operations:

reg.field1 = val1, reg.field2 = val2;

would then turn into fetch, mask with a combined mask of field1 and
field2, or val1, or val2, store.

Since the bit field operations cannot be concatenated that way
currently, and quite frequently you want to change multiple fields in a
HW register simultaneously (i.e. with a single write), more often
than not you have to give up the whole bit field notion and define
everything like

#define MASK1 0xffc00000
#define MASK2 0x003ff000
#define MASK3 0x00000fff

and so on, then you explicitly write the code that fetches, masks
with a combined mask, ORs with a combined field value set and stores. A
lot of typing could be avoided with the bitfields, not to mention that
it would be a lot more elegant, if one could somehow coerce the compiler
to be a bit more relaxed regarding bitfield access. Actually
'relaxed' is not a good word, because I would not want the compiler to
have free rein in the access: if there is a semicolon at the end of
the assignment expression, then do it bit by bit, adhering to
the standard at its strictest. However, the comma operator, and only
that operator, and only if both sides of the comma refer to bit fields
within the same word, and only if explicitly asked by a command line
switch, would tell the compiler to combine the masking and setting
operations within a single fetch - store pair.
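For comparison, this is the hand-written single-write update the proposal would replace, with a plain uint32_t standing in for the volatile register and an assumed field layout (actual bitfield order is implementation-defined):

```c
#include <stdint.h>

/* Illustrative layout only: field1 in bits 22..31, field2 in 12..21,
   field3 in 0..11. Real bitfield placement is implementation-defined. */
#define MASK1  0xFFC00000u
#define SHIFT1 22
#define MASK2  0x003FF000u
#define SHIFT2 12

static uint32_t fake_reg = 0xFFFFFFFFu; /* stand-in for the volatile HW register */

/* One fetch, one combined mask and merge, one store: the sequence the
   comma-operator extension would emit for reg.field1 = v1, reg.field2 = v2. */
static void set_fields12( uint32_t v1, uint32_t v2 )
{
    uint32_t tmp = fake_reg;                        /* single fetch  */

    tmp &= ~( MASK1 | MASK2 );                      /* combined mask */
    tmp |= ( v1 << SHIFT1 ) | ( v2 << SHIFT2 );     /* both values   */
    fake_reg = tmp;                                 /* single store  */
}
```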

Is it a completely brain-dead idea?

Zoltan


ARM conditional instruction optimisation bug (feature?)

2009-07-30 Thread Zoltán Kócsi
On the ARM every instruction can be executed conditionally. GCC very
cleverly uses this feature:

int bar( int x, int a, int b )
{
   if ( x )
      return a;
   else
      return b;
}

compiles to:

bar:
cmp r0, #0  // test x
movne   r0, r1  // retval = 'a' if !0 ('ne')
moveq   r0, r2  // retval = 'b' if 0 ('eq')
bx  lr

However, the following function:

extern unsigned array[ 128 ];

int foo( int x )
{
   int y;

   y = array[ x & 127 ];

   if ( x & 128 )
      y = 123456789 & ( y >> 2 );
   else
      y = 123456789 & y;

   return y;
}

compiled with gcc 4.4.0, using -Os generates this:

foo:

ldr r3, .L8
tst r0, #128
and r0, r0, #127
ldr r3, [r3, r0, asl #2]
ldrne   r0, .L8+4***
ldreq   r0, .L8+4***
movne   r3, r3, asr #2
andne   r0, r3, r0   ***
andeq   r0, r3, r0   ***
bx  lr
.L8:
.word   array
.word   123456789

The lines marked with *** do the same thing, one executing if the
condition holds one way, the other if the condition is the opposite.
That is, together they perform one unconditional instruction, except
that they use two instructions (and clocks) instead of one.

Compiling with -O2 makes things even worse, because another issue hits:
gcc sometimes changes a "load constant" to a "generate the constant on
the fly" even when the latter is both slower and larger, and other times
it chooses to load a constant even when it can easily (and more cheaply)
generate it from already available values. In this particular case it
decides to build the constant from pieces, and combines that with the
method above of performing one unconditional operation as two
complementary conditional instructions, resulting in this:

foo:
ldr r3, .L8
tst r0, #128
and r0, r0, #127
ldr r0, [r3, r0, asl #2]
movne   r0, r0, asr #2
bicne   r0, r0, #-134217728
biceq   r0, r0, #-134217728
bicne   r0, r0, #10747904
biceq   r0, r0, #10747904
bicne   r0, r0, #12992
biceq   r0, r0, #12992
bicne   r0, r0, #42
biceq   r0, r0, #42
bx  lr
.L8:
.word   array

Should I report a bug?

Thanks,

Zoltan


Re: array semantic query

2009-07-18 Thread Zoltán Kócsi
> Here it seems GCC is retaining the left hand side type of arr to be
> array of 10 ints whereas on the right hand side
> it has changed its type from array to pointer to integer. I tried

And rightly so.

> searching the relevant sections in the standard ISO C
> document number WG14/N1124 justifying the above behaviour of GCC but
> failed to conclude it from the specifications.

The C99 spec (I only have the draft one, but I think it's pretty
much the same as the final) says, in 6.2.2.3:

 Except when it is the operand of the sizeof operator or the unary &
 operator, or is a character string literal used to initialize an array
 of character type, or is a wide string literal used to initialize an
 array with element type compatible with wchar_t, an lvalue that has
 type ‘‘array of type’’ is converted to an expression that has type
 ‘‘pointer to type’’ that points to the initial element of the array
 object and is not an lvalue. If the array object has register storage
 class, the behavior is undefined.

That was spelled out (with different words) in the old K&R C and hasn't
changed since. You can't assign arrays. Since ANSI C you can assign,
pass and return structures and unions, but the array semantics did not
change.
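A short sketch of the rule's consequences: an array parameter is really a pointer, while a struct wrapping the array copies by value, which is the usual workaround when array assignment is actually wanted:

```c
/* An array won't assign or pass by value, but a struct containing one will. */
struct wrap { int a[ 10 ]; };

/* The array parameter decays: this signature is identical to int *arr. */
static int first_elem( int arr[] )
{
    return arr[ 0 ];
}

/* Arrays inside structs are copied whole, per the ANSI C struct rules. */
static struct wrap copy_wrap( struct wrap w )
{
    return w;
}
```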

Regards,

Zoltan


Re: AVR C++ - how to move vtables into FLASH memory

2009-06-16 Thread Zoltán Kócsi
> This question would be more appropriate for the mailing list
> gcc-h...@gcc.gnu.org than for g...@gcc.gnu.org.  Please take any
> followups to gcc-help.  Thanks.
> 
> Virtual tables will normally be placed in the .rodata section which
> holds read-only data.  All you should need to do it arrange for the
> .rodata section to be placed in FLASH rather than SRAM.  This would
> normally be done in your linker script.

No, it won't work. The AVR is a Harvard architecture and the FLASH
is not mapped into the data address space. You need to use special
instructions to fetch FLASH data into registers. You load the FLASH
address in a register pair, possibly set a peripheral register to
select which 64K block of the FLASH you want to address, since the
register pair is only 16 bit wide and there can be up to 128K FLASH,
then do a byte or word load into a register or register pair.

The compiler *must* be aware of where the data is, because it needs to
generate completely different code to access data in the FLASH. 

I haven't used the AVR for a while, but as far as I know, gcc does
not provide any support for constants to be stored in FLASH, be it
const data, gcc-generated tables or anything else. It was a pain in the
neck, because there is precious little RAM on these processors,
and all your strings and gcc-generated internal data were wasting RAM
(without you having any control over that), and even placing your own
data into FLASH required a fairly convoluted use of the section
attribute and inline asm routines to access it.

So it is indeed a valid compiler issue, not an incompetent user issue.
Probably an improvement request would be the best.

Zoltan


Re: ARM : code less efficient with gcc-trunk ?

2009-02-18 Thread Zoltán Kócsi
On Mon, 16 Feb 2009 10:17:36 -0500
Daniel Jacobowitz  wrote:

> On Mon, Feb 16, 2009 at 12:19:52PM +0100, Vincent R. wrote:
> > 00011000 :
> > [...]
> 
> Notice how many more registers used to be pushed?  I expect the new
> code is faster.

Assuming an ARM7 core with 0 wait-state memory and removing all the
identical call bits from the functions, the clocks are on the right
hand side:

   11000:   e92d40f0    push    {r4, r5, r6, r7, lr}    7
   11004:   e1a04000    mov     r4, r0                  1
   11008:   e1a05001    mov     r5, r1                  1
   1100c:   e1a06002    mov     r6, r2                  1
   11010:   e1a07003    mov     r7, r3                  1
   11024:   e1a01005    mov     r1, r5                  1
   11028:   e1a00004    mov     r0, r4                  1
   1102c:   e1a02006    mov     r2, r6                  1
   11030:   e1a03007    mov     r3, r7                  1
   11038:   e1a04000    mov     r4, r0                  1
   11040:   e1a01004    mov     r1, r4                  1
   11044:   e3a00042    mov     r0, #66                 1
Total: 12 insns, 18 clocks

   11000:   e92d4010    push    {r4, lr}                4
   11004:   e1a04000    mov     r4, r0                  1
   11008:   e24dd00c    sub     sp, sp, #12             1
   1100c:   e58d1008    str     r1, [sp, #8]            2
   11010:   e58d2004    str     r2, [sp, #4]            2
   11014:   e58d3000    str     r3, [sp]                2
   11028:   e59d1008    ldr     r1, [sp, #8]            3
   1102c:   e1a00004    mov     r0, r4                  1
   11030:   e59d2004    ldr     r2, [sp, #4]            3
   11034:   e59d3000    ldr     r3, [sp]                3
   1103c:   e1a04000    mov     r4, r0                  1
   11044:   e1a01004    mov     r1, r4                  1
   11048:   e3a00042    mov     r0, #66                 1

Total: 13 insns, 25 clocks.

So the version generated by the 4.4.x compiler is almost 40%
slower ((25-18)/18 = 0.3889) than the 4.1.x version and it is also
longer. Pushing many registers is cheap because it takes 2+n clocks
to move n registers to memory, and then it is n extra clocks to copy
your n registers to the call-saved ones that you pushed: total cost
2+2n. Storing them individually costs you 1 clock to make space on the
stack and 2n clocks to store them on the stack, i.e. 1+2n. In addition,
when you get them to become parameters to the function calls, a reg-reg
move costs you 1 clock while a load from memory is 3. The example
function does not actually return, but if it did, the old compiler
would lose some of its advantage. The old compiler would finish the
function with

  pop {r4,r5,r6,r7,pc} (9 clocks, final: 13 insns 27 clocks)

and the new compiler's version would be

  add sp,sp,#12 (1 clock)
  pop {r4,pc}   (6 clocks, final: 15 insns 32 clocks)

Even then the old compiler would still beat the new one both in size
and speed.

Zoltan


Re: ARM compiler generating never-used constant data structures

2009-02-05 Thread Zoltán Kócsi
On Thu, 5 Feb 2009 10:58:40 -0200
Alexandre Pereira Nunes  wrote:

> On Wed, Feb 4, 2009 at 11:05 PM, Zoltán Kócsi 
> wrote: [cut]
> >
> > If I compile the above with -O2 or -Os, then if the target is AVR or
> > x86_64 then the result is what I expected, func() just loads 3 or
> > 12345 then returns and that's all. There is no .rodata generated.
> >
> > However, compiling for the ARM generates the same function code,
> > but it also generates the image of "things" in the .rodata segment.
> > Twice. Even when it stores 12345 separatelly. The code never
> > actually references any of them and they are not global thus it is
> > just wasted memory:
> >
> 
> I think it's relevant to ask this: Are you comparing against the same
> gcc release on all the three architectures you mention?

Almost the same:

x86_64: 4.0.2
AVR:4.0.1
ARM:4.0.2

So, at least the Intel and the ARM are the same yet the Intel version
omits the .rodata, the ARM keeps it. I'll check it with the newer
version next week. However, I tend to use the 4.0.x because at least for
the ARM it generates smaller code from the same source than the newer
versions when optimising with -Os.

Zoltan


ARM compiler generating never-used constant data structures

2009-02-04 Thread Zoltán Kócsi
I have various constants. If I define them in a header file like this:

static const int my_thing_a = 3;
static const int my_thing_b = 12345;

then everything is nice, if I use them the compiler knows their value
and uses them as literals and it doesn't actually put them into the
.rodata section (which is important because I have a lot of them and
code space is at premium).

Now these things are very closely related, so it would make the program
much clearer if they could be collected in a structure. That is:

struct things { int a; int b; }; 

and then I could define a global structure

const struct things my_things = { 3, 12345 };

so that I can refer to them as my_things.a or my_things.b;

The problem is that I do not want to instantiate the actual "things"
structure, for the same reason I did not want to instantiate the
individual const int definitions. So, I tried the GCC extension of
"compound literals" like this:

#define my_things ((struct things) { 3, 12345 })

int func( int x )
{
   if ( x )
  return my_things.a;
   else
  return my_things.b;
}

If I compile the above with -O2 or -Os, then if the target is AVR or
x86_64 then the result is what I expected, func() just loads 3 or 12345
then returns and that's all. There is no .rodata generated.

However, compiling for the ARM generates the same function code, but it
also generates the image of "things" in the .rodata segment. Twice. Even
when it stores 12345 separately. The code never actually references
any of them and they are not global thus it is just wasted memory:

.section.rodata
.align  2
.type   C.1.1095, %object
.size   C.1.1095, 8
C.1.1095:
.word   3
.word   12345
.align  2
.type   C.0.1094, %object
.size   C.0.1094, 8
C.0.1094:
.word   3
.word   12345
.text
.align  2
.global func2
.type   func2, %function
func2:
ldr r3, .L6
cmp r0, #0
moveq   r0, r3
movne   r0, #3
bx  lr
.L7:
.align  2
.L6:
.word   12345
 
Is there a reason why GCC generates the unused .rodata for the ARM
while for the other two it does not?

I guess I'm doing something fundamentally wrong, as usual...

Thanks,

Zoltan


Re: Serious code generation/optimisation bug (I think)

2009-01-30 Thread Zoltán Kócsi
> This sounds like a genuine bug in gcc, then. As far as I can see,
> Andrew is right -- if the ARM hardware requires a legitimate object
> to be placed at address zero, then a standard C compiler has to use
> some other value for the null pointer.

I think changing that would cause more trouble than gain. The
processors where 0 is a legitimate object for pointer dereference 
are mostly the embedded cores without MMU (e.g. ARM7TDMI based
controllers, m68k family controllers, AVR, 68HC1x and alike). Some of
these do actually utilise their entire address space, such as the
68HC11 or the AVR, so there is no address whatsoever that is not a valid
one. Therefore, you can not define a standard-compliant NULL pointer,
unless you make pointers wider than 16 bits, with long becoming the
smallest integer type that can hold a pointer. In that case, it
would be easier to simply drop those processor families as targets,
as the generated code would be unusable in practice.

In fact, on a naked CPU core in an unknown hardware configuration,
without an MMU you can not define a NULL pointer that is guaranteed
to never point to a valid datum or function, simply because as far
as the processor is concerned, every address in its entire address
space is valid. Since the compiler can not possibly know what address
is used by the surrounding hardware and what isn't, it can not
guarantee what the standard demands.

That, I think, is a problem with the standard and not with the
compiler. On such targets 0 for NULL is just as good a choice as any
other. Actually better, especially because that is the choice that makes
the conversion between a pointer and an integer a no-op, saving both
code space and execution time.

On such target environments the user has to learn (as I had to) to ask
the compiler not to strictly conform to the standard and not to infer
from a dereference operation that the pointer is not NULL.

Zoltan


Re: Serious code generation/optimisation bug (I think)

2009-01-29 Thread Zoltán Kócsi
On Thu, 29 Jan 2009 08:53:10 +
Andrew Haley  wrote:

> Erik Trulsson wrote:
> > On Wed, Jan 28, 2009 at 04:39:39PM +, Andrew Haley wrote:
> 
> >> "6.3.2.3 Pointers
> >>
> >> If a null pointer constant is converted to a pointer type, the
> >> resulting pointer, called a null pointer, is guaranteed to compare
> >> unequal to a pointer to any object or function."
> >>
> >> This implies that a linker cannot place an object at address zero.
> > 
> > Wrong.  There is nothing which requires a null pointer to be
> > all-bits-zero (even though that is by far the most common
> > representation of null pointers.)
> 
> We're talking about gcc on ARM.  gcc on ARM uses 0 for the null
> pointer constant, therefore a linker cannot place an object at
> address zero. All the rest is irrelevant.
> 
> Andrew.

Um, the linker *must* place the vector table at address zero, because
the ARM, at least the ARM7TDMI, fetches all exception vectors from
there. Dictated by the HW, not the compiler.

Zoltan


Re: Serious code generation/optimisation bug (I think)

2009-01-28 Thread Zoltán Kócsi
> No, this is since C90; nothing has changed in this area.  NULL
> doesn't mean "address 0", it means "nothing".  The C statement
> 
>   if (ptr)
> 
> doesn't mean "if ptr does not point to address zero", it means "if ptr
> points to something".

A question then:

How can I make a pointer point to the integer located at address
0x0? It is a perfectly valid object, it exists, therefore I should be
able to get its address? In fact, if I have a CPU that starts its data
RAM at 0, then the first data object *will* reside at address 0 and
thus taking its address will result in a pointer that has all its bits
clear. Obviously that pointer then should not be equal to NULL, since
it was obtained by taking the address of a valid object, that is, the
pointer indeed points to something. Therefore,

int *a = &first_object;
int *b = (int *) 0;

must result in different values in a and b. Will it?

> I think you perhaps need to be a little less patronizing.

I did not want to be patronising. I wanted to be sarcastic, yes, but
not patronising at all.

> Many of us,
> myself included, have done a great deal of embedded programming and we
> know what the issues are.  You have written an incorrect program, and
> you now know what was incorrect about it.

Yes, I know. However, my problem is not that the program was not
correct. It wasn't and I have admitted it from the beginning. My problem
was that the compiler removed a test on an assumption. It could not
*prove* that the pointer was not NULL. It merely *assumed* it. It can
argue that what I did was wrong, but then it should have told me so. It
did not say anything. It simply decided that since it saw me do
something with a datum, the datum can not possibly be a certain value,
because I should not do that with a datum if it is that certain value.
It was wrong.

If I write

int x[ 10 ];
void foo( int i )
{
   bar( x[ i ] );
   if ( i >= 10 || i < 0 ) { ...

According to the C semantics you shouldn't under or overindex an
array. Thus, you could safely remove the if(). Since I indexed a
10-element array with it, 'i' could not possibly be less than 0 
or more than 9. 

Actually, the pointer (x+13) does not point to any valid object.
Thus, (x+13) == NULL should evaluate to true, shouldn't it?

The same elimination should be true to this:

  a = b / c; 
  if ( ! c )

for you can't divide by 0, thus c can not possibly be 0. Does gcc
silently remove the if in the above case?

> > So, pretty please, when the compiler detects that a language
> > resembling to, but not really C is used and removes assumedly
> > (albeit unprovenly) superfluos chunks of code to purify the
> > misguided programmer's intent, could it please at least send a
> > warning?
> 
> In practice that's a lot harder than you might think.  If we were to
> issue a warning for every transformation we made based on the
> semantics of the C language I'm sure people would complain.  "You
> can't dereference a NULL pointer" is a fundamental part of the
> language.

I think I see where your semantics and mine are different:

You say: "You can't dereference a NULL pointer"

I say: "You shouldn't dereference a NULL pointer"

I shouldn't but I most certainly can. I can generate a NULL pointer
where the compiler can not prove (at compile time) that it is NULL. 
If I dereference it, then whatever happens is my problem. I should not
do that but I can and if I do, I take the consequences. The compiler,
in my opinion, must not assume that just because something should not
be done it can not possibly be done. Actually, it should not assume
things at all. Rather, it should prove things before making a
transformation. If it makes a transformation based on nothing more 
than its assumptions, then at least it should give me a warning.

When you eliminate this condition:

unsigned int x;

  if ( x >= 0 ) {

then you are not assuming anything. You know, by definition, that the
condition is true, it's a proven fact. Yet the compiler issues a
warning. 

Or when facing this snipet:

int x, y;

   for ( x = 0 ; x < 10 ; x++ )
 if ( ! x )
y = 3;
 else
y = y + x;

the compiler complains that 'y' might be used uninitialised (well, gcc
might be able to work out *that* one, but a slightly more complex one would
be beyond its reach). Since it could not *prove* that y was on the LHS
before being used on the RHS it issues a warning.

However, when you eliminate this:

z = *p;

if ( ! p ) {

you *assume* that p was not NULL, because according to the standard
it should not have been. You have absolutely no way to prove it that
it really wasn't. Yet you eliminate the if() without warning.

See my problem?

Zoltan


Re: Serious code generation/optimisation bug (I think)

2009-01-28 Thread Zoltán Kócsi
On Tue, 27 Jan 2009 07:08:51 -0500
Robert Dewar  wrote:

> James Dennett wrote:
> 
> > I don't know how much work it would be to disable this optimization
> > in gcc.
> 
> To me, it is always troublesome to talk of "disable this optimization"
> in a context like this. The program in question is not C, and its
> semantics cannot be determined from the C standard. If the appropriate
> gcc switch to "disable" the optimization is set, then the status of
> the program does not change, it is still undefined nonsense. Yes, it
> may happen to "work" for some definition of "work" that is in your
> mind, but is not defined or guaranteed by any coherent language
> definition. It seems worrisome to be operating under such
> circumstances, so surely the best advice in a situation like
> this is to fix the program.

I don't mean to complain, but I happen to work with embedded systems. 
I program them in C, or at least in a language that uses the syntactic
elements of C. While it might not be a C program and is utter nonsense
from a linguistic view, in the embedded world dereferencing a NULL
pointer is often legal and actually unavoidable. Many embedded systems
run on CPUs without MMU and I'd dare to state that there are many more
of these lesser beasts out there than big multicore, superscalar CPUs
with paged MMUs and vector processing units and FPUs. Now on many of
these at location 0 you find a vector table or memory mapped registers
or simple memory or even the dreaded zero-page where you can do things
that you can't do anywhere else. On every one of those chips it is
legal to dereference a NULL pointer as long as you have the notion of
a pointer being an address of something. I've been programming in C for
almost 30 years and have neglected to follow the language's semantic
development; maybe that's why I am confused enough to think that C is a
low-level, system programming language and not a highly abstract
language where a "pointer" is actually some sort of a complex
reference to an object that may or may not actually occupy memory.
Assuming, of course, that the notion of "memory" is still a valid one,
in the old sense of collection of addressable data units residing in a
so-called address space. I think the existence of keywords referring to
aliasing is an indication to that, but I am not sure any more.

In that caveman mental domain of mine I would assume that if I
dereference a NULL pointer and on the given environment it is a no-no,
then something nasty is going to happen; an exception is raised on a
micro or I get a sig 11 message on my terminal or the whole gizmo just
resets out of the blue. On the other hand, if the given architecture
treats address 0 as address 0, then it should just fetch whatever value
is at 0 and merrily chug along. In fact, I would assume that since on
every CPU I've ever used the address space included 0, one could do
this:

// 32-bit ints
struct s_addr_space {
  int preg[ 0x100 ];// 256 memory-mapped peripheral registers @ 0
  int gap1[ 0x300 ];// 3K Unused space
  int ether[ 0x400 ];   // 4K Ethernet buffer
  char video[ 0x4000 ]; // 16K video buffer
  int gap2[ 0x800 ];// 8K unused
  int ram[ 0x1000 ];// 16K RAM
  int rom[0x1000 ]; // 16K ROM
...
} * const my_micro = (struct s_addr_space *) 0;

...
   if ( my_micro->video[ 3 ] == 'A' ) { ...

Then I would not assume that the compiler simply throws away each and
every statement that refers to any element of the address space just
because it clearly knows that 'my_micro' is a NULL pointer, therefore 
it can, in its superior wisdom, declare that the code dereferencing
it should not and thus will not be executed whatsoever.

It is possible that the above is complete nonsense and should be
punished by public execution of the programmer, but there are *tons* 
of stuff like that out there on embedded systems, working quite nicely.

I openly admit that the test case was sloppy (and admitted it in my OP).
I do accept that, due to the elevation of the C language from the
low-level system programming language it used to be to the
linguistically pure, high-level, object-oriented metalanguage that C99
apparently is, dereferencing a NULL pointer has become meaningless
nonsense these days.

However, I'd like to point out one thing: the compiler is there to help
the programmer. Dereferencing a NULL pointer might be a bad thing on one
system and perfectly normal on others. The standard declares that the
behaviour is undefined, i.e. it is up to the compiler writer. Now on a
system where NULL dereference is allowed, silently(!) removing an
explicit test for a NULL pointer (indicating that the programmer *knew*
that the pointer can indeed be NULL) is the worst possible solution. It
does not save the program from crashing if the pointer was NULL and the
system does not tolerate it. On the other hand, on a NULL-tolerant
system it makes code that, if the compiler hadn't overruled the
programmer, would have worked just fine but due to the compiler's
d

Re: ARM interworking question

2009-01-22 Thread Zoltán Kócsi
On Wed, 21 Jan 2009 09:49:00 +
Richard Earnshaw  wrote:

> > [...]
> No, this shouldn't be happening unless a) you are doing something
> wrong; b) the options you are supplying to the compiler/linker are
> suggesting some sort of stubs are necessary.  

It was case a), an option left in the makefile that should not have
been there. It was also a case a write-before-read operation on my part,
the README-interwork in gcc/config/arm was most enlightening.

My sincere apologies for generating noise.

Zoltan


ARM interworking question

2009-01-21 Thread Zoltán Kócsi
I have a question with regards to ARM interworking. The target is
ARM7TDMI-S, embedded system with no OS. The compiler is arm-elf-gcc,
4.3.1 with binutils maybe 3 months old.

It seems that when interworking is enabled then when a piece of THUMB
code calls an other piece of THUMB code in a separate file, it calls
a linker-generated entry point that switches the CPU to ARM mode, then
a jump is executed to an ARM prologue inserted in front of the
target THUMB function that switches the CPU back into THUMB mode. That
is, instead of a simple call, a call, a jump and two mode switches are
executed.

I also tried the -mcallee-super-interworking flag, which generates a
short ARM to THUMB switching code in front of a THUMB function, but the
final result does not seem to use the .real_entry_ofFUNCTIONNAME entry
point. Rather, it goes through the same switch back and forth routine.

Is there a way, when both the caller and the callee are compiled with
interworking support, to end up with code that switches modes only when
necessary? For example, placing a THUMB -> ARM prologue in front of all
functions that are in ARM mode and ARM -> THUMB prologue in front of
THUMB functions and the caller simply calling the real function or the
prologue, depending on its own mode and that of the target? It would
save both code space and execution time.

Thanks,

Zoltan




Re: gcc will become the best optimizing x86 compiler

2008-07-24 Thread Zoltán Kócsi
> [...]
> I have made a few optimized functions myself and published them as a 
> multi-platform library (www.agner.org/optimize/asmlib.zip). It is
> faster than most other libraries on an Intel Core2 and up to ten
> times faster than gcc using builtin functions. My library is
> published with GPL license, but I will allow you to use my code in
> gnu libc if you wish (Sorry, I don't have the time to work on the gnu
> project myself, but you may contact me for details about the code).
> [...]

But then it's not that gcc is the best optimising compiler; it's that
the library was *hand-optimised so that gcc compiles it very well*.

Here's an example:

void foo( void )
{
unsigned x;

for ( x = 0 ; x < 200 ; x++ ) func();
}

void bar( void )
{
unsigned x;

for ( x = 201 ; --x ; ) func();
}

foo() and bar() are completely equivalent, they call func() 200
times and that's all. Yet, if you compile them with -O3 for arm-elf
target with version 4.0.2 (yes, I know, it's an ancient version, but
still) bar() will be 6 insns long with the loop itself being 3 while
foo() compiles to 7 insns of which 4 are the loop. In fact, the compiler
is clever enough to transform bar()'s loop from

for ( x = 201 ; --x ; ) func();
to
x = 200; do func(); while ( --x );

internally, the latter form being shorter to evaluate and since x is
not used other than as the loop counter it doesn't matter. However, it
is not clever enough to figure out that foo()'s loop is doing exactly
what bar()'s is doing. Since x is only the loop counter, gcc could
transform foo()'s loop to bar()'s freely but it doesn't. It generates
the equivalent of this:

x = 0; do { x += 1; func(); } while ( x != 200 );

that is not as efficient as what it generates from bar()'s code.

Of course you get surprised when you change -O3 to -Os, in which case
gcc suddenly realises that foo() can indeed be transformed to the
internal representation that it used for bar() with -O3. Thus, we have
foo() now being only 6 insns long with a 3 insn loop. Unfortunately,
bar() is not that lucky. Although its loop remains 3 insns long, the
entire function is increased by an additional instruction, for bar()
internally now looks like this:

   x = 201;
   goto label;
   do {
  func();
label:
   } while ( --x );

You can play with gcc and see which one of the equivalent C
constructs it compiles to better code with any particular -O level
(and if you have to work with severely constrained embedded systems
you often do) but then hand-crafting your C code to fit gcc's taste is
actually not that good an idea. With the next release, when different
constructs are recognised, you may end up with larger and/or slower
code (as it happened to me when changing 4.0.x -> 4.3.x and before when
going from 2.9.x to 3.1.x).

Gcc will be the best optimising compiler when it generates
faster/shorter code than the other compilers on the majority of
a large set of arbitrary, *not* hand-optimised sources. Preferably 
for most targets, not only for the x86, if possible :-)

Zoltan