Const flagged as incompatible argument
Consider the following code segment:

    void foo( const char * const m[] );
    char *bar( void );

    void baz( void )
    {
        char *m[ 2 ];

        m[ 0 ] = bar();
        m[ 1 ] = bar();
        foo( m );
    }

gcc 8.2.0 (and 7.4.1 as well) with -Wall gives a warning, for Intel or ARM target:

    test.c:12:7: warning: passing argument 1 of 'foo' from incompatible pointer type [-Wincompatible-pointer-types]
       foo( m );
            ^
    test.c:1:6: note: expected 'const char * const*' but argument is of type 'char **'

My understanding of the C standard (and I might be mistaken) is that with the const-s I promised the compiler that foo() won't modify either the array or the pointed-to strings, nothing more. So why is the compiler complaining just because I passed a mutable array of mutable strings?

Also, how is it different from this case:

    void foo( const char *p );
    char *bar( void );

    void baz( void )
    {
        foo( bar() );
    }

which is accepted by the compiler without a warning. The warning also goes away if m[] is defined as const char *[], but why is the warning issued in the first place?

Thanks,

Zoltan
Re: An asm constraint issue (ARM FPU)
Dear Marc,

Sorry for the late answer, I was away for a few days.

Yes, that fixes it. THANK YOU!

Do you know which gcc source file contains the magic qualifiers for the asm arguments? I wouldn't mind going through the code and extracting what I can. Probably I'd find a couple of gems that are useful for inline asm stuff. Maybe I'd even write the info pages that describe them, so that others can make use of them...

Thanks again,

Best Regards,

Zoltan

On Sun, 25 Jul 2021 14:19:56 +0200 (CEST) Marc Glisse wrote:

> On Sun, 25 Jul 2021, Zoltán Kócsi wrote:
>
> > [...]
> > double spoof( uint64_t x )
> > {
> >     double r;
> >
> >     asm volatile
> >     (
> >         "   vmov.64 %[d],%Q[i],%R[i] \n"
>
> Isn't it supposed to be %P[d] for a double?
> (the documentation is very lacking...)
>
> > [...]
> --
> Marc Glisse
An asm constraint issue (ARM FPU)
I'm trying to write a one-liner inline function to create a double from a 64-bit integer — not converting it to a double, but treating the integer as the bit pattern of the double (type spoofing). The compiler is arm-eabi-gcc 8.2.0. The target is a Cortex-A9, with NEON.

According to the info page the assembler constraint "w" denotes an FPU double register, d0 - d31. The code is the following:

    double spoof( uint64_t x )
    {
        double r;

        asm volatile
        (
            "   vmov.64 %[d],%Q[i],%R[i] \n"
            : [d] "=w" (r)
            : [i] "q" (x)
        );

        return r;
    }

The command line:

    arm-eabi-gcc -O0 -c -mcpu=cortex-a9 -mfloat-abi=hard -mfpu=neon-vfpv4 \
        test.c

It compiles and the generated object code is this:

     0: e52db004   push  {fp}            ; (str fp, [sp, #-4]!)
     4: e28db000   add   fp, sp, #0
     8: e24dd014   sub   sp, sp, #20
     c: e14b01f4   strd  r0, [fp, #-20]  ; 0xffec
    10: e14b21d4   ldrd  r2, [fp, #-20]  ; 0xffec
    14: ec432b30   vmov  d16, r2, r3
    18: ed4b0b03   vstr  d16, [fp, #-12]
    1c: e14b20dc   ldrd  r2, [fp, #-12]
    20: ec432b30   vmov  d16, r2, r3
    24: eeb00b60   vmov.f64 d0, d16
    28: e28bd000   add   sp, fp, #0
    2c: e49db004   pop   {fp}            ; (ldr fp, [sp], #4)
    30: e12fff1e   bx    lr

which is not really efficient, but works. However, if I specify -O1, -O2 or -Os then the compilation fails because the assembler complains. This is the assembly the compiler generated (comments and irrelevant stuff removed):

    spoof:
        vmov.64 s0,r0,r1
        bx lr

where the problem is that 's0' is a single-precision float register and it should be 'd0' instead. Either I'm seriously missing something, in which case I would be most obliged if someone pointed me in the right direction; or it is a compiler or documentation bug.

Thanks,

Zoltan
libgcc maintainer
Who'd be the best person to contact regarding libgcc for ARM 4T, 6M and 7M targets?

Thanks,

Zoltan
Re: Assignment to volatile objects
On Mon, 30 Jan 2012 19:51:47 -0600 Gabriel Dos Reis wrote:

> On Mon, Jan 30, 2012 at 4:59 PM, Zoltán Kócsi wrote:
> > David Brown wrote:
> >
> >> Until gcc gets a feature allowing it to whack the programmer on the back
> >> of the head with Knuth's "The Art of Computer Programming" for writing
> >> such stupid code that relies on the behaviour of volatile "a = b = 0;",
> >> then a warning seems like a good idea.
> >
> > a = b = 0; might be stupid.
> >
> > Is if ( ( a = expr ) ); also stupid?
>
> If you ask me, yes.

Beauty is in the eye of the beholder. I like

    while (( *dst++ = *src++ ));

better than

    _some_type_ tmp;

    do {
        tmp  = *src;
        *dst = tmp;
        src  = src + 1;
        dst  = dst + 1;
    } while ( tmp != 0 );

Zoltan
Re: Assignment to volatile objects
On Tue, 31 Jan 2012 00:38:15 +0100 Georg-Johann Lay wrote:

> A warning would be much of a help to write unambiguous, robust code.
> So the question is rather why the user refuses to write robust code in
> the first place once there is a warning.

The user (me, in this case) does not refuse to write robust code, because he has no other choice, warning or not. The user is kindly asking the compiler to accommodate his wish to write concise and more elegant code which is not five times longer just to get around a language ambiguity (i.e. to be robust).

> IMHO it's about weighing cluttering up the compiler sources with
> zillions of command line options like
>
> - how to resolve a = b = c; if b is volatile.
> - how to resolve i = i++;
> - how to resolve f(i++, i++);
> - etc.
>
> against the benefit for the user. I don't really see a benefit.

I think there is a rather important difference between the volatile case and the others. Accessing a volatile is a side effect, and so is the post-incrementing of the object. In the increment case you do know that the side effects will happen before the next sequence point; you just do not know exactly when within the two enclosing sequence points. With the a=b=0 case the side effect of reading b may or may not happen at all. That, I think, is a major difference.

In fact, I think there is an even bigger ambiguity with the volatile. Consider the case of the single statement

    a = 0;

where a is volatile. a=0; is an expression statement. Such a statement is evaluated as a void expression for its side effects, as per 6.8.3.2. A void expression is an expression of which the value is discarded, as per 6.3.2.2. Thus, the value of a=0 should be calculated and then discarded. Since evaluating the value of that expression when 'a' is volatile may or may not read 'a' back, as per 6.5.16.3, the compiler thus has every right to randomly generate, or not generate, a read after writing the 0 to a. That is, a simple assignment has an unpredictable side effect on volatile objects.
Nothing in the standard says that you must not actually calculate the value of an expression statement before discarding it; in fact it explicitly states in 5.1.2.3.4 that you must not omit parts of an expression evaluation which have side effects, even if the expression's value is not used. The read-back of a volatile lhs of an assignment is a side effect which, according to the standard, the compiler can emit or omit at whim. And with that, writing 'robust' code becomes impossible, as long as it matters to you whether the a=0; statement will read back the volatile 'a' or not.

Zoltan
Re: Assignment to volatile objects
paul_kon...@dell.com> wrote:

> I would prefer this to generate a warning. The C language standard change
> you refer to is a classic example of a misguided change, and any code whose
> behavior depends on this deserves a warning message, NOT an option to work
> one way or the other.

Sure. However, a compiler is a tool and the best thing it can do is to serve its user's needs. Generate a warning, because there's an ambiguous construct (actually, it has always been a bit iffy, but now it is officially an implementation choice). When there is a possibility of helping the user, what's wrong with offering it? If there's a switch and it is being used, then the user explicitly tells you how (s)he wants the ambiguity to be resolved. The user, by specifying his or her preference, clearly indicates that (s)he is aware of the ambiguity, i.e. knows what (s)he is doing, and asked you kindly to resolve it this way or the other. You can answer with a "piss off, idiot" or just do what the user asked you to do. So why not help the user?

Zoltan
Re: Assignment to volatile objects
David Brown wrote:

> Until gcc gets a feature allowing it to whack the programmer on the back
> of the head with Knuth's "The Art of Computer Programming" for writing
> such stupid code that relies on the behaviour of volatile "a = b = 0;",
> then a warning seems like a good idea.

a = b = 0; might be stupid.

Is if ( ( a = expr ) ); also stupid? I thought that idiom was cited as an example of the expressiveness of C in the C bible (the K&R book).

Zoltan
Assignment to volatile objects
Now that the new C standard is out, is there any chance that gcc's behaviour regarding a volatile lhs in an assignment changes? This is what it does today:

    volatile int a, b;

    a = b = 0;

translates to

    b = 0;
    a = b;

because the standard (up to and including C99) stated that the value of the assignment operator is the value of the lhs after the assignment. The C11 standard says the same, but then it explicitly states that the compiler does not have to read back the value of the lhs, not even when the lhs is volatile. So it is actually legal now not to read back the lhs. Is there a chance for the compiler to get a switch which would tell it explicitly not to read the value back?

Zoltan
Re: Float point issue
On Thu, 27 Oct 2011 23:31:14 -0400 Robert Dewar wrote:

> > - I am missing a gcc flag
>
> probably you should avoid extra precision and all the
> issues it brings, as well as speed up your program, by
> using SSE 64-bit arithmetic (using the appropriate gcc
> flags)

Indeed. -mpc64 fixes the issue and proper 53-bit rounding is applied.

Thanks a lot.

Zoltan
Float point issue
I found something very strange, although it might be just a misunderstanding. As far as I know, the IEEE-754 standard defines round-to-nearest, tie-to-even as follows:

- For rounding purposes, the operation must be performed as if it were done with infinite precision
- Then, if the bit to the right of the LSB of the result (the guard bit) is 0, do nothing
- Otherwise, if *any* bit to the right of the guard bit is 1, add 1 to the result
- Otherwise, if the LSB of the result is 1, add 1 to the result
- Otherwise leave the result as it is

Assuming that the above is true, and that that is the default rounding mode, then the following is most surprising. This bit of code:

    #include <stdio.h>
    #include <math.h>

    int main( void )
    {
        double a, b;
        int    i;

        a = 1.0;
        for ( i = -54 ; i > -106 ; i-- ) {
            b = ldexp( 1.0, -53 ) + ldexp( 1.0, i );
            printf( "%.13a + %.13a = %.13a\n", a, b, a+b );
        }
    }

generates this output on an Intel i5 core:

    0x1.0p+0 + 0x1.8p-53 = 0x1.1p+0
    0x1.0p+0 + 0x1.4p-53 = 0x1.1p+0
    0x1.0p+0 + 0x1.2p-53 = 0x1.1p+0
    [...]
    0x1.0p+0 + 0x1.01000p-53 = 0x1.1p+0
    0x1.0p+0 + 0x1.00800p-53 = 0x1.1p+0
    0x1.0p+0 + 0x1.00400p-53 = 0x1.1p+0
    0x1.0p+0 + 0x1.00200p-53 = 0x1.0p+0  <== ?
    0x1.0p+0 + 0x1.00100p-53 = 0x1.0p+0
    0x1.0p+0 + 0x1.00080p-53 = 0x1.0p+0
    0x1.0p+0 + 0x1.00040p-53 = 0x1.0p+0
    [...]
    0x1.0p+0 + 0x1.4p-53 = 0x1.0p+0
    0x1.0p+0 + 0x1.2p-53 = 0x1.0p+0
    0x1.0p+0 + 0x1.1p-53 = 0x1.0p+0

which seems to indicate that the sticky bit contains only the first 10 bits to the right of the guard bit; all the rest is thrown away silently (as if internally the operations were done on 64 bits only). I tried to pass -fexcess-precision=standard to gcc, but got the same result.

I wonder whether

- I know the IEEE rounding rules incorrectly
- I am missing a gcc flag
- gcc is doing something funny when setting up the FPU
- Intel's FPU is not standard compliant

Thanks,

Zoltan
Built-in function question
I came across some very interesting behaviour regarding built-in functions:

    int __builtin_popcount( unsigned x );

is a gcc built-in, which actually returns the number of 1 bits in x.

    int foo( unsigned x )
    {
        return __builtin_popcount( x );
    }

generates a call to the __popcountsi2 function in libgcc, for any target I tried (well, I tried x86, ARM and m68k). However:

    int (*bar( void ))( unsigned )
    {
        return __builtin_popcount;
    }

returns the address of the label "__builtin_popcount", which does not exist:

    int main( int argc, char *argv[] )
    {
        (void) argv;

        return (*bar())( argc );
    }

fails to build because of an undefined reference to __builtin_popcount. The compiler does not give any warning with -Wall -Wextra -pedantic, but it spits the dummy during the linking phase.

The next quite interesting thing is the effect of optimisation. With -O1 or above, bar() returns the address of the non-existent function __builtin_popcount() *but* main(), which dereferences bar(), is optimised to simply call __popcountsi2 in the library. So the linking fails because bar() (which is not actually called by main()) refers to the nonexistent function; but if bar() is made static, the optimisation gets rid of it and everything is fine and the linking succeeds.

A further point is that the compiler generates a .globl for __popcountsi2 but it does not do that for __builtin_popcount, which is rather unusual (although not fatal, since gas treats all undefined symbols as globals). Nevertheless, gcc normally pedantically emits a .globl for every global symbol it generates or refers to, but not in this case. At least the 4.5.x compiler behaves like that.

The info page does not say that one cannot take the address of a built-in function (and the compiler does not issue a warning on it), so a link-time failure, which depends on whether the optimiser could eliminate the need for the actual function pointer or not, is somewhat surprising.
I understand that there are very special built-in functions, some that work only at compile time, some showing very funky argument handling behaviour and so on. However, many are (well, seem to be) stock standard functions, realised either as a call to libgcc or as a few machine instructions — that is, behaving like inline asm() wrapped in a static inline. Those functions, I think, should really behave like ordinary (possibly static inline asm) functions. Or, if not, at least one should be warned. I believe that the above is an issue, but I don't know whether it is a compiler bug or a documentation one.

Thanks,

Zoltan
Register constrained variable issue
If one writes a bit of code like this:

    int foo( void )
    {
        register int x asm( "Rn" );

        asm volatile ( "INSN_A %0 \n\t" : "=r" (x) );

        bar();

        asm volatile ( "INSN_B %0,%0 \n\t" : "=r" (x) : "0" (x) );

        return x;
    }

and Rn is a register not saved over function calls, then gcc does not save it but allows it to get clobbered by the call to bar(). For example, if the processor is ARM and Rn is r2 (on ARM r0-r3 and r12 can get clobbered by a function call), then the following code is generated; if you don't know ARM assembly, the comments tell you what's going on:

    foo:
        stmfd   sp!, {r3, lr}   // Save the return address
        INSN_A  r2              // The insn generating r2's content
        bl      bar             // Call bar(); it may destroy r2
        INSN_B  r2, r2          // *** Here a possibly clobbered r2 is used!
        mov     r0, r2          // Copy r2 to the return value register
        ldmfd   sp!, {r3, lr}   // Restore the return address
        bx      lr              // Return

Note that you don't need a real function call in your code; it is enough to do something which forces gcc to call a function in libgcc.a. On some ARM targets a long long shift, an integer division or even just a switch {} statement is enough to trigger a call to the support library. Which basically means that one *must not* allocate a register that is not saved by function calls, because it can get clobbered at any time. It is not an ARM-specific issue, either; other targets behave the same. The compiler version is 4.5.3.

The info page on specifying registers for variables does not say that the register one chooses must be one saved across calls. On the other hand, it does say that the register content might be destroyed when the compiler knows that the data is not live any more — a statement which has a vibe suggesting that the register content is preserved as long as the data is live. For global register variables the info page does actually warn about library routines possibly clobbering registers and says that one should use a saved and restored register.
However, global and function-local register variables are very different animals: global register variables are reserved and not tracked by the data flow analysis, while local register variables are part of the data flow analysis, as stated by the info page. So I don't know whether it is a bug (i.e. the compiler is supposed to protect local register variables) or just misleading/omitted information in the info page.

Thanks,

Zoltan
libgcc question
Am I doing something wrong or is there a problem with libgcc?

I'm compiling code for an ARM-based micro. I'm using gcc 4.5.1, configured for arm-eabi-none, C compiler only. The target is a standalone embedded device — no OS, nothing, not even a C library, just bare metal. The compiler (and the linker; gcc is being used to start the linker) get the

    -ffreestanding -static -static-libgcc -nostdlib

flags. Everything works fine until I want to do a 64-bit division. Then the linking fails, telling me that I have undefined references to memcpy, abort, __exidx_start and __exidx_end. Telling the linker to create an output despite the missing references reveals that the resulting object file contains the actual 64-bit division from libgcc, as expected. Plus it also contains about 4KB worth of functions related to unwinding, which are never referenced anywhere (i.e. the libgcc division routine does not call or use *any* of the functions there). There is all sorts of code in there to deal with the (nonexistent) floating-point coprocessor, throwing exceptions and other magic.

So, a function containing a single call to __aeabi_uldivmod results in about 4 KB of unused code being sucked in from libgcc.a (some of which could not even be executed by the target processor), with 4 undefined references, of which __exidx_start and __exidx_end are, as far as I know, not even standard library functions.

Is this a bug in libgcc, have I massively misconfigured the compilation of gcc itself, or am I doing something horribly wrong but can't see the obvious?

Zoltan
Re: array of pointer to function support in GNU C
On Thu, 16 Sep 2010 00:50:07 -0700 J Decker wrote:

[...]

> > int main(void)
> > {
> >     void *(*func)(void **);
> >
> >     func;
>
> strange that this does anything... since it also requires a pointer to
> a pointer...

I think the compiler is right: "func" is a pointer to a function. Since the () operator (function call) is not used, it simply parses as an expression without any side effect. Same as

    char *x;

    x;

Zoltan
Inline assembly operand specification
Is there documentation of the various magic letters that you can apply to an operand in inline assembly? What I mean is this:

    asm volatile ( " some_insn %X[operand] \n" : [operand] "=r" (expr) );

What I'm looking for is documentation of 'X'. In particular, when (expr) is a multi-register object, such as a long long or a double (or even a short, on an 8-bit chip), and you want to select a particular part of it.

The only place I found some information was going through the gcc/config//.c file and trying to find the meaning of such letters in the xxx_print_operand() function. If that is the correct approach, then I think there's a problem with arm-elf (I know it is dead, but still). According to the comments in that function, for DI and DF arguments the Q and R qualifiers are supposed to select the least significant and most significant 32 bits, respectively, of the 64-bit datum. Indeed that's what they do, for a long long. However, for a double they don't seem to take into account that on arm-elf the word order of a double is always big-endian, regardless of the endianness of the rest. Therefore, they select the wrong half of the datum. On arm-eabi, where the endianness of doubles matches the rest, they work fine.

Am I completely off-track?

Zoltan
Bitfields
I wonder if there would be at least theoretical support from the developers for a proposal regarding volatile bitfields:

When a HW register (thus most likely declared as volatile) is defined as a bitfield, as far as I know gcc treats each bitfield assignment as a separate read-modify-write operation. That is, if I have a 32-bit register with 3 fields

    struct s_hw_reg {
        int field1 : 10,
            field2 : 10,
            field3 : 12;
    };

then

    reg.field1 = val1;
    reg.field2 = val2;

will be turned into a fetch, mask, or with val1, store, fetch, mask, or with val2, store sequence. I wonder if there could be a special gcc extension, strictly only when a -f option is explicitly passed to the compiler, where the comma operator could be used to tell the compiler to concatenate the operations:

    reg.field1 = val1, reg.field2 = val2;

would then turn into fetch, mask with the combined mask of field1 and field2, or val1, or val2, store.

Since the bitfield operations cannot be concatenated that way currently, and quite frequently you want to change multiple fields in a HW register simultaneously (i.e. with a single write), more often than not you have to give up the whole bitfield notion and define everything like

    #define MASK1 0xffc00000
    #define MASK2 0x003ff000
    #define MASK3 0x00000fff

and so on, then explicitly write the code that fetches, masks with a combined mask, or-s with a combined field value set, and stores. A lot of typing could be avoided with the bitfields, not to mention that it would be a lot more elegant, if one could somehow coerce the compiler to be a bit more relaxed regarding bitfield access. Actually 'relaxed' is not a good word, because I would not want the compiler to have free rein over the access: if there's a semicolon at the end of the assignment expression, then do it bit by bit, adhering to the standard at its strictest.
However, the comma operator — and only that operator, and only if both sides of the comma refer to bit fields within the same word, and only if explicitly asked for by a command line switch — would tell the compiler to combine the masking and setting operations into a single fetch-store pair.

Is it a completely brain-dead idea?

Zoltan
ARM conditional instruction optimisation bug (feature?)
On the ARM every instruction can be executed conditionally. GCC very cleverly uses this feature:

    int bar ( int x, int a, int b )
    {
        if ( x )
            return a;
        else
            return b;
    }

compiles to:

    bar:
        cmp   r0, #0      // test x
        movne r0, r1      // retval = 'a' if !0 ('ne')
        moveq r0, r2      // retval = 'b' if 0 ('eq')
        bx    lr

However, the following function:

    extern unsigned array[ 128 ];

    int foo( int x )
    {
        int y;

        y = array[ x & 127 ];

        if ( x & 128 )
            y = 123456789 & ( y >> 2 );
        else
            y = 123456789 & y;

        return y;
    }

compiled with gcc 4.4.0, using -Os, generates this:

    foo:
        ldr   r3, .L8
        tst   r0, #128
        and   r0, r0, #127
        ldr   r3, [r3, r0, asl #2]
        ldrne r0, .L8+4          ***
        ldreq r0, .L8+4          ***
        movne r3, r3, asr #2
        andne r0, r3, r0         ***
        andeq r0, r3, r0         ***
        bx    lr
    .L8:
        .word array
        .word 123456789

The lines marked with *** come in pairs that do the same thing, one executing if the condition goes one way, the other if the condition is the opposite. That is, together each pair performs one unconditional instruction, except that it uses two instructions (and clocks) instead of one.

Compiling with -O2 makes things even worse, because another issue hits: gcc sometimes changes a "load constant" into a "generate the constant on the fly" even when the latter is both slower and larger; other times it chooses to load a constant even when it can easily (and more cheaply) generate it from already available values. In this particular case it decides to build the constant from pieces and combines that with the two-complementary-conditional-instructions method above, resulting in this:

    foo:
        ldr   r3, .L8
        tst   r0, #128
        and   r0, r0, #127
        ldr   r0, [r3, r0, asl #2]
        movne r0, r0, asr #2
        bicne r0, r0, #-134217728
        biceq r0, r0, #-134217728
        bicne r0, r0, #10747904
        biceq r0, r0, #10747904
        bicne r0, r0, #12992
        biceq r0, r0, #12992
        bicne r0, r0, #42
        biceq r0, r0, #42
        bx    lr
    .L8:
        .word array

Should I report a bug?

Thanks,

Zoltan
Re: array semantic query
> Here it seems GCC is retaining the left hand side type of arr to be
> array of 10 ints whereas on the right hand side
> it has changed its type from array to pointer to integer. I tried

And rightly so.

> searching the relevant sections in the standard ISO C
> document number WG14/N1124 justifying the above behaviour of GCC but
> failed to conclude it from the specifications.

The C99 spec (I only have the draft one, but I think it's pretty much the same as the final) says, in 6.2.2.3:

    Except when it is the operand of the sizeof operator or the unary &
    operator, or is a character string literal used to initialize an
    array of character type, or is a wide string literal used to
    initialize an array with element type compatible with wchar_t, an
    lvalue that has type ''array of type'' is converted to an expression
    that has type ''pointer to type'' that points to the initial element
    of the array object and is not an lvalue. If the array object has
    register storage class, the behavior is undefined.

That was spelled out (with different words) in the old K&R C and hasn't changed since. You can't assign arrays. Since ANSI C you can assign, pass and return structures and unions, but the array semantics did not change.

Regards,

Zoltan
Re: AVR C++ - how to move vtables into FLASH memory
> This question would be more appropriate for the mailing list
> gcc-h...@gcc.gnu.org than for g...@gcc.gnu.org. Please take any
> followups to gcc-help. Thanks.
>
> Virtual tables will normally be placed in the .rodata section which
> holds read-only data. All you should need to do is arrange for the
> .rodata section to be placed in FLASH rather than SRAM. This would
> normally be done in your linker script.

No, it won't work. The AVR is a Harvard architecture and the FLASH is not mapped into the data address space. You need to use special instructions to fetch FLASH data into registers. You load the FLASH address into a register pair, possibly set a peripheral register to select which 64K block of the FLASH you want to address (since the register pair is only 16 bits wide and there can be up to 128K of FLASH), then do a byte or word load into a register or register pair. The compiler *must* be aware of where the data is, because it needs to generate completely different code to access data in the FLASH.

I haven't used the AVR for a while, but as far as I know, gcc does not provide any support for constants stored in FLASH, be it const data, gcc-generated tables or anything else. It was a pain in the neck, because there's a precious little amount of RAM on these processors, and all your strings and gcc-generated internal data were wasting RAM (without you having any control over that); even placing your own data into FLASH required a fairly convoluted use of the section attribute and inline asm routines to access it.

So it is indeed a valid compiler issue, not an incompetent-user issue. Probably an improvement request would be best.

Zoltan
Re: ARM : code less efficient with gcc-trunk ?
On Mon, 16 Feb 2009 10:17:36 -0500 Daniel Jacobowitz wrote:

> On Mon, Feb 16, 2009 at 12:19:52PM +0100, Vincent R. wrote:
> > 00011000 :
> > [...]
>
> Notice how many more registers used to be pushed? I expect the new
> code is faster.

Assuming an ARM7 core with 0 wait-state memory and removing all the identical call bits from the functions, the clocks are on the right hand side:

    11000: e92d40f0   push {r4, r5, r6, r7, lr}   7
    11004: e1a04000   mov  r4, r0                 1
    11008: e1a05001   mov  r5, r1                 1
    1100c: e1a06002   mov  r6, r2                 1
    11010: e1a07003   mov  r7, r3                 1
    11024: e1a01005   mov  r1, r5                 1
    11028: e1a00004   mov  r0, r4                 1
    1102c: e1a02006   mov  r2, r6                 1
    11030: e1a03007   mov  r3, r7                 1
    11038: e1a04000   mov  r4, r0                 1
    11040: e1a01004   mov  r1, r4                 1
    11044: e3a00042   mov  r0, #66                1

    Total: 12 insns, 18 clocks

    11000: e92d4010   push {r4, lr}               4
    11004: e1a04000   mov  r4, r0                 1
    11008: e24dd00c   sub  sp, sp, #12            1
    1100c: e58d1008   str  r1, [sp, #8]           2
    11010: e58d2004   str  r2, [sp, #4]           2
    11014: e58d3000   str  r3, [sp]               2
    11028: e59d1008   ldr  r1, [sp, #8]           3
    1102c: e1a00004   mov  r0, r4                 1
    11030: e59d2004   ldr  r2, [sp, #4]           3
    11034: e59d3000   ldr  r3, [sp]               3
    1103c: e1a04000   mov  r4, r0                 1
    11044: e1a01004   mov  r1, r4                 1
    11048: e3a00042   mov  r0, #66                1

    Total: 13 insns, 25 clocks

So the version generated by the 4.4.x compiler is almost 40% slower ((25-18)/18 = 0.3889) than the 4.1.x version, and it is also longer.

Pushing many registers is cheap: it takes 2+n clocks to move n registers to memory, and then n extra clocks to copy your n registers to the call-saved ones that you pushed — total cost 2+2n. Storing them individually costs you 1 clock to make space on the stack and 3n clocks to store them, i.e. 1+3n. In addition, when you get them back as parameters to the function calls, a reg-reg move costs you 1 clock while a load from memory costs 3.

The example function does not actually return, but if it did, the old compiler would lose some of its advantage.
The old compiler would finish the function with

    pop {r4,r5,r6,r7,pc}    (9 clocks; final: 13 insns, 27 clocks)

and the new compiler's version would be

    add sp,sp,#12           (1 clock)
    pop {r4,pc}             (6 clocks; final: 15 insns, 32 clocks)

Even then the old compiler would still beat the new one both in size and speed.

Zoltan
Re: ARM compiler generating never-used constant data structures
On Thu, 5 Feb 2009 10:58:40 -0200 Alexandre Pereira Nunes wrote:

> On Wed, Feb 4, 2009 at 11:05 PM, Zoltán Kócsi wrote:

[cut]

> > If I compile the above with -O2 or -Os, then if the target is AVR or
> > x86_64 then the result is what I expected, func() just loads 3 or
> > 12345 then returns and that's all. There is no .rodata generated.
> >
> > However, compiling for the ARM generates the same function code,
> > but it also generates the image of "things" in the .rodata segment.
> > Twice. Even when it stores 12345 separately. The code never
> > actually references any of them and they are not global, thus it is
> > just wasted memory.
>
> I think it's relevant to ask this: Are you comparing against the same
> gcc release on all three architectures you mention?

Almost the same:

    x86_64: 4.0.2
    AVR:    4.0.1
    ARM:    4.0.2

So at least the Intel and the ARM versions are the same, yet the Intel version omits the .rodata while the ARM keeps it. I'll check it with the newer version next week. However, I tend to use 4.0.x because, at least for the ARM, it generates smaller code from the same source than the newer versions when optimising with -Os.

Zoltan
ARM compiler generating never-used constant data structures
I have various constants. If I define them in a header file like this:

    static const int my_thing_a = 3;
    static const int my_thing_b = 12345;

then everything is nice: if I use them, the compiler knows their value and uses them as literals, and it doesn't actually put them into the .rodata section (which is important because I have a lot of them and code space is at a premium).

Now these things are very closely related, so it would make the program much clearer if they could be collected in a structure. That is:

    struct things {
        int a;
        int b;
    };

and then I could define a global structure

    const struct things my_things = { 3, 12345 };

so that I can refer to them as my_things.a or my_things.b. The problem is that I do not want to instantiate the actual "things" structure, for the same reason I did not want to instantiate the individual const int definitions. So I tried the GCC extension of "compound literals" like this:

    #define my_things ((struct things) { 3, 12345 })

    int func( int x )
    {
        if ( x )
            return my_things.a;
        else
            return my_things.b;
    }

If I compile the above with -O2 or -Os, then if the target is AVR or x86_64 the result is what I expected: func() just loads 3 or 12345, then returns, and that's all. There is no .rodata generated.

However, compiling for the ARM generates the same function code, but it also generates the image of "things" in the .rodata segment. Twice. Even though it stores 12345 separately. The code never actually references any of them and they are not global, thus it is just wasted memory:

        .section .rodata
        .align  2
        .type   C.1.1095, %object
        .size   C.1.1095, 8
    C.1.1095:
        .word   3
        .word   12345
        .align  2
        .type   C.0.1094, %object
        .size   C.0.1094, 8
    C.0.1094:
        .word   3
        .word   12345
        .text
        .align  2
        .global func2
        .type   func2, %function
    func2:
        ldr     r3, .L6
        cmp     r0, #0
        moveq   r0, r3
        movne   r0, #3
        bx      lr
    .L7:
        .align  2
    .L6:
        .word   12345

Is there a reason why GCC generates the unused .rodata for the ARM while for the other two it does not?
I guess I'm doing something fundamentally wrong, as usual... Thanks, Zoltan
Re: Serious code generation/optimisation bug (I think)
> This sounds like a genuine bug in gcc, then. As far as I can see,
> Andrew is right -- if the ARM hardware requires a legitimate object
> to be placed at address zero, then a standard C compiler has to use
> some other value for the null pointer.

I think changing that would cause more trouble than gain. The processors where 0 is a legitimate address for a pointer dereference are mostly the embedded cores without an MMU (e.g. ARM7TDMI based controllers, m68k family controllers, AVR, 68HC1x and the like). Some of these actually utilise their entire address space, such as the 68HC11 or the AVR, so there is no address whatsoever that is not a valid one. Therefore you cannot define a standard-compliant NULL pointer, unless you make pointers wider than 16 bits, with long being the smallest integer type that can store a pointer. In that case it would be easier simply to drop those processor families as targets, as the generated code would be unusable in practice.

In fact, on a naked CPU core in an unknown hardware configuration, without an MMU, you cannot define a NULL pointer that is guaranteed never to point to a valid datum or function, simply because as far as the processor is concerned, every address in its entire address space is valid. Since the compiler cannot possibly know which addresses are used by the surrounding hardware and which are not, it cannot guarantee what the standard demands. That, I think, is a problem with the standard and not with the compiler.

On such targets 0 for NULL is just as good a choice as any other. Actually better, especially because it is the choice that makes the conversion between a pointer and an integer a no-op, saving both code space and execution time. In such a target environment the user has to learn (as I had to) to ask the compiler not to conform strictly to the standard and not to infer from a dereference operation that the pointer is not NULL.

Zoltan
Re: Serious code generation/optimisation bug (I think)
On Thu, 29 Jan 2009 08:53:10 + Andrew Haley wrote:

> Erik Trulsson wrote:
> > On Wed, Jan 28, 2009 at 04:39:39PM +, Andrew Haley wrote:
>
> >> "6.3.2.3 Pointers
> >>
> >> If a null pointer constant is converted to a pointer type, the
> >> resulting pointer, called a null pointer, is guaranteed to compare
> >> unequal to a pointer to any object or function."
> >>
> >> This implies that a linker cannot place an object at address zero.
> >
> > Wrong. There is nothing which requires a null pointer to be
> > all-bits-zero (even though that is by far the most common
> > representation of null pointers.)
>
> We're talking about gcc on ARM. gcc on ARM uses 0 for the null
> pointer constant, therefore a linker cannot place an object at
> address zero. All the rest is irrelevant.
>
> Andrew.

Um, the linker *must* place the vector table at address zero, because the ARM, at least the ARM7TDMI, fetches all exception vectors from there. Dictated by the HW, not the compiler.

Zoltan
Re: Serious code generation/optimisation bug (I think)
> No, this is since C90; nothing has changed in this area. NULL
> doesn't mean "address 0", it means "nothing". The C statement
>
> if (ptr)
>
> doesn't mean "if ptr does not point to address zero", it means "if ptr
> points to something".

A question, then: how can I make a pointer point to the integer located at address 0x0? It is a perfectly valid object, it exists, therefore I should be able to take its address. In fact, if I have a CPU whose data RAM starts at 0, then the first data object *will* reside at address 0, and taking its address will result in a pointer that has all its bits clear. Obviously that pointer should not be equal to NULL, since it was obtained by taking the address of a valid object; the pointer indeed points to something. Therefore,

    int *a = &first_object;
    int *b = (int *) 0;

must result in different values in a and b. Will it?

> I think you perhaps need to be a little less patronizing.

I did not want to be patronising. I wanted to be sarcastic, yes, but not patronising at all.

> Many of us,
> myself included, have done a great deal of embedded programming and we
> know what the issues are. You have written an incorrect program, and
> you now know what was incorrect about it.

Yes, I know. However, my problem is not that the program was incorrect. It was, and I have admitted it from the beginning. My problem is that the compiler removed a test based on an assumption. It could not *prove* that the pointer was not NULL; it merely *assumed* it. It can be argued that what I did was wrong, but then the compiler should have told me so. It did not say anything. It simply decided that since it saw me do something with a datum, the datum cannot possibly have a certain value, because I should not do that with a datum of that value. It was wrong. If I write

    int x[ 10 ];

    void foo( int i )
    {
        bar( x[ i ] );
        if ( i >= 10 || i < 0 )
        {
            ...

then according to C semantics you shouldn't under- or over-index an array, thus you could safely remove the if(): since I indexed a 10-element array with it, 'i' could not possibly be less than 0 or more than 9. Similarly, the pointer (x+13) does not point to any valid object; thus (x+13) == NULL should evaluate true, shouldn't it? The same elimination should apply to this:

    a = b / c;
    if ( ! c )

for you can't divide by 0, thus c cannot possibly be 0. Does gcc silently remove the if in that case?

> > So, pretty please, when the compiler detects that a language
> > resembling to, but not really C is used and removes assumedly
> > (albeit unprovenly) superfluous chunks of code to purify the
> > misguided programmer's intent, could it please at least send a
> > warning?
>
> In practice that's a lot harder than you might think. If we were to
> issue a warning for every transformation we made based on the
> semantics of the C language I'm sure people would complain. "You
> can't dereference a NULL pointer" is a fundamental part of the
> language.

I think I see where your semantics and mine differ:

You say: "You can't dereference a NULL pointer."
I say: "You shouldn't dereference a NULL pointer."

I shouldn't, but I most certainly can. I can generate a NULL pointer where the compiler cannot prove (at compile time) that it is NULL. If I dereference it, then whatever happens is my problem. I should not do that, but I can, and if I do, I take the consequences. The compiler, in my opinion, must not assume that just because something should not be done it cannot possibly be done. Actually, it should not assume things at all. Rather, it should prove things before making a transformation. If it makes a transformation based on nothing more than its assumptions, then at least it should give me a warning.

When you eliminate this condition:

    unsigned int x;
    if ( x >= 0 )
    {

then you are not assuming anything. You know, by definition, that the condition is true; it's a proven fact. Yet the compiler issues a warning. Or when facing this snippet:

    int x, y;

    for ( x = 0 ; x < 10 ; x++ )
        if ( ! x )
            y = 3;
        else
            y = y + x;

the compiler complains that 'y' might be used uninitialised (well, gcc might be able to work out *that* one, but a slightly more complex case would be beyond its reach). Since it could not *prove* that y was assigned before being used, it issues a warning. However, when you eliminate this:

    z = *p;
    if ( ! p )
    {

you *assume* that p was not NULL, because according to the standard it should not have been. You have absolutely no way to prove that it really wasn't. Yet you eliminate the if() without warning. See my problem?

Zoltan
Re: Serious code generation/optimisation bug (I think)
On Tue, 27 Jan 2009 07:08:51 -0500 Robert Dewar wrote:

> James Dennett wrote:
>
> > I don't know how much work it would be to disable this optimization
> > in gcc.
>
> To me, it is always troublesome to talk of "disable this optimization"
> in a context like this. The program in question is not C, and its
> semantics cannot be determined from the C standard. If the appropriate
> gcc switch to "disable" the optimization is set, then the status of
> the program does not change, it is still undefined nonsense. Yes, it
> may happen to "work" for some definition of "work" that is in your
> mind, but is not defined or guaranteed by any coherent language
> definition. It seems worrisome to be operating under such
> circumstances, so surely the best advice in a situation like
> this is to fix the program.

I don't mean to complain, but I happen to work with embedded systems. I program them in C, or at least in a language that uses the syntactic elements of C. While it might not be a C program and is utter nonsense from a linguistic point of view, in the embedded world dereferencing a NULL pointer is often legal and actually unavoidable. Many embedded systems run on CPUs without an MMU, and I'd dare to state that there are many more of these lesser beasts out there than big multicore, superscalar CPUs with paged MMUs, vector processing units and FPUs. On many of these, at location 0 you find a vector table, or memory-mapped registers, or plain memory, or even the dreaded zero page where you can do things that you can't do anywhere else. On every one of those chips it is legal to dereference a NULL pointer, as long as you have the notion of a pointer being the address of something.
I've been programming in C for almost 30 years, and I have neglectfully not followed the language's semantic development; maybe that's why I am confused enough to think that C is a low-level systems programming language and not a highly abstract language where a "pointer" is actually some sort of complex reference to an object that may or may not actually occupy memory. Assuming, of course, that the notion of "memory" is still a valid one, in the old sense of a collection of addressable data units residing in a so-called address space. I think the existence of keywords referring to aliasing is an indication of that, but I am not sure any more.

In that caveman mental domain of mine I would assume that if I dereference a NULL pointer and in the given environment it is a no-no, then something nasty is going to happen: an exception is raised on a micro, or I get a sig 11 message on my terminal, or the whole gizmo just resets out of the blue. On the other hand, if the given architecture treats address 0 as address 0, then it should just fetch whatever value is at 0 and merrily chug along. In fact, I would assume that since on every CPU I've ever used the address space included 0, one could do this:

    // 32-bit ints
    struct s_addr_space {
        int  preg[ 0x100 ];    // 256 memory-mapped peripheral registers @ 0
        int  gap1[ 0x300 ];    // 3K unused space
        int  ether[ 0x400 ];   // 4K Ethernet buffer
        char video[ 0x4000 ];  // 16K video buffer
        int  gap2[ 0x800 ];    // 8K unused
        int  ram[ 0x1000 ];    // 16K RAM
        int  rom[ 0x1000 ];    // 16K ROM
        ...
    } * const my_micro = (struct s_addr_space *) 0;

    ...
    if ( my_micro->video[ 3 ] == 'A' )
    {
        ...

Then I would not assume that the compiler simply throws away each and every statement that refers to any element of the address space, just because it clearly knows that 'my_micro' is a NULL pointer and therefore can, in its superior wisdom, declare that the code dereferencing it should not and thus will not be executed whatsoever.
It is possible that the above is complete nonsense and should be punished by public execution of the programmer, but there are *tons* of things like that out there on embedded systems, working quite nicely. I openly admit that the test case was sloppy (and I admitted as much in my OP). I do accept that due to the elevation of the C language from the low-level systems programming language it used to be to the linguistically pure high-level object-oriented metalanguage that C99 apparently is, dereferencing a NULL pointer has become meaningless nonsense these days.

However, I'd like to point out one thing: the compiler is there to help the programmer. Dereferencing a NULL pointer might be a bad thing on one system and perfectly normal on others. The standard declares that the behaviour is undefined, i.e. it is up to the compiler writer. Now on a system where a NULL dereference is allowed, silently(!) removing an explicit test for a NULL pointer (a test indicating that the programmer *knew* the pointer could indeed be NULL) is the worst possible solution. It does not save the program from crashing if the pointer was NULL and the system does not tolerate it. On the other hand, on a NULL-tolerant system it makes code that, if the compiler hadn't overruled the programmer, would have worked just fine but due to the compiler's d
Re: ARM interworking question
On Wed, 21 Jan 2009 09:49:00 + Richard Earnshaw wrote:

> [...]
> No, this shouldn't be happening unless a) you are doing something
> wrong; b) the options you are supplying to the compiler/linker are
> suggesting some sort of stubs are necessary.

It was case a), an option left in the makefile that should not have been there. It was also a case of a write-before-read operation on my part; the README-interwork in gcc/config/arm was most enlightening.

My sincere apologies for generating noise.

Zoltan
ARM interworking question
I have a question with regard to ARM interworking. The target is an ARM7TDMI-S, an embedded system with no OS. The compiler is arm-elf-gcc 4.3.1, with binutils maybe 3 months old.

It seems that when interworking is enabled and a piece of THUMB code calls another piece of THUMB code in a separate file, it calls a linker-generated entry point that switches the CPU to ARM mode, then a jump is executed to an ARM prologue inserted in front of the target THUMB function that switches the CPU back into THUMB mode. That is, instead of a simple call, a call, a jump and two mode switches are executed.

I also tried the -mcallee-super-interworking flag, which generates a short ARM-to-THUMB switching code in front of a THUMB function, but the final result does not seem to use the .real_entry_of FUNCTIONNAME entry point. Rather, it goes through the same switch-back-and-forth routine.

Is there a way, when both the caller and the callee are compiled with interworking support, to have the end code switch modes only when it is necessary? For example, placing a THUMB -> ARM prologue in front of all functions that are in ARM mode and an ARM -> THUMB prologue in front of THUMB functions, with the caller simply calling the real function or the prologue, depending on its own mode and that of the target? It would save both code space and execution time.

Thanks,
Zoltan
Re: gcc will become the best optimizing x86 compiler
> [...]
> I have made a few optimized functions myself and published them as a
> multi-platform library (www.agner.org/optimize/asmlib.zip). It is
> faster than most other libraries on an Intel Core2 and up to ten
> times faster than gcc using builtin functions. My library is
> published with GPL license, but I will allow you to use my code in
> gnu libc if you wish (Sorry, I don't have the time to work on the gnu
> project myself, but you may contact me for details about the code).
> [...]

But then it's not gcc that is the best optimising compiler; it's the best library, *hand-optimised so that gcc compiles it very well*. Here's an example:

    void foo( void )
    {
        unsigned x;

        for ( x = 0 ; x < 200 ; x++ )
            func();
    }

    void bar( void )
    {
        unsigned x;

        for ( x = 201 ; --x ; )
            func();
    }

foo() and bar() are completely equivalent: they call func() 200 times and that's all. Yet if you compile them with -O3 for the arm-elf target with version 4.0.2 (yes, I know, it's an ancient version, but still), bar() will be 6 insns long with the loop itself being 3, while foo() compiles to 7 insns of which 4 are the loop. In fact, the compiler is clever enough to transform bar()'s loop from

    for ( x = 201 ; --x ; )
        func();

to

    x = 200;
    do
        func();
    while ( --x );

internally, the latter form being cheaper to evaluate; since x is not used other than as the loop counter, it doesn't matter. However, it is not clever enough to figure out that foo()'s loop is doing exactly what bar()'s is doing. Since x is only the loop counter, gcc could freely transform foo()'s loop into bar()'s, but it doesn't. It generates the equivalent of this:

    x = 0;
    do {
        x += 1;
        func();
    } while ( x != 200 );

which is not as efficient as what it generates from bar()'s code. Of course, you get surprised when you change -O3 to -Os, in which case gcc suddenly realises that foo() can indeed be transformed to the internal representation that it used for bar() with -O3. Thus we have foo() now being only 6 insns long with a 3-insn loop.
Unfortunately, bar() is not that lucky. Although its loop remains 3 insns long, the entire function grows by an additional instruction, for bar() internally now looks like this:

    x = 201;
    goto label;
    do {
        func();
    label:
    } while ( --x );

You can play with gcc and see which of several equivalent C constructs compiles to better code at any particular -O level (and if you have to work with severely constrained embedded systems, you often do), but hand-crafting your C code to fit gcc's taste is actually not that good an idea. With the next release, when different constructs are recognised, you may end up with larger and/or slower code (as happened to me when changing 4.0.x -> 4.3.x, and before that when going from 2.9.x to 3.1.x).

Gcc will be the best optimising compiler when it generates faster/shorter code than the other compilers on the majority of a large set of arbitrary, *not* hand-optimised sources. Preferably for most targets, not only the x86, if possible :-)

Zoltan