Re: proposal to make SIZE_TYPE more flexible
Ping? Or do I need to repost on the patches list? http://gcc.gnu.org/ml/gcc/2014-01/msg00130.html
Re: proposal to make SIZE_TYPE more flexible
> Repost on the patches list (with self-contained write-up, rationale for
> choices made, etc.) at the start of stage 1 for 4.10/5.0,

Ok.

> I suggest (this clearly isn't stage 3 material).

Yup. Would be nice to back port it to 4.9 later, but... understood.
Re: MSP430 in gcc4.9 ... enable interrupts?
The constructs in the *.md files are for the compiler's internal use (i.e. there are function attributes that trigger those). You don't need compiler support for these opcodes at the user level; the right way is to implement those builtins as inline assembler in a common header file:

  static inline __attribute__((always_inline))
  void __nop() { asm volatile ("NOP"); }

  static inline __attribute__((always_inline))
  void __eint() { asm volatile ("EINT"); }

Or more simply:

  #define __eint() asm("EINT")
  #define __nop() asm("NOP")

For opcodes with parameters, you use a more complex form of inline assembler:

  static inline __attribute__((always_inline))
  void BIC_SR(const int x)
  {
    asm volatile ("BIC.W %0,R2" :: "i" (x));
  }
Re: MSP430 in gcc4.9 ... enable interrupts?
> I presume these will be part of the headers for the library
> distributed for msp430 gcc by TI/Redhat?

I can't speak for TI's or Red Hat's plans. GNU's typical non-custom embedded runtime is newlib/libgloss, which usually doesn't have that much in the way of chip-specific headers or library functions.

> is that for the "critical" attribute that exists in the old msp430
> port (which disables interrupts for the duration of the function)?

Yes, for things like that. They're documented under "Function Attributes" in the "Extensions to the C Language Family" chapter of the current GCC manual.
Re: [RL78] Questions about code-generation
> I've managed to build GCC myself so that I could experiment a bit
> but as this is my first foray into compiler internals, I'm
> struggling to work out how things fit together and what affects
> what.

The key thing to know about the RL78 backend is that it has two "targets" it uses. For the first part of the compilation, up until after reload, the model uses 16 virtual registers (R8 through R15) and a virtual machine to give gcc an orthogonal model that it can generate code for. After reload, there's a "devirtualization" pass in the RL78 backend that maps the virtual model to the real model (R0 through R7), which means copying values in and out of the real registers according to which addressing modes are needed. Then GCC continues optimizing, which gets rid of most of the unneeded instructions.

The problem you're probably running into is that deciding which real registers to use for each virtual one is a very tricky task, and the post-reload optimizers aren't expecting the code to look like what it does.

> What causes that code to be generated when using a variable instead
> of a fixed memory address?

The use of "volatile" disables many of GCC's optimizations. I consider this a bug in GCC, but at the moment it needs to be "fixed" in the backends on a case-by-case basis.
Re: [RL78] Questions about code-generation
> Ah, that certainly explains a lot. How exactly would the fixing be
> done? Is there an example I could look at for one of the other
> processors?

No, RL78 is the first that uses this scheme.

> I calculated a week or two ago that we could make a code-saving of
> around 8% by using near or relative branches and near calls instead of
> always generating far calls. I changed rl78-real.md to use near
> addressing and got about 5%.

That's probably about right.

> I tried to generate relative branches too but I'm guessing that the
> 'length' attribute needs to be set for all instructions to get that
> working properly.

Or the linker could be taught to optimize branches once it knows the full displacement, but that can be even trickier to get right.
Re: [RL78] Questions about code-generation
> I'm curious. Have you tried out other approaches before you decided
> to go with the virtual registers?

Yes. Getting GCC to understand the "unusual" addressing modes the RL78 uses was too much for the register allocator to handle. Even when the addressing modes are limited to "usual" ones, GCC doesn't have a good way to do regalloc and reload when there are limits on what registers you can use in an address expression, and it's worse when there are dependencies between operands, or limited numbers of address registers.
Re: Legitimize address after reload
David Guillen writes:
> In any case I'm not using the restrict variable and I'm assuming
> strict is zero, that is, not checking the hard registers themselves.
> This is because any reg is OK for base reg. I'm pretty sure I'm
> behaving similarly to arm, cris or x86 backends.

"strict" doesn't mean which hard register it is; "strict" means whether or not it's a hard register at all. If "strict" is true, you must assume that any REG which isn't a real hard register (i.e. REGNO >= FIRST_PSEUDO_REGISTER) does NOT match.
Re: [RL78] Questions about code-generation
> Maybe we should add a target hook/macro to control this to avoid
> duplicated code of 'general_operand' in various places?

Even on the msp430, not all patterns can be safely used with volatile MEMs (i.e. the macro patterns). So not all uses of general_operand were replaced.
Re: [RL78] Questions about code-generation
This is similar to what I had to do for msp430 - I made a new predicate that was what general_operand would have been if it allowed volatile MEMs, and used it for instructions where a volatile's volatileness wouldn't be broken.
Re: [RL78] Questions about code-generation
> Is it possible that the virtual pass causes inefficiencies in some
> cases by sticking with r8-r31 when one of the 'normal' registers
> would be better?

That's not a fair question to ask, since the virtual pass can *only* use r8-r31. The first bank has to be left alone, else the devirtualizer becomes a few orders of magnitude harder, if not impossible, to make work correctly.

> In some cases, the normal optimization steps remove a lot, if not all,
> of the unnecessary register passing, but not always.

I've found that "removing unneeded moves through registers" is something gcc does poorly in the post-reload optimizers. I've written my own on some occasions (for rl78 too). Perhaps this is a good starting point to look at?

> much needless copying, which strengthens my suspicion that it's
> something in the RL78 backend that needs 'tweaking'.

Of course it is; I've said that before, I think. The RL78 uses a virtual model until reload, then converts each virtual instruction into multiple real instructions, then optimizes the result. This is going to be worse than if the real model had been used throughout (like arm or x86), but in this case the real model *can't* be used throughout, because gcc can't understand it well enough to get through regalloc and reload. The RL78 is just too "weird" to be modelled as-is.

I keep hoping that gcc's own post-reload optimizers will do a better job, though. Combine should be able to combine, for example, the "mov r8,ax; cmp r8,#4" types of insns together.
Re: Testing machine descriptions
I've thought about making a dejagnu testsuite specifically for helping with new ports, which would mean lots of md-specific tests, but really, the main testsuite probably covers everything you'd need to test. All patches are supposed to be regression tested anyway, which means running the full dejagnu testsuite before and after your change, to make sure you didn't break anything.
Re: Testing machine descriptions
The main testsuite doesn't have tests specifically to cover all the md entries. What I meant was, I suspect it covers enough plain C test cases to happen to use all the usual md entries. Since each target has different md entries (both "which are used" and "how each is used"), it would be nearly impossible to write an md testsuite. If you have target-specific patterns you want to test, you'd have to add a target-specific testsuite for them.
Re: Testing machine descriptions
Is there some way to get insight into which alternatives get used during a coverage run?
Re: RL78 sim?
> So far I've been testing with hardware but I'm pretty sure I read
> somewhere about an RL78 simulator, which would be a useful addition.
> Does this simulator exist, and if so, how do I run the tests against it?

The simulator is part of the GDB build.

> I tried 'make -k check RUNTESTFLAGS="--target_board=rl78-sim"' but in
> amongst the errors I see 'ERROR: couldn't load description file for
> rl78-sim', either it has a different name or I'm missing something on my
> system (and a quick search didn't seem to find anything but I don't
> really know what I'm looking for).

You'll need something like this in your local ${DEJAGNU} file:

  { "rl78*-*" } {
      set boards_dir "/home/dj/dejagnu/baseboards"
      set target_list { rl78-sim }
  }

Here's my rl78-sim.exp for dejagnu (it goes in whatever directory you specified above):

  # This is a list of toolchains that are supported on this board.
  set_board_info target_install {rl78-elf}

  # Load the generic configuration for this board. This will define a basic
  # set of routines needed by the tool to communicate with the board.
  load_generic_config "sim"

  # basic-sim.exp is a basic description for the standard Cygnus simulator.
  load_base_board_description "basic-sim"

  # "rl78" is the name of the sim subdir.
  setup_sim rl78

  # No multilib options needed by default.
  process_multilib_options ""

  # We only support newlib on this target. We assume that all multilib
  # options have been specified before we get here.
  set_board_info compiler "[find_gcc]"
  set_board_info cflags "[libgloss_include_flags] [newlib_include_flags] -msim"
  set_board_info ldflags "[libgloss_link_flags] [newlib_link_flags]"

  # Doesn't pass arguments or signals, can't return results, and doesn't
  # do inferiorio.
  set_board_info noargs 1
  set_board_info gdb,nosignals 1
  set_board_info gdb,noresults 1
  set_board_info gdb,noinferiorio 1

  # Limit the stack size to something real tiny.
  set_board_info gcc,stack_size 4096

  set_board_info gcc,timeout 300
stack-protection vs alloca vs dwarf2
While debugging some gdb-related FAILs, I discovered that gcc's -fstack-check option effectively calls alloca() to adjust the stack pointer. However, it doesn't mark the stack adjustment as FRAME_RELATED even when it's setting up the local variables for the function. In the case of rx-elf, for this testcase, the CFA for the function is defined in terms of the stack pointer - and thus is incorrect after the alloca call.

My question is: whose fault is this? Should alloca() tell the debug stuff that the stack pointer has changed? Should it tell it to not use $sp at all? Should the debug stuff "just know" that $sp isn't a valid choice for the CFA?

The testcase from gdb is pretty simple:

  void medium_frame ()
  {
    char S [16384];
    small_frame ();
  }
Re: stack-protection vs alloca vs dwarf2
> Presumably the rx back-end and more precisely TARGET_FRAME_POINTER_REQUIRED,
> which needs to return true if cfun->calls_alloca.

The rx back-end doesn't define TARGET_FRAME_POINTER_REQUIRED, as the documentation says the compiler handles target-independent reasons why there needs to be a frame pointer. But the default TARGET_FRAME_POINTER_REQUIRED just returns false - shouldn't it, by default, check for calls_alloca?

Also, I added that hook and set it to return true always, and it didn't fix the bug. There is a frame pointer (there was before, too), but there's also a stack adjustment after the pseudo-alloca which the dwarf2 stuff doesn't know about. The last stack adjustment it sees is the rx backend's adjustment to allocate the frame:

  _medium_frame:
      pushm   r6-r12
      add     #-4, r0, r6    ; marked frame-related (fp = sp - 4)
      mov.L   r6, r0         ; marked frame-related (sp = fp)
      . . .                  ; stack checking code goes here
      add     #0xc000, r0    ; not marked frame-related

  <_medium_frame>:
     0: 6e 6c          pushm   r6-r12
     2: 71 06 fc       add     #-4, r0, r6
     5: ef 60          mov.l   r6, r0
     7: . . .
    2e: 72 00 00 c0    add     #0xc000, r0, r0

  0014 0030 FDE cie= pc=..0043
    DW_CFA_advance_loc4: 2 to 0002
    DW_CFA_def_cfa_offset: 32
    DW_CFA_offset: r12 at cfa-8
    . . .
    DW_CFA_offset: r6 at cfa-32
    DW_CFA_advance_loc4: 3 to 0005
    DW_CFA_def_cfa: r6 ofs 36
    DW_CFA_advance_loc4: 2 to 0007
    DW_CFA_def_cfa_register: r0

  ( that's it for debug info )

Perhaps the stack-check code should set FRAME_RELATED on any stack adjustment insn?
Re: stack-protection vs alloca vs dwarf2
> I gather that r0 is the stack pointer and r6 the frame pointer?

Yes.

> > 0014 0030 FDE cie= pc=..0043
> >   DW_CFA_advance_loc4: 2 to 0002
> >   DW_CFA_def_cfa_offset: 32
> >   DW_CFA_offset: r12 at cfa-8
> >   . . .
> >   DW_CFA_offset: r6 at cfa-32
> >   DW_CFA_advance_loc4: 3 to 0005
> >   DW_CFA_def_cfa: r6 ofs 36
> >   DW_CFA_advance_loc4: 2 to 0007
> >   DW_CFA_def_cfa_register: r0
> >
> > ( that's it for debug info )
>
> If so, the above DW_CFA_def_cfa_register doesn't make sense, it should be r6
> once the frame is established. What does the CIE contain exactly?

  0010 CIE
    Version:               3
    Augmentation:          ""
    Code alignment factor: 1
    Data alignment factor: -4
    Return address column: 17
    DW_CFA_def_cfa: r0 ofs 4
    DW_CFA_offset: r17 at cfa-4
    DW_CFA_nop
    DW_CFA_nop

  0014 0030 FDE cie= pc=..0043
    DW_CFA_advance_loc4: 2 to 0002
    DW_CFA_def_cfa_offset: 32
    DW_CFA_offset: r12 at cfa-8
    DW_CFA_offset: r11 at cfa-12
    DW_CFA_offset: r10 at cfa-16
    DW_CFA_offset: r9 at cfa-20
    DW_CFA_offset: r8 at cfa-24
    DW_CFA_offset: r7 at cfa-28
    DW_CFA_offset: r6 at cfa-32
    DW_CFA_advance_loc4: 3 to 0005
    DW_CFA_def_cfa: r6 ofs 36
    DW_CFA_advance_loc4: 2 to 0007
    DW_CFA_def_cfa_register: r0

> > Perhaps the stack-check code should set FRAME_RELATED on any stack
> > adjustment insn?
>
> No, the design is that stack checking or alloca force the use of the frame
> pointer, which thus becomes the CFA register, which means that subsequent
> stack adjustments are irrelevant for the CFI.

Does the backend have to *not* mark further changes to the stack pointer in the prologue as frame related, if the function calls alloca?
This is the RTL that expand_prologue() is emitting:

  (insn/f 42 5 43 2 (parallel [
              (set/f (reg/f:SI 0 r0)
                  (minus:SI (reg/f:SI 0 r0) (const_int 28 [0x1c])))
              (set/f (mem:SI (minus:SI (reg/f:SI 0 r0) (const_int 4 [0x4])) [0 S4 A8])
                  (reg:SI 12 r12))
              (set/f (mem:SI (minus:SI (reg/f:SI 0 r0) (const_int 8 [0x8])) [0 S4 A8])
                  (reg:SI 11 r11))
              (set/f (mem:SI (minus:SI (reg/f:SI 0 r0) (const_int 12 [0xc])) [0 S4 A8])
                  (reg:SI 10 r10))
              (set/f (mem:SI (minus:SI (reg/f:SI 0 r0) (const_int 16 [0x10])) [0 S4 A8])
                  (reg:SI 9 r9))
              (set/f (mem:SI (minus:SI (reg/f:SI 0 r0) (const_int 20 [0x14])) [0 S4 A8])
                  (reg:SI 8 r8))
              (set/f (mem:SI (minus:SI (reg/f:SI 0 r0) (const_int 24 [0x18])) [0 S4 A8])
                  (reg/f:SI 7 r7))
              (set/f (mem:SI (minus:SI (reg/f:SI 0 r0) (const_int 28 [0x1c])) [0 S4 A8])
                  (reg/f:SI 6 r6))
          ]) dj.c:2 -1
       (nil))

  (insn/f 43 42 44 2 (parallel [
              (set (reg/f:SI 6 r6)
                  (plus:SI (reg/f:SI 0 r0) (const_int -4 [0xfffc])))
              (clobber (reg:CC 16 cc))
          ]) dj.c:2 -1
       (nil))

  (insn/f 44 43 45 2 (set (reg/f:SI 0 r0)
          (reg/f:SI 6 r6)) dj.c:2 -1
       (nil))
Re: stack-protection vs alloca vs dwarf2
> The "mov.L r6, r0" instruction must never be marked as frame-related, for any
> function.

Is this documented somewhere?
Re: stack-protection vs alloca vs dwarf2
> The "mov.L r6, r0" instruction must never be marked as frame-related, for any
> function.

Also, is that rule true if we *don't* have a frame pointer? That is, when we add a constant to the stack pointer to allocate the frame, should that insn be marked as frame-related? Or is it just the fp->sp move (or potentially an add, if there are outgoing args) that shouldn't be marked?
question about GTY macro
Given this in tree.h:

  struct int_n_trees_t {
    tree signed_type;
    tree unsigned_type;
  };

  extern struct int_n_trees_t int_n_trees[NUM_INT_N_ENTS];

And this in tree.c:

  struct int_n_trees_t int_n_trees [NUM_INT_N_ENTS];

What is the right way to mark these for garbage collection? I can't seem to get int_n_trees[] to show up in any of the gc-related generated files. I need the int_n_trees[] trees to be locked into memory, but I see signs that they're being reclaimed instead.
Re: question about GTY macro
> Likewise. See how global_trees is marked for example. But likely
> you forgot to mark struct int_n_trees_t to be considered for GC.

I did remember int_n_trees_t, but it seems to be working now, so who knows what I was doing wrong :-P Thanks!
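For reference, following the global_trees style suggested above, the marking would look something like this. This is a non-runnable sketch of GCC-internal source; the key point is that both the struct type and the global array carry GTY(()) so gengtype emits roots for the array's tree pointers.

```
/* In tree.h: mark both the struct and the array for the garbage
   collector (sketch, in the style used for global_trees). */
struct GTY(()) int_n_trees_t {
  tree signed_type;
  tree unsigned_type;
};
extern GTY(()) struct int_n_trees_t int_n_trees[NUM_INT_N_ENTS];

/* In tree.c: the definition is then picked up by gengtype and the
   generated gt-*.h roots keep the trees from being reclaimed. */
struct int_n_trees_t int_n_trees [NUM_INT_N_ENTS];
```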
Re: reverse bitfield patch
Revisiting an old thread, as I still want to get this feature in...
https://gcc.gnu.org/ml/gcc/2012-10/msg00099.html

> >> Why do you need to change varasm.c at all? The hunks seem to be
> >> completely separate of the attribute.
> >
> > Because static constructors have fields in the original order, not the
> > reversed order. Otherwise code like this is miscompiled:
>
> Err - the struct also has fields in the original order - only the bit
> positions of the fields are different because of the layouting option.

The order of the field decls in the type (stor-layout.c) is not changed, only the bit position information. The order here *can't* be changed, because the C language assumes that parameters, initializers, etc. are presented in the same order as the original declaration, regardless of the target-specific layout. When the program includes an initializer:

  struct foo a = { 1, 2, 3 };

the order of 1, 2, and 3 needs to correspond to the order of the bitfields in 'a', so we can change neither the order of the bitfields in 'a' nor the order of constructor fields. However, when we stream the initializer out to the .S file, we need to pack the bitfields in the right sequence to generate the right bit patterns in the final output image. The code in varasm.c exists to make sure that the initializers for bitfields are written/packed in the correct order, to correspond to the bitfield positions. I.e. the 1,2,3 initializer needs to be written to the .S file as either 0x0123 or 0x3210 depending on the bit positions. In neither case do we change the order of the fields in the type itself, i.e. the array/chain order.

> And you expect no other code looks at fields of a structure and its
> initializer? It's bad to keep this not in-sync. Thus I don't think it's
> viable to re-order fields just because bit allocation is reversed.

The fields are in sync.
The varasm.c change sorts the elements as they're being output into the byte stream in the .S; it doesn't sort the field definitions themselves.

> > +  /* If the bitfield-order attribute has been used on this
> > +     structure, the fields might not be in bit-order. In that
> > +     case, we need a separate representative for each
> > +     field. */

> > The typical use-case for this feature is memory-mapped hardware, where
> > pessimum access is preferred anyway.
>
> I doubt that, looking at constraints for strict volatile bitfields.

The code that handles representatives requires (via an assert, IIRC) that the bit offsets within a representative be in ascending order. I.e. gcc ICEs if I don't bypass this. In the case of volatile bitfields, which would be the typical use case for a reversed bitfield, the access mode is going to match the type size regardless, so performance is not changed by this patch.
Re: reverse bitfield patch
> Ok, but as we are dealing exclusively with bitfields there is
> already output_constructor_bitfield which uses an intermediate
> state to "pack" bits into units that are then emitted. It shouldn't
> be hard to change that to make it pack into the appropriate bits
> instead.

That assumes that the output unit is only emitted once per string of bitfields. If the total amount of data to output is larger than the unit size, then the units themselves need to be output in the other order also.

> Note that code expects that representatives are byte-aligned so better
> would be to not assign representatives or make the code work with
> the swapped layout (I see no reason why that shouldn't work - maybe
> it works if done before swapping the layout)?

I'm OK with not assigning them, but I couldn't figure out from the code what they were for.

> I'm still not happy about the idea in general (why is this a bitfield
> exclusive thing? If a piece of HW is big/little-endian then even
> regular fields would have that property.

A bi-endian MCU with memory-mapped peripherals needs this to properly and portably describe the fields within the peripheral's registers. Without this patch, there's no way (short of two independent definitions) of assigning a name to, for example, the LSB of such a device's registers.

> Your patch comes with no testcase - testcases should cover all
> attribute variants, multiple bitfield (group) sizes and mixed
> initializations / reads / writes and be best execute testcases.

I wrote testcases; perhaps I just forgot to attach them.
Re: m32c-*-* Build Issue (Multilib?)
I just tried a 4.9.1 build and got this error:

  configure:4222: checking whether to use setjmp/longjmp exceptions
  configure:: /greed/dj/gnu/gcc/m32c-elf/gcc-4_9-branch/./gcc/xgcc
    -B/greed/dj/gnu/gcc/m32c-elf/gcc-4_9-branch/./gcc/
    -B/greed/dj/m32c/install/m32c-elf/bin/
    -B/greed/dj/m32c/install/m32c-elf/lib/
    -isystem /greed/dj/m32c/install/m32c-elf/include
    -isystem /greed/dj/m32c/install/m32c-elf/sys-include
    -mcpu=m32cm -c --save-temps -fexceptions conftest.c >&5
  conftest.c: In function 'foo':
  conftest.c:19:1: error: insn does not satisfy its constraints:
   }
   ^
  (insn 52 38 23 (set (reg:SI 2 r1 [29])
          (reg:SI 4 a0)) 99 {movsi_24}
       (nil))
  conftest.c:19:1: internal compiler error: in final_scan_insn, at final.c:2891
  0x7a56a8 _fatal_insn(char const*, rtx_def const*, char const*, int, char const*)
          /greed/dj/gnu/gcc/svn/gcc-4_9-branch/gcc/rtl-error.c:109
  0x7a56cf _fatal_insn_not_found(rtx_def const*, char const*, int, char const*)
          /greed/dj/gnu/gcc/svn/gcc-4_9-branch/gcc/rtl-error.c:120
  0x6256c9 final_scan_insn(rtx_def*, _IO_FILE*, int, int, int*)
          /greed/dj/gnu/gcc/svn/gcc-4_9-branch/gcc/final.c:2891
  0x6258ef final(rtx_def*, _IO_FILE*, int)
          /greed/dj/gnu/gcc/svn/gcc-4_9-branch/gcc/final.c:2023
  0x626035 rest_of_handle_final
          /greed/dj/gnu/gcc/svn/gcc-4_9-branch/gcc/final.c:4427
  0x626035 execute
          /greed/dj/gnu/gcc/svn/gcc-4_9-branch/gcc/final.c:4502
Re: m32c-*-* Build Issue (Multilib?)
> We see other failures in the log because newlib/targ-include
> isn't created. The rtems build include path includes that and
> needs it but it isn't created before libgcc is built. That isn't a
> problem on other targets. I don't see anything odd in the top
> configury magic for m32c which could cause this but I could
> easily be missing something.

If you're building in separate trees, you need to build gcc-host, then newlib, then gcc-target. If you're building in a combined tree, I don't know.
Re: m32c-*-* Build Issue (Multilib?)
> What's the next step?

Someone finds time and desire to debug it ;-)
push_rounding vs memcpy vs stack_pointer_delta
The m32c-elf with -mcpu=m32c has a word-aligned stack and uses pushes for arguments (i.e. not accumulate_outgoing_args). In this test case, one of the arguments is memcpy'd into place, and an assert fails:

  typedef struct {
    int a, b, c, d, e, f, g, h;
  } foo;

  int x;

  void dj (int a, int b, foo c)
  {
    dj2 (x, a, b, c);
  }

  if (pass == 0)
    {
      . . .
    }
  else
    {
      normal_call_insns = insns;

      /* Verify that we've deallocated all the stack we used. */
      gcc_assert ((flags & ECF_NORETURN)
                  || (old_stack_allocated
                      == stack_pointer_delta - pending_stack_adjust));
    }

After much debugging, it turns out that the argument that's memcpy'd to stack doesn't adjust stack_pointer_delta the same way that the other arguments do (i.e. push_block and push_args don't adjust it consistently, or something like that.) I came up with this patch:

  Index: expr.c
  ===================================================================
  --- expr.c	(revision 214599)
  +++ expr.c	(working copy)
  @@ -4234,12 +4234,16 @@ emit_push_insn (rtx x, enum machine_mode
         /* Get the address of the stack space.
            In this case, we do not deal with EXTRA separately.
            A single stack adjust will do.  */
         if (! args_addr)
           {
             temp = push_block (size, extra, where_pad == downward);
  +#ifdef PUSH_ROUNDING
  +          if (CONST_INT_P (size))
  +            stack_pointer_delta += INTVAL (size) + extra;
  +#endif
             extra = 0;
           }
         else if (CONST_INT_P (args_so_far))
           temp = memory_address (BLKmode,
                                  plus_constant (Pmode, args_addr,
                                                 skip + INTVAL (args_so_far)));

But builds of libstdc++-v3 demonstrate that sometimes stack_pointer_delta *is* adjusted consistently, so there must be some more complex logic for determining when the extra adjustment (i.e. my patch) should be made, or it's in the wrong place.

So... could someone more familiar with this code enlighten me on the actual rules for what stack_pointer_delta means, when it gets adjusted, and where (in theory) a memcpy'd argument on a push_args target should update it (if at all)?

Thanks!
DJ
Re: libgcc - SJLJ probe failing on head on h8300 & m32c
Last time you mentioned this, I asked what the contents of that config.log were...
pointer math vs named address spaces
If a target (rl78-elf in my case) has a named address space larger than the generic address space (__far in my case), why is pointer math in that named address space still truncated to sizetype? N1275 recognizes that named address spaces might be a different size than the generic address space, but I didn't see anything that requires such truncation.

  volatile char __far * ptr1;
  volatile char __far * ptr2;
  uint32_t sival;

  foo()
  {
    ptr2 = ptr1 + sival;
  }

  foo ()
  {
    volatile char * ptr2.5;
    sizetype D.2252;
    long unsigned int sival.4;
    volatile char * ptr1.3;
    sizetype _3;

    ;; basic block 2, loop depth 0
    ;;    pred:       ENTRY
    ptr1.3_1 = ptr1;
    sival.4_2 = sival;
    _3 = (sizetype) sival.4_2;    <-- why this truncation?
    ptr2.5_4 = ptr1.3_1 + _3;
    ptr2 = ptr2.5_4;
    return;
    ;;    succ:       EXIT
  }
Re: pointer math vs named address spaces
> However, pointer subtraction still returns ptrdiff_t, and sizeof still
> returns size_t,

Why?
Re: volatile access optimization (C++ / x86_64)
Matt Godbolt writes:
> GCC's code generation uses a "load; add; store" for volatiles, instead
> of a single "add 1, [metric]".

GCC doesn't know if a target's load/add/store patterns are volatile-safe, so it must avoid them. There are a few targets that have been audited for volatile-safeness such that gcc *can* use the combined load/add/store when the backend says it's OK. x86 is not yet one of those targets.

Also, note that the standard says the physical target must do the same operations that the "model" target does, but it does not require that those operations be in separate opcodes. A single opcode that performs the correct operations in the correct order complies with the standard; but you have to tell gcc which opcodes comply.
Re: volatile access optimization (C++ / x86_64)
> What is involved with the auditing?

Each pattern that (directly or indirectly) uses general_operand, memory_operand, or nonimmediate_operand needs to be checked to see if it's volatile-safe. If so, you need to change the predicate to something that explicitly accepts volatiles.

There's been talk about adding direct support for a "volatile-clean" flag for targets where you know it's correct, which would bypass the volatile check in those functions, but it hasn't happened yet.
Re: volatile access optimization (C++ / x86_64)
> One question: do you have an example of a non-volatile-safe machine so
> I can get a feel for the problems one might encounter? At best I can
> imagine a machine that optimizes "add 0, [mem]" to avoid the
> read/write, but I'm not aware of such an ISA.

For example, the MSP430 backend uses a macro for movsi, addsipsi3, subpsi3, and a few others, which aren't volatile-safe. Look for "general_operand" vs "msp_general_operand".
Re: volatile access optimization (C++ / x86_64)
> I looked in the documentation and didn't see this described.

AFAIK it's not documented. Only recently was it agreed (and even then, reluctantly) that the ISO spec could be met by such opcodes.
Re: volatile access optimization (C++ / x86_64)
> To try to generalize from that: it looks like the operating
> principle is that an insn that expands into multiple references to a
> given operand isn't volatile-safe, but one where there is only a
> single reference is safe?

No, if the expanded list of insns does "what the standard says, no more, no less" as far as memory accesses go, it's OK. Many of the MSP macros do not access memory in a volatile-safe way. Some do.

If you have a single opcode that isn't volatile-safe (for example, a string operation that's interruptible and restartable), that wouldn't be OK despite being a single insn. So it's kinda mechanical, but not always.
Re: volatile access optimization (C++ / x86_64)
> Ok, but the converse - if the general_operand is accessed by more
> than one instruction, it is not safe - is correct, right?

In general, I'd agree, but the ISO spec talks about "sequence points" and there are times when you *can* access a volatile multiple times as long as the state is correct at the sequence point. GCC won't, for example, combine insns if it doesn't know whether the combined insn follows the sequence-point rules correctly.

This is in 5.1.2.3 in the C99 spec, but that caveat mostly applies to non-memory-mapped volatiles; memory-mapped ones are typically more strictly confined (6.7.3.6) (which is the origin of the -fstrict-volatile-bitfields patch). So, for example, if you had a volatile on the stack and a special stack-relative insn to modify it, you would "know" it would be safe to do so even if it doesn't meet 6.7.3.6. Or if you used atomics to guard a multi-access macro to make it volatile-safe.
Re: Rename C files to .c in GCC source
pins...@gmail.com writes:
> No because they are c++ code so capital C is correct.

However, we should avoid relying on case-sensitive file systems (Windows) and use .cc or .cxx for C++ files ("+" is not a valid file name character on Windows, so we can't use .c++).
Re: Rename C files to .c in GCC source
> Aren't current Windows file systems case-preserving? Then they
> shouldn't have any problems with .C files.

They are case-preserving, but not case-sensitive. A wildcard search for *.c will match foo.C and bar.c, and foo.c can be opened as FOO.C.
building against a temporary install dir?
So here's what I'm trying to do... I want to build gcc, binutils, and newlib, run tests, and IF the tests pass, THEN install them all. However, gcc needs an installed newlib to build its libraries. I tried installing newlib into $DESTDIR$PREFIX but gcc ignores $DESTDIR during the compile.

Any ideas on how to do this, short of building and installing everything (or at least gcc and newlib) twice?
Re: Newlib/Cygwin now under GIT
> This is a common problem. I guess newlib/cygwin got the oldest set
> and, afaik, the GCC toplevel stuff is kind of the master. It would
> be nice if we had some automation in place to keep all former src
> repos in sync.

There was never any agreement on who the "master" was for toplevel sources - no repo was willing to give up control to the other, so no automatic mirroring was ever done, unlike the libiberty/include mirror, where src agreed to let gcc be the master.

Also, for the record, I do not wish to, nor do I intend to, provide any automated merging services for git repos. I don't like git and I'd rather not use it if I don't have to.
s390: SImode pointers vs LR
In config/s390/s390.c we accept addresses that are SImode:

  if (!REG_P (base)
      || (GET_MODE (base) != SImode
          && GET_MODE (base) != Pmode))
    return false;

However, there doesn't seem to be anything in the s390's opcodes that masks the top half of address registers in 64-bit mode; the SImode convention seems to just be a convention for addresses in the first 4GB.

So... what happens if gcc uses a subreg to load the lower half of a register (via LR), leaving the upper half with random bits in it, then uses that register as an address? I could see no code that checked for this, and I have a fairly large and ungainly test case that says it breaks :-(

My local solution was to just disallow SImode as an address in s390_decompose_address, which forces gcc to do an explicit SI->DI conversion to clear the upper bits, and it seems to work, but I wonder if it's the ideal solution...
dwarf DW_AT_decl_name: system headers vs source files?
Consider:

# 1 "dj.c"
# 1 "dj.h" 1 3
int dj(int x);
# 2 "dj.c" 2
int dj(int x)
{
}

If you compile with -g and look at the dwarf output, you see:

 <1><2d>: Abbrev Number: 2 (DW_TAG_subprogram)
    <2e>   DW_AT_external : 1
    <2e>   DW_AT_name     : dj
    <31>   DW_AT_decl_file : 2
    <32>   DW_AT_decl_line : 1

 The File Name Table:
  Entry	Dir	Time	Size	Name
  1	0	0	0	dj.c
  2	0	0	0	dj.h

Note that the DW_AT_decl_file refers to "dj.h" and not "dj.c". If you remove the "3" (the system-header flag) from the '# 1 "dj.h" 1 3' line, the DW_AT_decl_file instead refers to "dj.c". It's been this way for many releases. Is this intentional? If so, what is the rationale for it?
rl78 vs cse vs memory_address_addr_space
In this bit of code in explow.c:

  /* By passing constant addresses through registers
     we get a chance to cse them.  */
  if (! cse_not_expected && CONSTANT_P (x) && CONSTANT_ADDRESS_P (x))
    x = force_reg (address_mode, x);

On the rl78 this results in code that's a bit too complex for later passes to optimize fully. Is there any way to indicate that the above force_reg() is bad for a particular target?
Re: rl78 vs cse vs memory_address_addr_space
Given a test case like this:

typedef struct {
  unsigned char no0 :1;
  unsigned char no1 :1;
  unsigned char no2 :1;
  unsigned char no3 :1;
  unsigned char no4 :1;
  unsigned char no5 :1;
  unsigned char no6 :1;
  unsigned char no7 :1;
} __BITS8;

#define SFR0_bit (*(volatile __BITS8 *)0x0)
#define SFREN SFR0_bit.no4

void foo()
{
  SFREN = 1U;
  SFREN = 0U;
}

(i.e. any code that sets/clears one bit in a volatile memory-mapped area, which the rl78 has instructions for)

Before:

(insn 5 2 7 2 (set (reg/f:HI 43)
        (const_int 240 [0xf0])) test.c:24 7 {*movhi_virt}
     (nil))
(insn 7 5 8 2 (set (reg:QI 45 [ MEM[(volatile union un_per0 *)240B].BIT.no4 ])
        (mem/v/j:QI (reg/f:HI 43) [0 MEM[(volatile union un_per0 *)240B].BIT.no4+0 S1 A16])) test.c:24 5 {movqi_virt}
     (nil))
(insn 8 7 9 2 (set (reg:QI 46)
        (ior:QI (reg:QI 45 [ MEM[(volatile union un_per0 *)240B].BIT.no4 ])
            (const_int 16 [0x10]))) test.c:24 19 {*iorqi3_virt}
     (expr_list:REG_DEAD (reg:QI 45 [ MEM[(volatile union un_per0 *)240B].BIT.no4 ])
        (nil)))
(insn 9 8 12 2 (set (mem/v/j:QI (reg/f:HI 43) [0 MEM[(volatile union un_per0 *)240B].BIT.no4+0 S1 A16])
        (reg:QI 46)) test.c:24 5 {movqi_virt}
     (expr_list:REG_DEAD (reg:QI 46)
        (nil)))
(insn 12 9 13 2 (set (reg:QI 49 [ MEM[(volatile union un_per0 *)240B].BIT.no4 ])
        (mem/v/j:QI (reg/f:HI 43) [0 MEM[(volatile union un_per0 *)240B].BIT.no4+0 S1 A16])) test.c:26 5 {movqi_virt}
     (nil))
(insn 13 12 14 2 (set (reg:QI 50)
        (and:QI (reg:QI 49 [ MEM[(volatile union un_per0 *)240B].BIT.no4 ])
            (const_int -17 [0xffef]))) test.c:26 18 {*andqi3_virt}
     (expr_list:REG_DEAD (reg:QI 49 [ MEM[(volatile union un_per0 *)240B].BIT.no4 ])
        (nil)))
(insn 14 13 0 2 (set (mem/v/j:QI (reg/f:HI 43) [0 MEM[(volatile union un_per0 *)240B].BIT.no4+0 S1 A16])
        (reg:QI 50)) test.c:26 5 {movqi_virt}
     (expr_list:REG_DEAD (reg:QI 50)
        (expr_list:REG_DEAD (reg/f:HI 43)
            (nil))))

Combine gets as far as this:

Trying 5 -> 9:
Failed to match this instruction:
(parallel [
        (set (mem/v/j:QI (const_int 240 [0xf0]) [0 MEM[(volatile union un_per0 *)240B].BIT.no4+0 S1 A16])
            (ior:QI (mem/v/j:QI (const_int 240 [0xf0]) [0 MEM[(volatile union un_per0 *)240B].BIT.no4+0 S1 A16])
                (const_int 16 [0x10])))
        (set (reg/f:HI 43)
            (const_int 240 [0xf0]))
    ])

(the set is left behind because it's used for the second assignment)

Both of those insns in the parallel are valid rl78 insns. I tried adding that parallel as a define-and-split, but combine doesn't split it at the point where it inserts it, so it doesn't work right. If it reduced those four instructions to the two in the parallel, but without the parallel, it would probably work too.

We end up with code like this:

	movw	r8, #240	; 5	*movhi_real/4	[length = 4]
	movw	ax, r8		; 19	*movhi_real/5	[length = 4]
	movw	hl, ax		; 21	*movhi_real/6	[length = 4]
	set1	[hl].4		; 9	*iorqi3_real/1	[length = 4]
	clr1	[hl].4		; 14	*andqi3_real/1	[length = 4]

but what we want is this:

	set1	!240.4		; 9	*iorqi3_real/1	[length = 4]
	clr1	!240.4		; 14	*andqi3_real/1	[length = 4]

( !240 means (mem (const_int 240)) )

(if there's only one such operation in a function, it combines properly, likely because the address is not needed after the insn it can combine with, unlike the parallel above)

The common addresses are separated at least before lowering to RTL, as the initial expansion has:

;; MEM[(volatile union un_per0 *)240B].BIT.no4 ={v} 1;

(insn 5 4 7 (set (reg/f:HI 43)
        (const_int 240 [0xf0])) test.c:24 -1
     (nil))
(insn 7 5 8 (set (reg:QI 45)
        (mem/v/j:QI (reg/f:HI 43) [0 MEM[(volatile union un_per0 *)240B].BIT.no4+0 S1 A16])) test.c:24 -1
     (nil))
(insn 8 7 9 (set (reg:QI 46)
        (ior:QI (reg:QI 45)
            (const_int 16 [0x10]))) test.c:24 -1
     (nil))
(insn 9 8 0 (set (mem/v/j:QI (reg/f:HI 43) [0 MEM[(volatile union un_per0 *)240B].BIT.no4+0 S1 A16])
        (reg:QI 46)) test.c:24 -1
     (nil))

Yes, I know gcc doesn't like combining volatile accesses into one insn, but the rl78 backend (my copy at least) has predicates that allow it, because it's safe on rl78.
Also, if I take out the "volatile" yet put some sort of barrier (like a volatile asm) between the two assignments, it still fails, in the same manner.
Re: rl78 vs cse vs memory_address_addr_space
> Did you try just a define_split instead? Ugly, but it should work I think. It doesn't seem to be able to match a define_split :-(
s390: larl for SImode on 64-bit
Is there any reason that LARL can't be used to load a 32-bit symbolic value in 64-bit mode? On TPF (64-bit) the app has the option of being loaded in the first 4Gb so that all symbols are also valid 32-bit addresses, for backward compatibility. (and if not, the linker would complain)

Index: s390.md
===================================================================
--- s390.md	(revision 225579)
+++ s390.md	(working copy)
@@ -1845,13 +1845,13 @@
   emit_symbolic_move (operands);
 })
 
 (define_insn "*movsi_larl"
   [(set (match_operand:SI 0 "register_operand" "=d")
         (match_operand:SI 1 "larl_operand" "X"))]
-  "!TARGET_64BIT && TARGET_CPU_ZARCH
+  "TARGET_CPU_ZARCH
    && !FP_REG_P (operands[0])"
   "larl\t%0,%1"
   [(set_attr "op_type" "RIL")
    (set_attr "type" "larl")
    (set_attr "z10prop" "z10_fwd_A1")])
Re: s390: larl for Simode on 64-bit
In the TPF case, the software has to explicitly mark such pointers as SImode (such things happen only when structures that contain addresses can't change size, for backwards compatibility reasons[1]): int * __attribute__((mode(SImode))) ptr; ptr = &some_var; so I wouldn't consider this the "default" case for those apps, just *a* case that needs to be handled "well enough", and the user is already telling the compiler that they assume those addresses are 32-bit (that either the whole app, or at least the part with that object, will be linked below 4Gb). The majority of the addresses are handled as 64-bit. [1] /me refrains from commenting on the worth of such practices, just that they exist and need to be (and have been) supported.
Re: s390: larl for Simode on 64-bit
> So in effect, we have two pointer sizes, 64 being the default, but > we can also get a 32 bit pointer via the syntax above? Wow, I'm > surprised that works. Yup, been that way for many years. > And the only time we'd be able to use larl is a dereference of a > pointer declared with the syntax above. Right; larl would be used to load the address of an object to *initialize* such a pointer, but yes. Regular pointers still use larl, but as a DImode operation. I.e. larl will always load a 64-bit value into a register, even if gcc will only use the 32 LSBs. > OK for the trunk with a simple testcase. I think you can just scan > the assembler output for the larl instruction. Will do, but it's part of a bigger patch. I just wanted to make sure there wasn't some side-effect of larl that precluded this use.
Re: Question about "instruction merge" pass when optimizing for size
I've seen this on other targets too, sometimes so bad I write a quick target-specific "stupid move optimizer" pass to clean it up. A generic pass would be much harder, but very useful.
Re: Offer of help with move to git
> In the mean time, I'm enclosing a contributor map that will need to be > filled in whoever does the conversion. The right sides should become > full names and preferred email addresses. This information should be gleanable from the Changelog commits... do you have a script to scan those?
Re: Repository for the conversion machinery
Hmmm... I use two email addresses for commits, depending on which target they're for, i.e.:

$ grep DJ MAINTAINERS
m32c port		DJ Delorie
DJGPP			DJ Delorie

Most of the DJGPP stuff was long ago, but I wonder how the conversion would handle this?
Re: Repository for the conversion machinery
> If you want your commits to be attributed to two different addresses > in the git conversion, you need to tell me how to specify two > different selection sets so I can write assign statements and two > trivial "authors read" commands affecting them only. > > assuming that the names m32c and djgpp have been properly bound. Since I have no idea what you mean by "properly bound", or even where these names come from, I don't know how to define rules based on them. The only reliable way I can think of is to look at which address I used in the ChangeLog entry that is part of each commit. After all, that's what ChangeLog entries are for.
reload question about unmet constraints
Given this test case for rl78-elf:

extern __far int a, b;
void ffr (int x)
{
  a = b + x;
}

I'm trying to use this patch:

Index: gcc/config/rl78/rl78-virt.md
===================================================================
--- gcc/config/rl78/rl78-virt.md	(revision 227360)
+++ gcc/config/rl78/rl78-virt.md	(working copy)
@@ -92,15 +92,15 @@
   ]
   "rl78_virt_insns_ok ()"
   "v.inc\t%0, %1, %2"
 )
 
 (define_insn "*add<mode>3_virt"
-  [(set (match_operand:QHI 0 "rl78_nonfar_nonimm_operand" "=vY,S")
-	(plus:QHI (match_operand:QHI 1 "rl78_nonfar_operand" "viY,0")
-		  (match_operand:QHI 2 "rl78_general_operand" "vim,i")))
+  [(set (match_operand:QHI 0 "rl78_nonimmediate_operand" "=vY,S,Wfr")
+	(plus:QHI (match_operand:QHI 1 "rl78_general_operand" "viY,0,0")
+		  (match_operand:QHI 2 "rl78_general_operand" "vim,i,vi")))
   ]
   "rl78_virt_insns_ok ()"
   "v.add\t%0, %1, %2"
 )
 
 (define_insn "*sub<mode>3_virt"

to allow the rl78 port to generate the "Wfr/0/r" case (alternative 3; Wfr = far MEM, v = virtual regs). I expected gcc to see that the operation doesn't meet the constraints, and move operands into registers to make it work (alternative 1, "v/v/v"). Instead, it just complains and dies.
dj.c:42:1: error: insn does not satisfy its constraints:
 }
 ^
(insn 10 15 13 2 (set (mem/c:HI (reg:SI 8 r8) [1 a+0 S2 A16 AS2])
        (plus:HI (mem/c:HI (plus:HI (reg/f:HI 32 sp)
                    (const_int 4 [0x4])) [1 x+0 S2 A16])
            (mem/c:HI (symbol_ref:SI ("b") ) [1 b+0 S2 A16 AS2]))) dj.c:41 13 {*addhi3_virt}
     (nil))
dj.c:42:1: internal compiler error: in extract_constrain_insn, at recog.c:2200

Reloads for insn # 10
Reload 0: reload_in (SI) = (symbol_ref:SI ("a") )
	V_REGS, RELOAD_FOR_INPUT (opnum = 0), inc by 2
	reload_in_reg: (symbol_ref:SI ("a") )
	reload_reg_rtx: (reg:SI 8 r8)
Reload 1: reload_in (HI) = (mem/c:HI (plus:HI (reg/f:HI 32 sp)
                                        (const_int 4 [0x4])) [2 x+0 S2 A16])
	reload_out (HI) = (mem/c:HI (plus:HI (reg/f:HI 32 sp)
                                        (const_int 4 [0x4])) [2 x+0 S2 A16])
	V_REGS, RELOAD_OTHER (opnum = 1), optional
	reload_in_reg: (mem/c:HI (plus:HI (reg/f:HI 32 sp)
                                        (const_int 4 [0x4])) [2 x+0 S2 A16])
	reload_out_reg: (mem/c:HI (plus:HI (reg/f:HI 32 sp)
                                        (const_int 4 [0x4])) [2 x+0 S2 A16])
Reload 2: reload_in (HI) = (mem/c:HI (symbol_ref:SI ("b") ) [2 b+0 S2 A16 AS2])
	V_REGS, RELOAD_FOR_INPUT (opnum = 2), optional
	reload_in_reg: (mem/c:HI (symbol_ref:SI ("b") ) [2 b+0 S2 A16 AS2])

So this is where I've been banging my head against the sources... where is the magic that tells gcc to try to copy everything into registers to meet the constraints?

Note: expand is ok, the initial add insn is:

(insn 9 3 10 2 (set (reg:HI 48 [ D.1375 ])
        (plus:HI (reg/v:HI 45 [ x ])
            (mem/c:HI (symbol_ref:SI ("b") ) [2 b+0 S2 A16 AS2]))) dj.c:43 -1
     (nil))

and just before reload:

(insn 10 9 0 2 (set (mem/c:HI (symbol_ref:SI ("a") ) [2 a+0 S2 A16 AS2])
        (plus:HI (mem/c:HI (reg/f:HI 33 ap) [2 x+0 S2 A16])
            (mem/c:HI (symbol_ref:SI ("b") ) [2 b+0 S2 A16 AS2]))) dj.c:43 13 {*addhi3_virt}
     (nil))
Re: reload question about unmet constraints
> It did match the first alternative (alternative 0), but it matched the > constraints Y/Y/m. It shouldn't match Y as those are for near addresses (unless it's only matching MEM==MEM), and the ones in the insn are far, but ... > Reload doesn't have any concept of two different kinds of memory > operands which can't be converted via reloads. If the constraint > accepts mem, and we have a mem operand, then it will always assume > that the problem is with the address and reload it. ... this sounds like it could be a problem for me :-P
Re: reload question about unmet constraints
> You would need some way to indicate that while Y does accept a mem, > this particular mem can't be reloaded to match. We don't have a way > to do that.

As a test, I added this API. It seems to work. I suppose there could be a better API where we determine if a constraint matches various memory spaces, then compare with the memory space of the operand, but I can't prove that's sufficiently flexible for all targets that support memory spaces. Heck, I'm not even sure what to call the macro, and "TARGET_IS_THIS_MEMORY_ADDRESS_RELOADABLE_TO_MATCH_THIS_CONSTRAINT_P()" is a little long ;-) What do we think of this direction?

Index: reload.c
===================================================================
RCS file: /cvs/cvsfiles/gnupro/gcc/reload.c,v
retrieving revision 1.33
diff -p -U 5 -r1.33 reload.c
--- reload.c	20 Feb 2014 16:40:26 -0000	1.33
+++ reload.c	15 Sep 2015 05:38:24 -0000
@@ -3517,20 +3517,26 @@ find_reloads (rtx insn, int replace, int
 		      && ((reg_equiv_mem (REGNO (operand)) != 0
 			   && EXTRA_CONSTRAINT_STR (reg_equiv_mem (REGNO (operand)), c, p))
 			  || (reg_equiv_address (REGNO (operand)) != 0)))
 		    win = 1;
 
+#ifndef COMPATIBLE_CONSTRAINT_P
+#define COMPATIBLE_CONSTRAINT_P(c,p,op) 1
+#endif
+		  if (!MEM_P (operand) || COMPATIBLE_CONSTRAINT_P (c, p, operand))
+		    {
 		  /* If we didn't already win, we can reload constants
 		     via force_const_mem, and other MEMs by reloading the
 		     address like for 'o'.  */
 		  if (CONST_POOL_OK_P (operand_mode[i], operand)
 		      || MEM_P (operand))
 		    badop = 0;
 		  constmemok = 1;
 		  offmemok = 1;
 		  break;
 		}
+		    }
 
 	      if (EXTRA_ADDRESS_CONSTRAINT (c, p))
 		{
 		  if (EXTRA_CONSTRAINT_STR (operand, c, p))
 		    win = 1;
Index: config/rl78/rl78.c
===================================================================
RCS file: /cvs/cvsfiles/gnupro/gcc/config/rl78/rl78.c,v
retrieving revision 1.12.6.16
diff -p -U 5 -r1.12.6.16 rl78.c
--- config/rl78/rl78.c	5 Aug 2015 13:43:59 -0000	1.12.6.16
+++ config/rl78/rl78.c	15 Sep 2015 05:39:04 -0000
@@ -1041,10 +1041,18 @@ rl78_far_p (rtx x)
     return 0;
 
   return GET_MODE_BITSIZE (rl78_addr_space_address_mode (MEM_ADDR_SPACE (x))) == 32;
 }
 
+int
+rl78_compatible_constraint_p (char c, const char *p, rtx r)
+{
+  if (c == 'Y' && rl78_far_p (r))
+    return 0;
+  return 1;
+}
+
 /* Return the appropriate mode for a named address pointer.  */
 #undef TARGET_ADDR_SPACE_POINTER_MODE
 #define TARGET_ADDR_SPACE_POINTER_MODE rl78_addr_space_pointer_mode
 static enum machine_mode
Index: config/rl78/rl78.h
===================================================================
RCS file: /cvs/cvsfiles/gnupro/gcc/config/rl78/rl78.h,v
retrieving revision 1.7.8.3
diff -p -U 5 -r1.7.8.3 rl78.h
--- config/rl78/rl78.h	17 Mar 2015 14:54:35 -0000	1.7.8.3
+++ config/rl78/rl78.h	15 Sep 2015 05:39:28 -0000
@@ -500,5 +500,7 @@ typedef unsigned int CUMULATIVE_ARGS;
 
 /* NOTE: defined but zero means dwarf2 debugging, but sjlj EH.  */
 #define DWARF2_UNWIND_INFO 0
 
 #define REGISTER_TARGET_PRAGMAS() rl78_register_pragmas()
+
+#define COMPATIBLE_CONSTRAINT_P(C,P,OP) rl78_compatible_constraint_p (C,P,OP)
Re: reload question about unmet constraints
> I see. Is it correct then to say that reload will never be able to > change a near mem into a far mem or vice versa? If that is true, there > doesn't appear to be any real benefit to having both near and far mem > operations as *alternatives* to the same insn pattern. The RL78 has a segment register, much like the x86. The segment register allows you to have a 20-bit address instead of a 16-bit address. However, due to details of the port, you can only have *one* segment register override per operation, even if it applies to more than one (identical) operand. So you can add two near pointers, and you can add something to a far pointer (i.e. x += 5), but you can't add two *different* far pointers. > In that case, you might be able to fix the bug by splitting the > offending insns into two patterns, one only handling near mems > and one handling far mems, where the near/far-ness of the mem > is verified by the *predicate* and not the constraints. But this means that when reload needs to, it moves far mems into registers, which changes which insn is matched... It also adds a *lot* of new patterns, since any of the three operands can be far, and '0' constraints on far are allowed also - and most insns allow far this way, so it could be up to seven times as many patterns. You can see why I'd rather not do that :-)
Re: reload question about unmet constraints
> And in fact, you should be able to decide at *expand* time which > of the two you need for the given set of operands. I already check for multiple fars at expand, and force all but one of them to registers. Somewhere before reload they get put back in. > "rl78_virt_insns_ok () && rl78_far_insn_p (operands)" Since when does this work reliably? I've seen cases where insns get mashed together without regard for validity before... I tested just this change - adding that function to addhi3 plus the Wfr constraint sets - and it seems to work. The big question to me now is - is this *supposed* to work this way? Or is it a coincidence that the relevant passes happen to check that function? > The Wfr constraint must not be marked as memory constraint (so as to > avoid reload attempting to use it to access a stack slot). This also prevents reload from reloading the address when it *is* needed. However, it seems to work ok even as a memory constraint. Is this change *just* because of the stack slots? Could you give an example of how it could be misused, so I can understand the need?
Re: Repository for the conversion machinery
"Frank Ch. Eigler" writes: > That makes sense, but how many people are in cagney's shoes I am one of those people - I have two email addresses listed in MAINTAINERS, with two sets of copyright papers filed with the FSF (a personal assignment and a work one). I use the appropriate email address for each commit depending on which maintainership role I'm reflecting. Neither address is "obsolete" and neither address is @gcc.gnu.org. Using d...@gcc.gnu.org would imply that is my email address, but email sent there would vanish. But I did discuss my case with esr and understand it's not as easy to solve as we'd like it to be.
Re: Repository for the conversion machinery
Richard Biener writes: >> Using d...@gcc.gnu.org would imply that is my email address, but email >> sent there would vanish. > > Would it? You're supposed to have a valid forwarding address on that. Frank tested it and it does seem to forward to me, so I guess so.
Re: reload question about unmet constraints
> So in general, it's really not safe to mark a constraint that accepts > only far memory as "memory constraint" with current reload. > > Note that *not* marking the constraint as memory constraint actually > does not prevent reload from fixing up illegitimate addresses, so you > shouldn't really see much drawbacks from not marking it ... While working through the regressions on this one I discovered one seemingly important side-effect... For such constraints that are memory operands but not define_memory_constraint, you need to use '*' to keep reload from trying to guess a register class from them (it guesses wrong for rl78). I.e. use "*Wfr" instead of "Wfr".
Re: Is anyone working on a Z80 port?
> I spec'd one out a long time ago for Cygnus/Red Hat, but we never > pursued the port. The register model on the z80 will be problematical, > though some of the lessons from the rl78 port would probably be useful. The RL78 is very much a modern descendant of the Z80 architecture, so it might serve as a good starting point. But yeah, it's a messy port, because gcc doesn't like the weird addressing model. I ended up using a virtual ISA that gcc could deal with, then converted that to real instructions after reload.
Proposal to deprecate: mep (Toshiba Media Processor)
Given a combination of "I have new responsibilities" and "nothing has happened with mep for a long time" I would like to step down as mep maintainer. If someone would like to pick up maintainership of this target, please contact me and/or the steering committee. Otherwise, I propose this target be deprecated in GCC 6 and removed in 7. DJ
who owns stack args?
Consider this example (derived from gcc.c-torture/execute/920726-1.c):

extern int a(int a, int b, int c, int d, int e, int f,
             const char *s1, const char *s2) __attribute__((pure));

int foo()
{
  if (a(0,0,0,0,0,0,"abc","def")
      || a(0,0,0,0,0,0,"abc","ghi"))
    return 0;
  return 1;
}

On rl78-elf I'm seeing a bug that only happens if a() is declared "pure". When the bug triggers, the address of "abc" in the second call is *not* written to the stack. Instead, the move is deleted by DCE in postreload. It's not deleted if you remove the "pure". The bug was exposed when strcmp() became able to increment incoming stack arguments in-place, instead of copying them to registers. The example was intended to reproduce the bug on intel or arm, but it doesn't. If there's an obvious fix for this, I'm all ears, but... the real question is: are stack arguments call-clobbered or call-preserved? Does the answer depend on the "pure" attribute?
Re: Proposal to deprecate: mep (Toshiba Media Processor)
> Given a combination of "I have new responsibilities" and "nothing has > happened with mep for a long time" I would like to step down as mep > maintainer. > > If someone would like to pick up maintainership of this target, please > contact me and/or the steering committee. Otherwise, I propose this > target be deprecated in GCC 6 and removed in 7. MeP is now deprecated.
Re: Deprecating basic asm in a function - What now?
Given how many embedded ports have #defines in external packages for basic asms for instructions such as nop, enable/disable interrupts, and other system-level opcodes... I think this is a bad idea. Even glibc would break.

#define enable() asm("eint")

__asm__ __volatile__ ("fwait");
Re: gcc/libcpp: non-UTF-8 source or execution encodings?
David Edelsohn writes: > GCC on the system is not self-hosting -- I believe that GCC only is > used as a cross-compiler. I can confirm this - GCC for TPF is always a cross compiler, it never runs *on* a TPF system.
Re: [GCC Steering Committee attention] [PING] [PING] [PING] libgomp: In OpenACC testing, cycle though $offload_targets, and by default only build for the offload target that we're actually going to te
Manuel López-Ibáñez writes: > none? for libiberty, no regular maintainer for build machinery, Perhaps this is a sign that I should step down as maintainer for those?
Re: [GCC Steering Committee attention] [PING] [PING] [PING] libgomp: In OpenACC testing, cycle though $offload_targets, and by default only build for the offload target that we're actually going to te
Manuel López-Ibáñez writes: > I don't see how that helps. Neither my message nor Thomas's is a > criticism of people. The question is how to get more people to help > and how to improve the situation. For sure, everybody is doing the > best that they can with the time that they have. You complained that there were no libiberty maintainers (there are two) and no build maintainers (there are many). As I am listed as one of each of those, this makes me wonder if there's no longer a need for such people (we're involved so infrequently that nobody notices) or if I'm just not able to put enough effort into it to be noticed (which may be true anyway). Either way, this is a part of your "problem" that I can address directly, so I'm doing so. > This is a problem throughout GCC. We have a single C++ maintainer, a > single part-time C maintainer, none? for libiberty, no regular > maintainer for build machinery, and so on and so forth.
Re: [GCC Steering Committee attention] [PING] [PING] [PING] libgomp: In OpenACC testing, cycle though $offload_targets, and by default only build for the offload target that we're actually going to te
Manuel López-Ibáñez writes: > Another question is how to help existing maintainers such that they > are more motivated to review patches. Is it a lack of time? lack of > Interest in the project? do patches simply fall through the cracks? is > it a dead-lock of people waiting for each other to comment? In my case, I became a build/libiberty maintainer a long time ago (20 years or so), when DJGPP was a much more active project and it made sense for me to be involved in the parts of GCC that were sensitive to the needs of DOS and NTFS filesystems and OSs. This is no longer the case, and my justification for maintaining those parts of gcc has evaporated. Fortunately, continuing took very minimal effort. However, since then, I've not only taken on maintainerships in other areas (mostly backends, which I need to wean off of eventually) but I've also switched groups at work, and am no longer focused on gcc (I'm focused on glibc now). Also, I've always been opposed to libiberty being a "catch-all" for cross-useful functionality, so I'm anti-motivated to work on those portions of libiberty that aren't strictly portability-layer-related (specifically, the demangler, which I leave to Ian). So... considering your big-picture problem, where do I fit in? How can I make the big picture better, given what you now know about my situation? What changes do you think would make sense?
Re: Why are GCC Internals not Specification Driven ?
Seima Rao writes: > Has gcc become proprietory/commercial ? By definition: no, yes. It's been this way since the beginning, and hasn't changed in decades. > Or has it become illegal to publish specification models > of gcc internals ? Does this make the product sell less ? This sounds like you're trying to start an argument, instead of asking a simple question. It is certainly not illegal to publish our specifications, and we certainly *do* publish many of our specifications (have you read the internals manual? You don't say whether or not you did, but that would be a key bit of information to have disclosed). Whether the product "sells" or not is rarely a driving factor for our project. Most of us work on it because we need it to work better for our own purposes. If you have specific questions about our documentation or development process, please ask them. Please do not ask vague, leading, and emotionally loaded questions. RTL and Gimple are documented. Are they documented well? That depends on your needs. Are they documented as well as they could be? Probably not, but good enough for us so far. And as always, if you want to improve the situation, by all means feel free to volunteer to do so ;-)
Re: targetm.calls.promote_prototypes parameter
In my original proposal, I said this: > It includes a bunch of macro->hook conversions, mostly because the > hooks need an additional parameter (the function) to detect which ones > are Renesas ABI and which are GCC ABI. The original documentation at least hinted that the parameter was a function type: > @deftypefn {Target Hook} bool TARGET_PROMOTE_PROTOTYPES (tree @var{fntype}) Kazu's calls are in the C++ stuff, I don't know if g++ and Renesas C++ are compatible anyway (I doubt it), but that's what would be affected. The original work was for C compatibility.
Re: targetm.calls.promote_prototypes parameter
Jason Merrill writes: > I'm inclined to change the C++ FE to pass NULL_TREE instead until such > time as someone cares. The sh backend will at least not choke on that ;-)
Re: Status of m32c target?
Jeff Law writes: > I was going to suggest deprecation for gcc-8 given how badly it was > broken in gcc-7 and the lack of maintenance on the target. As much as I use the m32c target, I have to agree. I've tried many times to fix its reload problems to no avail, and just don't have time to work on gcc ports much any more.
Re: Status of m32c target?
Jeff Law writes: > A change in reload back in 2016 (IIRC) has effectively made m32c > unusable. The limits of the register file create horrible problems for > reload. > > I was going to suggest deprecation for gcc-8 given how badly it was > broken in gcc-7 and the lack of maintenance on the target.

I gave this another shot Friday. I was thinking maybe we could retire the m32cm cpu and keep the r8c cpu, since the M32C family is essentially dead part-wise but there are still new R8C chips being made. The reload problems for r8c are still there, but I also discovered a bug in the m32cm cpu that might be generic... Are there any other targets that push large structures on the call stack via memcpy? I'm seeing failures due to mis-calculating stack adjustments in that case.

$ m32c-elf-gcc -c -mcpu=m32cm -O3 dj.c

typedef struct {
  void *a, *b, *c, *d;
  void *e, *f, *g;
} cookie_io_functions_t;

void *_impure_ptr;

void *_fopencookie_r (void *ptr, void *cookie, const char *mode,
                      cookie_io_functions_t functions);

void *
fopencookie (void *cookie, const char *mode,
             cookie_io_functions_t functions)
{
  return _fopencookie_r (_impure_ptr, cookie, mode, functions);
}

dj.c: In function ‘fopencookie’:
dj.c:16:10: internal compiler error: in expand_call, at calls.c:4426
   return _fopencookie_r ( _impure_ptr , cookie, mode, functions);
          ^~~

The assert in question (with a printf I added):

  printf("%x %x, %d %d %d\n", flags, ECF_NORETURN,
         old_stack_allocated, stack_pointer_delta, pending_stack_adjust);
  /* Verify that we've deallocated all the stack we used.  */
  gcc_assert ((flags & ECF_NORETURN)
	      || (old_stack_allocated == stack_pointer_delta
		  - pending_stack_adjust));

IIRC when this happens, "stack_pointer_delta" doesn't account for the size of the large-structure-argument - it has all the push'd args, but not the memcpy'd one. I.e. that printf I added prints this:

0 8, 0 12 40
Re: array bounds violation in caller-save.c : duplicate hard regs check added
> Date: Tue, 5 Jun 2012 21:59:15 -0400 (EDT) > From: Hans-Peter Nilsson > On Fri, 25 May 2012, DJ Delorie wrote: > > If I apply this patch, which checks for duplicate hard registers within > > -fira-share-save-slots, the following *-elf targets fail due to the assert: > > > > bfin cris m32c rl78 rx sh sh64 v850 > > Oop. An no clue as to what's wrong. > > Can you pretty please make the test-case n'all sent > down-thread into a PR? Sorry, I dropped the ball on this one. http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54217
reverse bitfield patch
Here's my current patch for the bitfield reversal feature I've been working on for a while, with an RX-specific pragma to apply it "globally". Could someone please review this? It would be nice to get it in before stage1 closes again... Index: gcc/doc/extend.texi === --- gcc/doc/extend.texi (revision 192009) +++ gcc/doc/extend.texi (working copy) @@ -5427,12 +5427,74 @@ Note that the type visibility is applied associated with the class (vtable, typeinfo node, etc.). In particular, if a class is thrown as an exception in one shared object and caught in another, the class must have default visibility. Otherwise the two shared objects will be unable to use the same typeinfo node and exception handling will break. +@item bit_order +Normally, GCC allocates bitfields from either the least significant or +most significant bit in the underlying type, such that bitfields +happen to be allocated from lowest address to highest address. +Specifically, big-endian targets allocate the MSB first, where +little-endian targets allocate the LSB first. The @code{bit_order} +attribute overrides this default, allowing you to force allocation to +be MSB-first, LSB-first, or the opposite of whatever gcc defaults to. The +@code{bit_order} attribute takes an optional argument: + +@table @code + +@item native +This is the default, and also the mode when no argument is given. GCC +allocates LSB-first on little endian targets, and MSB-first on big +endian targets. + +@item swapped +Bitfield allocation is the opposite of @code{native}. + +@item lsb +Bits are allocated LSB-first. + +@item msb +Bits are allocated MSB-first. + +@end table + +A short example demonstrates bitfield allocation: + +@example +struct __attribute__((bit_order(msb))) @{ + char a:3; + char b:3; +@} foo = @{ 3, 5 @}; +@end example + +With LSB-first allocation, @code{foo.a} will be in the 3 least +significant bits (mask 0x07) and @code{foo.b} will be in the next 3 +bits (mask 0x38). 
+With MSB-first allocation, @code{foo.a} will be in
+the 3 most significant bits (mask 0xE0) and @code{foo.b} will be in
+the next 3 bits (mask 0x1C).
+
+Note that it is entirely up to the programmer to define bitfields that
+make sense when swapped.  Consider:
+
+@example
+struct __attribute__((bit_order(msb))) @{
+  short a:7;
+  char b:6;
+@} foo = @{ 3, 5 @};
+@end example
+
+On some targets, or if the struct is @code{packed}, GCC may only use
+one byte of storage for A despite it being a @code{short} type.
+Swapping the bit order of A would cause it to overlap B.  Worse, the
+bitfield for B may span bytes, so ``swapping'' would no longer be
+defined as there is no ``char'' to swap within.  To avoid such
+problems, the programmer should either fully-define each underlying
+type, or ensure that their target's ABI allocates enough space for
+each underlying type regardless of how much of it is used.
+
 @end table
 
 To specify multiple attributes, separate them by commas within the
 double parentheses: for example, @samp{__attribute__ ((aligned (16),
 packed))}.
Index: gcc/c-family/c-common.c
===================================================================
--- gcc/c-family/c-common.c (revision 192009)
+++ gcc/c-family/c-common.c (working copy)
@@ -310,12 +310,13 @@ struct visibility_flags visibility_optio
 static tree c_fully_fold_internal (tree expr, bool, bool *, bool *);
 static tree check_case_value (tree);
 static bool check_case_bounds (tree, tree, tree *, tree *);
 static tree handle_packed_attribute (tree *, tree, tree, int, bool *);
+static tree handle_bitorder_attribute (tree *, tree, tree, int, bool *);
 static tree handle_nocommon_attribute (tree *, tree, tree, int, bool *);
 static tree handle_common_attribute (tree *, tree, tree, int, bool *);
 static tree handle_noreturn_attribute (tree *, tree, tree, int, bool *);
 static tree handle_hot_attribute (tree *, tree, tree, int, bool *);
 static tree handle_cold_attribute (tree *, tree, tree, int, bool *);
 static tree handle_noinline_attribute (tree *, tree, tree, int, bool *);
@@ -601,12 +602,14 @@ const unsigned int num_c_common_reswords
 const struct attribute_spec c_common_attribute_table[] =
 {
   /* { name, min_len, max_len, decl_req, type_req, fn_type_req, handler,
        affects_type_identity } */
   { "packed",                 0, 0, false, false, false,
                               handle_packed_attribute , false},
+  { "bit_order",              0, 1, false, true, false,
+                              handle_bitorder_attribute , false},
   { "nocommon",               0, 0, true, false, false,
                               handle_nocommon_attribute, false},
   { "common",                 0, 0, true, false, false,
                               handle_common_attribute, false },
   /* FIXME: logically, noreturn attributes should be listed as
      "false, true, true" and apply to function types.  But implementing this @
Re: reverse bitfield patch
[sorry, should have gone to gcc-patches]
Re: reverse bitfield patch
> ChangeLog missing, new functions need a toplevel comment documenting
> function, argument and return value as per coding conventions.

Any review of the patch itself?  I know the overhead is not there...
Re: reverse bitfield patch
> Why do you need to change varasm.c at all?  The hunks seem to be
> completely separate of the attribute.

Because static constructors have fields in the original order, not the
reversed order.  Otherwise code like this is miscompiled:

struct foo a = { 1, 2, 3 };

because the 1, 2, 3 are in the C layout order, but the underlying data
needs to be stored in the reversed order.

> which will severely pessimize bitfield accesses to structs with the
> bitfield-order attribute.

The typical use-case for this feature is memory-mapped hardware, where
pessimal access is preferred anyway.

> so you are supporting this as #pragma.  Which ends up tacking
> bit_order to each type.  Rather than this, why not operate similar
> to the packed pragma, thus, adjust a global variable in
> stor-layout.c.

Because when I first proposed this feature, I was told to do it this
way.

> I don't see a value in attaching 'native' or 'msb'/'lsb' if it is
> equal to 'native'.  You un-necessarily pessimize code generation (is
> different code even desired for a "no-op" bit_order attribute?).

If the attribute corresponds to the native mode, it should be a no-op.
The pessimizing only happens when the fields are actually in reverse
order.

> So no, I don't like this post-process layouting thing.  It's a
> layouting mode so it should have effects at bitfield layout time.

The actual reversal happens in stor-layout.c.  Everything else is
there to compensate for a possible non-linear layout.
Re: Time for GCC 5.0? (TIC)
Ian Lance Taylor writes:
> Also the fact that GCC is now written in C++ seems to me to be
> deserving of a bump to 5.0.

I see no reason why an internal design change that has no user visible
effects should have any impact on the version number.  Typically a
major version bump is reserved for either massive new functionality or
a break with backwards compatibility.
Re: Time for GCC 5.0? (TIC)
> Marketing loves high numbers after all!

If you truly think this way, we're going to have to revoke your
hacker's license ;-)
Re: Deprecate i386 for GCC 4.8?
The official DJGPP triplet is for i586, not i386. I don't mind djgpp-wise if we deprecate i386, as long as we keep i586. Anyone still using djgpp for i386 can dig out old versions from the archives :-)
Re: ADDR_SPACE_CONVERT_EXPR always expanded to 0?
> A quick grep shows not many targets would be affected, AVR, m32c,
> rl78 and spu.
> You should work with the maintainers of those targets to see which
> approach would be the best.

For both m32c and rl78, one address space is a strict subset of the
other (16-bit "near" vs 20/24/32-bit "far" pointers, that's all) so
nothing magic there.
Re: GCC 4.8.0 does not compile for DJGPP
The DJGPP build of gcc 4.8.0 was just uploaded, it might have some patches that haven't been committed upstream yet.
Re: If you had a month to improve gcc build parallelization, where would you begin?
One thing I did in libiberty was to rearrange the targets so that the ones that took the longest started first.  That way, you don't end up building 99% of the objects and then waiting for that one last object to finish.
gettext prereq vs po/zh_TW
The gcc prereq page says gettext 0.14.5 is the minimum version, but
po/zh_TW.po has lines like this:

#, fuzzy
#~| msgid "Unexpected EOF"
#~ msgid "Unexpected type..."
#~ msgstr "未預期的型態…"

The | syntax appears to have been added in gettext 0.16, and gettext
0.14 can't process it.

Seems to have been a result of this request:

http://gcc.gnu.org/ml/gcc-patches/2013-06/msg01436.html
Re: Question about local register variable
The purpose of local register variables is to tell gcc which register
to use in an inline asm, when multiple registers could be used.  Other
uses are not supported and usually don't work the way you expect,
especially when optimizing.

If all you want is a function which returns the value in a specific
register, you could try using an asm like this with a local register
variable:

__asm__ __volatile__ ("# no actual opcode" : "=r" (r));

That tells gcc that something has put a value in 'r' (although nothing
did, so the old value is used).  However, such registers usually hold
values with global significance, so a global register variable is a
better choice.  If the register may be used by gcc for other purposes,
you have no way of predicting what might be in it when that asm
happens.
Re: DJ Delorie and Nick Clifton appointed as MSP430 port maintainers
On behalf of myself and Nick, many thanks to everyone involved in reviewing this port! I've checked in the port as per the last (approved) patch set I sent out.
Re: Invalid tree node causes segfault in diagnostic
> While I am at it, can I patch backends as well?  For example
> mep/mep.c has an occurrence of tree_code_name[TREE_CODE (...

The mep change is pre-approved :-)
question about register pairs
The docs say to use HARD_REGNO_MODE_OK to enforce register pairs.  But
reload (find_valid_class_1) rejects classes that include such
registers:

  for (regno = 0; regno < FIRST_PSEUDO_REGISTER && !bad; regno++)
    {
      if (in_hard_reg_set_p (reg_class_contents[rclass], mode, regno)
          && !HARD_REGNO_MODE_OK (regno, mode))
        {
          bad = 1;

In the past, when I used a register class that excludes the second
half of register pairs, reload couldn't do anything with it, because
it requires both parts of the register pair to be in the class
(example: in_hard_reg_set_p checks this).

Which way is the "right" way?
proposal to make SIZE_TYPE more flexible
There are a couple of places in gcc where weird-sized pointers are an
issue.  While you can use a partial-integer mode for pointers, the
pointer *math* is still done in standard C types, which usually don't
match the modes of pointers and often result in suboptimal code.

My proposal is to allow the target to define its own type for
pointers, size_t, and ptrdiff_t to use, so that gcc can adapt to weird
pointer sizes instead of the target having to use power-of-two pointer
math.  This means the target would somehow have to register its new
types (int20_t and uint20_t in the MSP430 case, for example), as well
as specify that type to the rest of gcc.

There are some problems with the naive approach, though, and the
changes are somewhat pervasive.  So my question is, would this
approach be acceptable?  Most of the cases where new code would be
added have gcc_unreachable() at the moment anyway.

Specific issues follow...

SIZE_TYPE is used...

tree.c: build_common_tree_nodes() compares against fixed strings; if
no match it gcc_unreachable()'s - instead, look up type in language
core.

c-family/c-common.c just makes a string macro; it's up to the target
to provide a legitimate value for it.

lto/lto-lang.c compares against fixed strings to define THREE related
types, including intmax_type_node and uintmax_type_node.  IMHO it
should not be using pointer sizes to determine integer sizes.

PTRDIFF_TYPE is used...

c-family/c-common.c
fortran/iso-c-binding.def
fortran/trans-types.c

These all use lookups; however fortran's get_typenode_from_name only
supports "standard" type names.

POINTER_SIZE is used...

I have found in the past that gcc has issues if POINTER_SIZE is not a
power of two.  IIRC BLKmode was used to copy pointer values.  Other
examples:

assemble_align (POINTER_SIZE);
  - can't align to non-power-of-two bits

assemble_integer (XEXP (DECL_RTL (src), 0),
                  POINTER_SIZE / BITS_PER_UNIT, POINTER_SIZE, 1);
  - need to round up, not truncate
Re: proposal to make SIZE_TYPE more flexible
> It is a deficiency that SIZE_TYPE is defined to be a string at all
> (and likewise for all the other target macros for standard typedefs,
> including all those for <stdint.h>).  Separately, it's a deficiency
> that these things are target macros rather than target hooks.

My thought was that there'd be a set of target hooks that returned a
TREE for various types.  But as an interim solution, the checks that
use strcmp() should fall back to a type lookup-by-name; i.e. replace
the gcc_unreachable()s with expensive table lookups.

I think there's an advantage in newlib to having a macro that expands
to the type-as-a-string needed for various types.  Of course, if gcc
had a typedef for those types, that would be better, but slightly
harder to autodetect.

> Instead of having an __int128 keyword (target-independent) targets
> would be able to define a set of N for which there are __intN
> keywords and for which everything handling __int128 will equally
> handle __intN.

That sounds great, and would avoid the problem of a target needing to
register its own types for those.  I hadn't considered simplifying the
intN_t problem; since you *can* register custom types, I was mostly
just thinking about how to *use* those types for size_t et al.
Re: proposal to make SIZE_TYPE more flexible
So, given all that, is there any way to add the "target-specific size_t" portion without waiting for-who-knows-how-long for the intN_t and enum-size-type projects to finish? Some form of interim API that we can put in, so that we can start working on finding all the assumptions about size_t, while waiting for the rest to finish?
weird logic about redeclaring builtins...
Given the logic in c/c-decl.c's diagnose_mismatched_decls, if a
built-in function is *also* declared in a system header (which is
common with newlib), gcc fails to mention either the builtin or the
declaration if you redeclare the function as something else.

I.e. this code:

int foo();
int foo;

gives the expected "previous declaration was at ..." error, and this
code:

int index;

gives the expected "built-in function 'index' declared ..." error.
However, this code:

char *index(const char *,int);
int index;

gives neither the built-in error nor the previous-decl error.  It
*only* gives the "'index' was redeclared" error.

Is this intentional?  Is there an easy fix for this that works for all
cases?
Re: proposal to make SIZE_TYPE more flexible
> > So, given all that, is there any way to add the "target-specific
> > size_t" portion without waiting for-who-knows-how-long for the
> > intN_t and enum-size-type projects to finish?  Some form of interim
> > API that we can put in, so that we can start working on finding all
> > the assumptions about size_t, while waiting for the rest to finish?
>
> I have no idea how ugly something supporting target-specific strings
> would be, since supporting such strings for these standard typedefs
> never seemed to be a direction we wanted to go in.

I tried to hack in support for intN_t in a backend, and it was a maze
of initialization sequence nightmares.  So I guess we need to do the
intN_t part first.

Is someone working on this?  If not, is there a spec I could use to
get started on it?
Re: proposal to make SIZE_TYPE more flexible
> Instead of a target-independent __int128 keyword, there would be a
> set (possibly empty) of __intN keywords, determined by a target hook.

Or *-modes.def ?
Re: proposal to make SIZE_TYPE more flexible
> That would be one possibility - if the idea is to define __intN for
> all integer modes not matching a standard type (and passing
> targetm.scalar_mode_supported_p), I advise posting details of what
> effect this would have for all targets so we can see how many such
> types would get added.

I was thinking of using the existing PARTIAL/FRACTIONAL_INT_MODE
macros:

avr/avr-modes.def:FRACTIONAL_INT_MODE (PSI, 24, 3);
bfin/bfin-modes.def:PARTIAL_INT_MODE (DI, 40, PDI);
m32c/m32c-modes.def:PARTIAL_INT_MODE (SI, 24, PSI);
msp430/msp430-modes.def:PARTIAL_INT_MODE (SI, 20, PSI);
rs6000/rs6000-modes.def:PARTIAL_INT_MODE (TI, 128, PTI);
sh/sh-modes.def:PARTIAL_INT_MODE (SI, 22, PSI);
sh/sh-modes.def:PARTIAL_INT_MODE (DI, 64, PDI);

I suspect we'd have to filter out the power-of-two ones though,
leaving:

avr/avr-modes.def:FRACTIONAL_INT_MODE (PSI, 24, 3);
bfin/bfin-modes.def:PARTIAL_INT_MODE (DI, 40, PDI);
m32c/m32c-modes.def:PARTIAL_INT_MODE (SI, 24, PSI);
msp430/msp430-modes.def:PARTIAL_INT_MODE (SI, 20, PSI);
sh/sh-modes.def:PARTIAL_INT_MODE (SI, 22, PSI);

I'm assuming we need a mode to go with any type we create?  Otherwise,
we could add a FRACTIONAL_INT_TYPE(wrapper-mode, bits) macro to add
yet more.
Re: proposal to make SIZE_TYPE more flexible
> If you do want types without corresponding modes, that goes back to
> having a hook to list the relevant type sizes.

Perhaps a FRACTIONAL_INT_TYPE() macro then, for when there's no
machine mode to go with it?  Although I'm struggling to imagine a case
where a target would need to define a bit-sized type that doesn't
correspond to any machine mode.
Re: proposal to make SIZE_TYPE more flexible
> Everything handling __int128 would be updated to work with a
> target-determined set of types instead.
>
> Preferably, the number of such keywords would be arbitrary (so I
> suppose there would be a single RID_INTN for them) - that seems
> cleaner than the system for address space keywords with a fixed block
> from RID_ADDR_SPACE_0 to RID_ADDR_SPACE_15.

I did a scan through the gcc source tree trying to track down all the
implications of this, and there were a lot of them, and not just the
RID_* stuff.  There's also the integer_types[] array (indexed by
itk_*, which is its own mess) and the c_common_reswords[] array, for
example.

I think it might not be possible to have one RID_* map to multiple
actual keywords, as there are a few cases that need to know *which*
intN is used *and* have access to the original string of the token,
and many cases where code assumes a 1:1 relation between RID_*, a
type, and a keyword string.

IMHO the key design choices come down to:

* Do we change a few global const arrays to be dynamic arrays?

* We need to consider that "position in array" is no longer a suitable
  sort key for these arrays.  itk_* comes to mind here, but RID_* are
  abused sometimes too.  (Note: I've seen this before, where PSImode
  isn't included in "find smallest mode" logic, for example, because
  it's not in the array in the same place as SImode.)

* We need to dynamically map keywords/bitsizes/tokens to types in all
  the cases where we explicitly check for int128.  Some of these
  places have explicit "check types in the right order" logic
  hard-coded that may need to be changed to a data-search logic.

* The C++ mangler needs to know what to do with these new types.

I'll attach my notes from the scan for reference...

Search for int128 ...
Search for c_common_reswords ...
Search for itk_ ...

--- . ---

tree-core.h
  enum integer_type_kind is used to map all integer types "in order"
  so we need an alternate way to map them.  Currently hard-codes the
  itk_int128 types.
tree.h
  defines int128_unsigned_type_node and int128_integer_type_node
  uses itk_int128 and itk_unsigned_int128 - int128_*_type_node is an
  [itk_*] array reference.

builtin-types.def
  defines BT_INT128 but nothing uses it yet.

gimple.c
  gimple_signed_or_unsigned_type maps types to their signed or
  unsigned variant.  Two cases: one checks for int128 explicitly, the
  other checks for compatibility with int128.

tree.c
  make_or_reuse_type maps size/signed to int128_integer_type_node etc.
  build_common_tree_nodes makes int128_*_type_node if the target
  supports TImode.

tree-streamer.c
  preload_common_nodes() records one node per itk_*

--- LTO ---

lto.c
  read_cgraph_and_symbols() reads one node per integer_types[itk_*]

--- C-FAMILY ---

c-lex.c
  interpret_integer scans itk_* to find the best (smallest) type for
  integers.
  narrowest_unsigned_type assumes integer_types[itk_*] is in bit-size
  order, and assumes [N*2] is signed/unsigned pairs.
  narrowest_signed_type: same.

c-cppbuiltin.c
  __SIZEOF_INTn__ for each intN

c-pretty-print.c
  prints I128 suffix for int128-sized integer literals.

c-common.c
  int128_* has an entry in c_global_trees[]
  c_common_reswords[] has an entry for __int128 -> RID_INT128
  c_common_type_for_size maps int:128 to int128_*_type_node
  c_common_type_for_mode: same.
  c_common_signed_or_unsigned_type - checks for int128 types.  Same
  as gimple_signed_or_unsigned_type()?
  c_build_bitfield_integer_type assigns int128_*_type_node for :128
  fields.
  c_common_nodes_and_builtins maps int128_*_type_node to RID_INT128
  and "__int128".  Also maps to decl __int128_t
  keyword_begins_type_specifier() checks for RID_INT128

--- C ---

c-tree.h
  adds cts_int128 to c_typespec_keyword[]

c-parser.c
  c_parse_init() reads c_common_reswords[] which has __int128, maps
  one id to each RID_* code.
  c_token_starts_typename() checks for RID_INT128
  c_token_starts_declspecs() checks for RID_INT128
  c_parser_declspecs() checks for RID_INT128
  c_parser_attribute_any_word() checks for RID_INT128
  c_parser_objc_selector() checks for RID_INT128

c-decl.c
  error for "long __int128" etc throughout
  declspecs_add_type() checks for RID_INT128
  finish_declspecs() checks for cts_int128

--- FORTRAN ---

iso-c-binding.def
  maps int128_t to c_int128_t via get_int_kind_from_width(

--- C++ ---

class.c
  layout_class_types uses itk_* to find the best (smallest) integer
  type for overlarge bitfields.

lex.c
  init_reswords() reads c_common_reswords[], which includes __int128

rtti.c
  emit_support_tinfos has a dummy list of types fund