Looking at UNSUPPORTED dejagnu tests for a port...
I’m doing some final polishing on a gcc 8.3 upgrade and taking a look at the unsupported tests. Most of them are completely sensible (my port doesn’t support trampolines, for example). But gcc.c-torture/execute/pr78622.c is marked as unsupported. That appears to be due to the line

    { dg-require-effective-target c99_runtime }

I’m using newlib, and if I manually compile the test case with or without an explicit -std=c99, it compiles and links without error. Do I need to set something in the baseboards file or in a local .exp file to indicate that c99 is okay?
Turning off SRA
I’m working on performance tuning a gcc 8.3 port and wanted to turn off SRA for an experiment. I’m passing both -fno-tree-sra and -fno-ipa-sra, but it’s still tagging compiled functions with an “_isra” suffix, which would seem to indicate that it’s still running that optimization. Is there a bigger hammer I’m missing?

Alan Lehotsky
https://codegentllc.com
Trying to chase down a scheduler bug in gcc 4.4.1
I’m in the process of upgrading a gcc port, but my client is using a gcc 4.4.1 port right now and has run into a scheduler bug. This seems to have been fixed at some point, as the 8.3.1 code base doesn’t seem to have the bug. But they’d like a fix on their 4.4.1 base.

Basically, what I see is a block of code where we have

    struct pnode *pn = ctx->return_pn;
    atomic_write_u32((unsigned int *)&ctx->return_pn, 0);
    x = pn->x;

where the ‘atomic_write_u32’ is an extended asm that is basically

    static inline void atomic_write_u32(unsigned int *reg, unsigned int v)
    {
        asm ("move %[dest], %[src]\n"
             : [dest] "=m" (*reg)
             : [src] "r" (v));
    }

What happens is that the load of the local pn gets moved AFTER the assignment of zero to the passed-in reference to ctx->return_pn, and we SEGV at runtime dereferencing a NULL pointer.

I’ve checked the phase dumps and everything’s fine in the RTL until sched1, where we end up with

    ;;==
    ;; -- basic block 2 from 2 to 13 -- before reload
    ;; ==
    ;; 0--> 2 r50=d0 :i_pipeline
    ;; 1--> 7 r51=0x0 :i_pipeline
    ;; 2--> 8 [r50+0x9c]=asm_operands :nothing
    ;; 3--> 6 r47=[r50+0x9c] :i_pipeline <=== moved load of pn after code that zeroes it
    ;; 4--> 9 d0=r47 :i_pipeline
    ;; 5-->10 call [`pnode_ref_dec'] :i_pipeline
    ;; 6-->12 {cc=cmp([r50+0xb0],0x0);r49=[r50+0 :i_pipeline
    ;; 7-->13 pc={(cc==0x0)?L23:pc}:i_pipeline
    ;; Ready list (final):
    ;; total time = 7
    ;; new head = 2
    ;; new tail = 13

I grovelled through Bugzilla and couldn’t find anything that seemed relevant using search terms like “sched”, “haifa”, “asm”. I’m hoping that someone might recognize this problem and point me to a relevant Bugzilla report before I dig into the scheduler pass to try and see why it goes wrong.

I’m guessing that the pass is not recognizing that &ctx->return_pn in the caller aliases *reg in the inline asm, and so thinks it’s safe to move a reference to ctx->return_pn...
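One workaround commonly tried for this class of problem is to make the asm a full compiler barrier, by marking it volatile and adding a "memory" clobber, so that no memory access may be moved across it. The sketch below is only an illustration under assumptions: it replaces the target-specific move with a plain C store so that it builds with any GNU C compatible compiler, the names COMPILER_BARRIER, write_slot, struct ctx and take_return_pn are hypothetical, and it has not been verified that this avoids the sched1 problem on the 4.4.1 port described above.

```c
#include <assert.h>
#include <stddef.h>

struct pnode { int x; };
struct ctx { struct pnode *return_pn; };

/* Compiler-level barrier (GNU C extension).  It emits no instructions,
   but the "memory" clobber tells the compiler that memory may be read
   or written here, so loads and stores are not moved across it. */
#define COMPILER_BARRIER() __asm__ __volatile__("" : : : "memory")

/* Hypothetical stand-in for atomic_write_u32: a plain store fenced by
   barriers on both sides. */
static inline void write_slot(struct pnode **slot, struct pnode *v)
{
    assert(slot != NULL);
    COMPILER_BARRIER();   /* earlier loads stay before the store */
    *slot = v;
    COMPILER_BARRIER();   /* later accesses stay after the store */
}

/* Same shape as the failing code: read the saved node, clear the slot,
   then use the saved copy. */
static int take_return_pn(struct ctx *c)
{
    struct pnode *pn = c->return_pn;   /* must be loaded before the store */
    write_slot(&c->return_pn, NULL);
    return pn->x;
}
```

In the original asm the corresponding change would be to write `asm volatile` and add "memory" to the clobber list; that trades some scheduling freedom for a stronger ordering guarantee.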
Re: Hoisting DFmode loads out of loops..
On Jun 25, 2020, at 6:37 PM, Jeff Law <l...@redhat.com> wrote:

On Thu, 2020-06-25 at 15:46 -0400, Alan Lehotsky wrote:

I’m working on a GCC 8.3 port to a load/store architecture with a 32-bit data-path between registers and memory; looking at the gcc.dg/loop-9.c test, I fail to pass because I have split the move of a double constant to memory into multiple moves (4 in fact, because I only have a 16-bit immediate mode.) The (define_insn_and_split “movdf” …) is conditioned on “reload_completed”. Is there some other trick I need to get the constant hoisted? I have already set the rtx cost of the CONST_DOUBLE ridiculously high (like 10 insns)

Hi Alan, it's been a long time... We'd probably need to see the RTL. A variety of things can get in the way of LICM. For example, I'd expect subregs to be problematical because they can look like RMW operations.

jeff

Hello to you too, Jeff…. I’ve been lurking for the last decade or so; the last port I actually did was GCC 4 based, so lots of new stuff to try and wrap my head around. I certainly am grateful for anybody with suggestions as to how to track down this problem (I’m not terribly eager to step through an x86 gcc in parallel with my port to see where they diverge in the loop-invariant recognition.)

Although in crafting this expanded email, I see that the x86 has already decided to store the constant 18.4242 in the .rodata section by the start of loop-invariance, so there’s a (set (reg:DF…. ) (mem:DF (symbol_ref ….))) and I bet that’s far easier to move out of the loop than it would be to split the original (set (mem:DF…) (const_double:DF ….))

-- Al

== Source code is

    void f (double *a)
    {
      int i;
      for (i = 0; i < 100; i++)
        a[i] = 18.4242;
    }

== Here’s the dump from loop-9.c.252r.loop2-invariant (compiled -O1)

    ;; Function f (f, funcdef_no=0, decl_uid=1458, cgraph_uid=0, symbol_order=0)
    *starting processing of loop 1 **
    starting the processing of deferred insns
    ending the processing of deferred insns
    setting blocks to analyze 3, 5
    starting the processing of deferred insns
    ending the processing of deferred insns
    df_analyze called
    df_worklist_dataflow_doublequeue: n_basic_blocks 6 n_edges 6 count 2 ( 0.33)
    df_worklist_dataflow_doublequeue: n_basic_blocks 6 n_edges 6 count 2 ( 0.33)
    df_worklist_dataflow_doublequeue: n_basic_blocks 6 n_edges 6 count 3 ( 0.5)
    starting region dump
    f
    Dataflow summary:
    def_info->table_size = 3, use_info->table_size = 23
    ;; invalidated by call 0 [d0] 1 [d1] 2 [d2] 3 [d3] 4 [d4] 5 [d5] 6 [d6] 7 [d7] 8 [d8] 9 [d9] 14 [d14] 15 [d15] 16 [a0] 19 [a3] 20 [a4] 24 [acc0_hi] 25 [acc0_lo] 26 [acc1_hi] 27 [acc1_lo] 28 [source3] 30 [cc] 31 [int_set0] 32 [int_set1] 33 [int_clr0] 34 [int_clr1] 35 [scratchpad0] 36 [scratchpad1] 37 [scratchpad2] 38 [scratchpad3]
    ;; hardware regs used 23 [sp] 29 [arg] 39 [sfp]
    ;; regular block artificial uses 22 [a6] 23 [sp] 29 [arg] 39 [sfp]
    ;; eh block artificial uses 22 [a6] 23 [sp] 29 [arg] 39 [sfp]
    ;; entry block defs 0 [d0] 1 [d1] 2 [d2] 3 [d3] 4 [d4] 5 [d5] 6 [d6] 7 [d7] 8 [d8] 9 [d9] 21 [a5] 22 [a6] 23 [sp] 29 [arg] 39 [sfp]
    ;; exit block uses 22 [a6] 23 [sp] 39 [sfp]
    ;; regs ever live 0 [d0] 30 [cc]
    ;; ref usage r0={1d,1u} r1={1d} r2={1d} r3={1d} r4={1d} r5={1d} r6={1d} r7={1d} r8={1d} r9={1d} r21={1d} r22={1d,5u} r23={1d,5u} r29={1d,4u} r30={3d,1u} r39={1d,5u} r46={2d,4u} r48={1d,1u}
    ;;total ref usage 47{21d,26u,0e} in 6{6 regular + 0 call} insns.
    ;; Reaching defs:
    ;; sparse invalidated
    ;; dense invalidated 0, 1
    ;; reg->defs[] map: 30[0,1] 46[2,2]
    ;; bb 3 artificial_defs: { }
    ;; bb 3 artificial_uses: { u7(22){ }u8(23){ }u9(29){ }u10(39){ }}
    ;; lr in 22 [a6] 23 [sp] 29 [arg] 39 [sfp] 46 48
    ;; lr use 22 [a6] 23 [sp] 29 [arg] 39 [sfp] 46 48
    ;; lr def 30 [cc] 46
    ;; live in 46
    ;; live gen 30 [cc] 46
    ;; live kill 30 [cc]
    ;; rd in (1) 46[2]
    ;; rd gen (2) 30[1],46[2]
    ;; rd kill (3) 30[0,1],46[2]
    ;; UD chains for artificial uses at top
    (code_label 11 7 8 3 2 (nil) [0 uses])
    (note 8 11 9 3 [bb 3] NOTE_INSN_BASIC_BLOCK)
    ;; UD chains for insn luid 0 uid 9
    ;; reg 46 { d2(bb 3 insn 10) }
    (insn 9 8 10 3 (set (mem:DF (reg:SI 46 [ ivtmp___6 ]) [0 MEM[base: _15, offset: 0B]+0 S8 A32])
            (const_double:DF 1.842419990222931955941021442413330078125e+1 [0x0.9364c2f837b4ap+5])) "loop-9.c":9 19 {movdf}
         (nil))
    ;; UD chains for insn luid 1 uid 10
    ;; reg 46 { d2(bb 3 insn 10) }
    (insn 10 9 12 3 (parallel [
                (set (reg:SI 46 [ ivtmp___6 ])
                    (plus:SI (reg:SI 46 [ ivtmp___6 ])
                        (const_int 8 [0x8])))
                (clobber (reg:CC 30 cc))
            ]) 81 {addsi3_1v5}
         (expr_list:REG_UNUSED (reg:CC 30 cc)
            (nil)))
    ;; UD chains for insn luid 2 uid 12
    ;; reg 46 { d2(bb 3 insn 10) }
    ;; reg 48 { }
    (insn 12 10 13 3 (set (reg:CCWZ 30 cc)
            (compare:CCWZ (reg:SI 46 [ ivtmp___6 ])
                (reg:S
Hoisting DFmode loads out of loops..
I’m working on a GCC 8.3 port to a load/store architecture with a 32-bit data-path between registers and memory; looking at the gcc.dg/loop-9.c test, I fail to pass because I have split the move of a double constant to memory into multiple moves (4 in fact, because I only have a 16-bit immediate mode.) The (define_insn_and_split “movdf” …) is conditioned on “reload_completed”. Is there some other trick I need to get the constant hoisted? I have already set the rtx cost of the CONST_DOUBLE ridiculously high (like 10 insns)

Alan Lehotsky
https://codegentllc.com
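To make the "4 moves" concrete, here is a small C sketch of what splitting a DFmode constant into 16-bit immediates amounts to. It assumes a 64-bit IEEE 754 double; split_double and join_double are hypothetical helper names for illustration only and are not part of GCC or of the port.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Split the bit pattern of a double into four 16-bit chunks,
   least significant chunk first.  Each chunk is what one
   16-bit-immediate move would have to materialize. */
static void split_double(double d, uint16_t out[4])
{
    uint64_t bits;
    assert(sizeof bits == sizeof d);
    memcpy(&bits, &d, sizeof bits);
    for (int i = 0; i < 4; i++)
        out[i] = (uint16_t)(bits >> (16 * i));
}

/* Reassemble the four chunks into the original double. */
static double join_double(const uint16_t in[4])
{
    uint64_t bits = 0;
    double d;
    for (int i = 0; i < 4; i++)
        bits |= (uint64_t)in[i] << (16 * i);
    memcpy(&d, &bits, sizeof d);
    return d;
}
```

Seen this way, the store of 18.4242 inside the loop costs four immediate moves per iteration unless the constant is first placed in a register or in memory outside the loop.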
connecting a QEMU VM to dejagnu...
I’m trying to grapple with connecting dejagnu to a QEMU simulator; not finding any obvious examples to work with. I’ve had a lot of experience using CGEN simulators connected to dejagnu, but QEMU’s a new breed of cat…. Can anyone point me to a boards/.exp that is based on using QEMU, or provide other examples? The one example I found via a web search seems to want to do everything in the virtual machine - but I have to believe that’s going to be insanely slow…
setting include paths for a cross compiler in gcc 3.4.6
I have a funny situation where I’m trying to build a cross compiler for x86 hosted on x86 where I’d like to use the native headers and libraries. I tried defining INCLUDE_DEFAULTS, and that didn’t help. The documentation says it’s ignored for cross compilers. Any suggestions, or am I going to have to fool the configuration scripts into thinking this is a host=target configuration?
How to upgrade a tool-chain tree...
I have a tool chain for an experimental processor, built starting with release or snapshot distributions of

    binutils-2.21
    cgen-20110901
    gcc-4.6.1
    newlib-1.19.0
    gdb 7.2

I'm using SVN for version control locally. I'd like to upgrade it to a newer source base; but since it wasn't done by checking out from the FSF and Sourceware version-control systems, I'm wondering if there's any clever way to merge my tree, or if I just have to bite the bullet and essentially take everything I've modified since version 1 in my source tree and import it into a new source tree?
Re: delay slot of conditionnal branch with no annuled jump strategy
I have a gcc 4.6.1 port that has the same sort of problems. I tried selectively porting some patches from later 4.6 releases, but they didn't seem to actually address the issue. I haven't looked at the trunk to see if there are patches that are more apropos.

On Oct 10, 2013, at 12:33 PM, Jeff Law l...@redhat.com wrote:

On 10/10/13 07:31, BELBACHIR Selim wrote:

Why doesn't GCC see, in this case, that it's not safe to fill the delay slot with my compare insn (which is a parallel RTX which clobbers one register used in the fallthrough branch)? Is a processor 'annulled jump strategy' mandatory to handle delay slots of conditional jump instructions?

You'd need to debug reorg. reorg has code to track resources to avoid these kinds of issues. You'd have to debug why it's not working as expected.

Annulling is not required for proper functioning of the reorg pass, it's merely an optimization. I'd start by first verifying your delay slot descriptions do not allow nullifying the delay slot.

Jeff
Confusion about delay slots and using condition-code register
I'm using the CCmode model for condition-code handling in a 4.6.1 based compiler. Every other port I've done used the CC0 model, so I'm probably doing something misguided here.

I'm down to just 170 failures in the check-gcc testsuites, so it's looking pretty solid; of the failures, about 30 are tests with delay slots being filled incorrectly.

The situation I see is where we have source that looks like

    if (x != 0)
        count++;
    if (y != z)
        ...

RTL (without delay slot considerations) looks like

            jeq  $1
            add  r1,1
    $1:     cmp  r2,r3
            jeq  $2

Branches have delay slots, and are not annullable. When reorg runs, it realizes that it can't put the add into the delay slot, but it hoists the cmp instruction into the first branch slot, ala

            jeq  $1
            cmp  r2,r3
            add  r1,1
    $1:     jeq  $2
            ..

So, if the first branch is not taken, we set the condition codes needed for the second branch, clobber them with the add instruction, and then fall through to the conditional branch using the wrong condition codes.

I emit (clobber (reg:CC CCreg)) with every instruction that can set condition codes, but it appears that nearly all of them are removed before we reach reorg, where mark_referenced_resources() or mark_set_resources() would detect a conflict of the CCreg's.

So, am I constructing my RTL incorrectly? Do I need to be making the clobbers inside a parallel instead of just emitting them sequentially? Or should I just fall back to a cc0 model where this shouldn't be a problem?

The define_expand pattern for add looks like

    (define_expand "add<S:mode>3"
      [(set (match_operand:S 0 "nonimmediate_operand")
            (plus:S (match_operand:S 1 "general_operand")
                    (match_operand:S 2 "general_operand")))
       (clobber (reg:CC CC_REGNUM))]
      ...
    })

The corresponding define_insn's are

    (define_insn "*addsi"
      [(set (match_operand:SI 0 "nonimmediate_operand" "=rm,rm,rS,rm")
            (plus:SI (match_operand:SI 1 "nonimmediate_operand" "%0, 0, 0,rm")
                     (match_operand:SI 2 "general_operand" "QI, K, i,rm")))]
      ...)

    (define_insn "*addsi_cc"
      [(set (reg:CC CC_REGNUM)
            (compare:CC (plus:SI (match_operand:SI 1 "nonimmediate_operand" "%0, 0, 0,rm")
                                 (match_operand:SI 2 "general_operand" "QI, K, i,rm"))
                        (const_int 0)))
       (set (match_operand:SI 0 "nonimmediate_operand" "=rm,rm,rS,rm")
            (plus:SI (match_dup 1) (match_dup 2)))]
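The failure described above can be reproduced in a toy model. The C sketch below is not GCC code; it simulates a hypothetical machine with a single Z flag in which both cmp and add set the flag, and it only models the fall-through path of the first branch (x != 0). All names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy machine state: three registers and one zero flag. */
struct cpu { int r1, r2, r3; bool z; };

/* cmp r2,r3: sets Z when the operands are equal. */
static void cmp_r2_r3(struct cpu *c) { c->z = (c->r2 == c->r3); }

/* add r1,1: also sets Z, from its own result. */
static void add_r1_1(struct cpu *c) { c->r1 += 1; c->z = (c->r1 == 0); }

/* Intended order on the fall-through path: add, then cmp, then the
   second jeq.  Returns whether that second branch is taken. */
static bool second_branch_taken_original(struct cpu c)
{
    add_r1_1(&c);
    cmp_r2_r3(&c);
    return c.z;
}

/* Order after the faulty delay-slot fill: cmp runs in the slot of the
   first branch, then add overwrites the flag before the second jeq. */
static bool second_branch_taken_after_fill(struct cpu c)
{
    cmp_r2_r3(&c);
    add_r1_1(&c);
    return c.z;
}
```

With r2 equal to r3 and a nonzero r1, the original order takes the second branch and the reordered one does not, which is the conflict that the resource tracking in reorg would have to see through the CC register.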
filling delay slots with branches
Am I correct in my understanding that you can't put a branch instruction in the delay slot of a branch instruction? Semantically, the HW I'm looking at annuls the branch in the delay slot if the first branch is taken, but any other instructions are not annulled; but it appears that there's no way to describe this in the define_delay(), and it looks to me like the delay slot for the instruction in the delay slot won't get filled properly either.

e.g.

    cmpi $r1,0
    jeq  $1
      jlt  $2
        jmp  $3
          nop

would be a 3-way branch on zero, neg or (by elimination) positive values, with the indented instructions being in a branch delay slot.
Can DWARF2 CFI represent a static return location?
I'm looking at a machine with limited stack, and no push instructions or displaced-addressing mode. The call instruction stores the return address in the link register. For non-recursive functions we save the return address in a static memory location, but I can't find a way to tell the DWARF2 CFI that the saved location is a static MEM rtx. Am I missing something?
code hoisting with CCmode condition codes
I'm obviously doing something stupid here; but I'm at a loss to figure it out. I'm porting to a machine where most instructions set some condition codes, and before hoisting we have

    (insn 1205 1204 1206 65 (set (reg:CC_ZN 24 *cc)
            (compare:CC_ZN (reg:SI 843)
                (reg:SI 844))) ../../../../../uberbaum/newlib/libc/stdio/vfprintf.c:1046 83 {*cmpsi_zn}
         (expr_list:REG_EQUAL (compare:CC_ZN (reg:SI 843)
                (const_int 0 [0]))
            (expr_list:REG_DEAD (reg:SI 844)
                (nil))))
    (jump_insn 1206 1205 1207 65 (set (pc)
            (if_then_else (eq (reg:CC_ZN 24 *cc)
                    (const_int 0 [0]))
                (label_ref 1215)
                (pc))) ../../../../../uberbaum/newlib/libc/stdio/vfprintf.c:1046 86 {branch_insn}
         (expr_list:REG_DEAD (reg:CC_ZN 24 *cc)
            (nil))
     -> 1215)

But after the hoist pass runs, we've inserted a (set (reg:1884) (plus (reg 674) (const_int 4))) between the CC set and use.

    (insn 1203 1202 1205 62 (clobber (reg:CC 24 *cc)) ../../../../../uberbaum/newlib/libc/stdio/vfprintf.c:1046 -1
         (expr_list:REG_UNUSED (reg:CC 24 *cc)
            (nil)))
    (insn 1205 1203 6262 62 (set (reg:CC_ZN 24 *cc)
            (compare:CC_ZN (reg:SI 843)
                (const_int 0 [0]))) ../../../../../uberbaum/newlib/libc/stdio/vfprintf.c:1046 83 {*cmpsi_zn}
         (expr_list:REG_EQUAL (compare:CC_ZN (reg:SI 843)
                (const_int 0 [0]))
            (expr_list:REG_DEAD (reg:SI 844)
                (nil))))
    (insn 6262 1205 1206 62 (set (reg/f:SI 1884 [ ap ])
            (plus:SI (reg/v/f:SI 674 [ ap ])
                (const_int 4 [0x4]))) 10 {*addsi}
         (nil))
    (jump_insn 1206 6262 1207 62 (set (pc)
            (if_then_else (eq (reg:CC_ZN 24 *cc)
                    (const_int 0 [0]))
                (label_ref 1215)
                (pc))) ../../../../../uberbaum/newlib/libc/stdio/vfprintf.c:1046 86 {branch_insn}
         (expr_list:REG_DEAD (reg:CC_ZN 24 *cc)
            (nil))
     -> 1215)

and this is a problem because that will codegen as a

    move $rtemp,src
    addi $rtemp,4

and both of those instructions end up setting condition codes, breaking the dependency with the compare that actually sets the CCs.

I've looked at a number of the other CCmode ports, but can't see how they prevent this from occurring. Do I need to do something like change my define_expand for addsi3 to be something like

    [(parallel [(clobber:CCmode (reg:CCmode CCREG))
                (set:SI (match_operand:SI 0 )
                        (plus:SI (match_operand:SI 1 )
                                 (match_operand:SI 2 )))])]

so that there's an explicitly stated dependency that would prevent moving it before the use in the jump_insn? But I don't see this construct used in other ports that have CCmode registers as opposed to cc0.

I'm porting 4.6.1 - is there any better documentation of CCmode ports or a reference port with
Using a 'V' constraint with QI mode....
The V constraint is essentially implemented by checking that the addressing mode presented is NOT offsettable. But that's done by adding GET_MODE_SIZE(mode) - 1.

I've got a machine that supports indirection but not offsetting or indexing. But the V constraint fails for any (mem:QI (reg:SI p0)).

Am I missing some trick that would allow me to make effective use of V?
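A toy model may make the QImode case easier to see. The C sketch below is not GCC's implementation; it assumes a hypothetical target whose only legitimate address is a bare register (displacement zero) and mimics the "add GET_MODE_SIZE(mode) - 1 and re-check" test described above. All function names are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical target: register-indirect only, so the sole legitimate
   displacement from the base register is zero. */
static bool toy_legitimate_address(long displacement)
{
    return displacement == 0;
}

/* An address counts as offsettable if it is still legitimate after
   adding the largest offset an access of this size could need,
   namely mode_size - 1. */
static bool toy_offsettable(int mode_size)
{
    assert(mode_size > 0);
    return toy_legitimate_address((long)mode_size - 1);
}

/* 'V' accepts a memory operand only when it is NOT offsettable. */
static bool toy_v_constraint_ok(int mode_size)
{
    return !toy_offsettable(mode_size);
}
```

For a one-byte mode the offset added is zero, so the address trivially stays legitimate, counts as offsettable, and V rejects it; for wider modes the same register-indirect address is not offsettable and V accepts it.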
Bad and/or stupid code for DImode compares with gcc 4.6.1
I'm looking at code generated for a new port of gcc using 4.6.1 and failing execute/950607-2.c with -O0 only. The target chip has only 32 bit instructions, so it's using do_jump_by_parts_relop_rtx() to expand the compare. I've set up my .md to use the CCmode.

I see one case that seems really stupid, and one that's just wrong. I'm thinking that either I have something really flawed with my port's handling of DImode or that there was a bug in 4.6.1. The port is only failing about 2100 dejagnu tests (passing 64000+) and a good chunk of the failures are due to the ridiculously small data-memory size of the chip.

For

    long long int x;
    if ( x < 0 ) return 0; else return 2;

I see code that compares MSBs and branches on (less than) as expected. But then it goes and checks the MSBs for != , and finally it checks the LSBs and emits a conditional branch to the ELSE, followed by an unconditional branch to the ELSE, so that I end up with code that looks like

            mov   $r1,x
            mov   $r2,x+4
            cmpi  $r2,0
            jlt   .L5
            cmpi  $r2,0     === totally redundant for x < 0 comparisons
            jne   .L2
            cmpi  $r1,0
            jmp   .L4
    .L5:    movi  $r1, 0
            jump  .L4
    .L2:    movi  $r1, 2
    .L4:    ret

This is a simplification of 950607-2.c, which fails at -O0, but passes at higher optimization levels (go figure...)
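For reference, the general shape of a signed 64-bit "less than" done in 32-bit parts can be written out in C. This is only a sketch of the usual three-step scheme (high words less, high words differ, then unsigned low words), not the code in do_jump_by_parts_relop_rtx(); less_by_parts is a hypothetical name, and the high-word extraction assumes the modular unsigned-to-signed conversion that GCC provides.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Signed 64-bit a < b using only 32-bit comparisons. */
static bool less_by_parts(int64_t a, int64_t b)
{
    /* High words are compared as signed, low words as unsigned. */
    int32_t  ahi = (int32_t)(uint32_t)((uint64_t)a >> 32);
    int32_t  bhi = (int32_t)(uint32_t)((uint64_t)b >> 32);
    uint32_t alo = (uint32_t)a;
    uint32_t blo = (uint32_t)b;

    if (ahi < bhi)
        return true;      /* high words decide: less */
    if (ahi != bhi)
        return false;     /* high words decide: greater */
    return alo < blo;     /* equal high words: low words decide */
}
```

When b is the constant 0, the low-word test can never be true, so only the first comparison matters; that is why the second and third compares in the generated code above look redundant for a comparison against zero.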
Re: Bad and/or stupid code for DImode compares with gcc 4.6.1
So, I found the patch to do_jump_by_parts_greater_rtx() by Eric Botcazou that should address the stupid code and the redundant branch. Should have done a broader search before I wasted email bandwidth...

On Oct 31, 2012, at 1:51 PM, Alan Lehotsky alehot...@me.com wrote:

I'm looking at code generated for a new port of gcc using 4.6.1 and failing execute/950607-2.c with -O0 only. The target chip has only 32 bit instructions, so it's using do_jump_by_parts_relop_rtx() to expand the compare. I've set up my .md to use the CCmode.

I see one case that seems really stupid, and one that's just wrong. I'm thinking that either I have something really flawed with my port's handling of DImode or that there was a bug in 4.6.1. The port is only failing about 2100 dejagnu tests (passing 64000+) and a good chunk of the failures are due to the ridiculously small data-memory size of the chip.

For

    long long int x;
    if ( x < 0 ) return 0; else return 2;

I see code that compares MSBs and branches on (less than) as expected. But then it goes and checks the MSBs for != , and finally it checks the LSBs and emits a conditional branch to the ELSE, followed by an unconditional branch to the ELSE, so that I end up with code that looks like

            mov   $r1,x
            mov   $r2,x+4
            cmpi  $r2,0
            jlt   .L5
            cmpi  $r2,0     === totally redundant for x < 0 comparisons
            jne   .L2
            cmpi  $r1,0
            jmp   .L4
    .L5:    movi  $r1, 0
            jump  .L4
    .L2:    movi  $r1, 2
    .L4:    ret

This is a simplification of 950607-2.c, which fails at -O0, but passes at higher optimization levels (go figure...)
problems in interaction between peephole on CALL_INSN and final_scan_insn
When a peephole is recognized, the first insn in the group is replaced by a pseudo insn that contains all the referenced operands in the TEMPLATE and sets an INSN_CODE to indicate which peephole matched.

This is all well and good, except that if the peephole involves a CALL_INSN, final_scan_insn() will invoke call_from_call_insn() to try and get the call RTL. But if the peephole is in fact some kind of a tail call, we no longer have a call expression to be found and end up asserting in call_from_call_insn().

I think I can work around this by switching to a define_peephole2 converting the call return into an unspec, or maybe by doing a match that grabs the whole call as an operand instead of just the function address.

I'm not sure if the correct fix to this involves changing the way genpeep.c works or changing call_from_call_insn to be more forgiving - either one seems really difficult unless there's existing code that transmutes top-level RTL among CALL_INSN, JUMP_INSN, etc already...

Just in case I'm doing something stupid, here's my peephole

    (define_peephole
      [(parallel [(set (reg:SI RV_REGNUM)
                       (call (match_operand:SI 0 "memory_operand" "")
                             (match_operand 1 "" "")))
                  (clobber (reg:SI LR_REGNUM))])
       (parallel [(use (match_operand 2 "ieu_operand" "rm"))
                  (return)])]
      "!final_sequence"
      {
        if (CONSTANT_P (operands[0]))
          return "jmp\t%0\; mov\tr0,%2";
        else
          return "ret\t%0\; mov\tr0,%2";
      }
      [(set_attr "type" "call")])
Re: problems in interaction between peephole on CALL_INSN and final_scan_insn
I'm certain there are better ways; can you be more specific though? Or are you just talking about defining a sibcall_epilogue pattern?

On Jul 8, 2012, at 5:26 PM, Andrew Pinski wrote:

On Sun, Jul 8, 2012 at 2:23 PM, Alan Lehotsky qsm...@earthlink.net wrote:

When a peephole is recognized, the first insn in the group is replaced by a pseudo insn that contains all the referenced operands in the TEMPLATE and sets an INSN_CODE to indicate which peephole matched. This is all well and good, except that if the peephole involves a CALL_INSN, final_scan_insn() will invoke call_from_call_insn() to try and get the call RTL. But if the peephole is in fact some kind of a tail call, we no longer have a call expression to be found and end up asserting in call_from_call_insn().

Simple answer: don't use peephole optimization to perform the tail call optimization. There are better ways of performing that optimization.

Thanks,
Andrew
pointer modes for Harvard architecture....
I'm working on a port to a Harvard architecture where the data memory addresses are only 14 bits wide (i.e. 16kb) and the instruction address space is 21 bits wide. I do not want to define Pmode as PSImode; the machine has separate address registers for data memory AND, with such limited data memory, I really want data pointers to stay HImode.

I've noticed that some generated function calls are appearing as

    (call (mem:SI (symbol_ref:HI (function_name)

which I suspect is wrong for code addresses outside of the first 64kb of instruction memory.

It would be helpful to see an existing port with wider function pointers to help me avoid stumbling over some of these issues. Is there a current port that has larger instruction memory addresses than data addresses?
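The concern about HImode code addresses comes down to simple truncation, which a few lines of C can illustrate. This is only a sketch: CODE_ADDR_BITS and fits_in_16_bits are hypothetical names, and the 16-bit container stands in for an HImode symbol_ref.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CODE_ADDR_BITS 21u   /* instruction address width from the post */

/* True when a code address survives a round trip through a 16-bit
   container, i.e. when an HImode-sized value could represent it. */
static bool fits_in_16_bits(uint32_t code_addr)
{
    assert(code_addr < (1u << CODE_ADDR_BITS));
    return (uint32_t)(uint16_t)code_addr == code_addr;
}
```

Any function placed at or above address 0x10000 fails this check, so a call through a 16-bit symbol reference would land at the wrong location.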
printed versions of GCC Internals book?
While I really like machine-readable (and searchable) text online for the GCC internals, there's still an atavistic streak in me that wants hard copy that I can put post-it notes on, run a highlighter over relevant passages, or read when I'm not near a computer screen. I have two bound hard copies (but the newer one is GCC 2.95) and laser-printed newer editions, but I've decided I really miss the bound-book format.

Anybody have any experience with using one of the print-on-demand services to produce a recent version of the gccint manual? I was actually kind of surprised that the FSF hasn't taken advantage of this as a fund-raising opportunity.

After the initial setup costs, it looks like the per-book price for the 700pg gccint would be about $20, but the setup fees (at least here http://www.harvard.com/on_our_shelves/in_store_book_printing/books_on_demand/ ) would be ~$100.

So, unless someone has already done this, is there anyone else who'd want to buy a printed copy at a price that would recover my investment in the setup costs and postage? I'd be happy to turn over the whole project to the FSF so they could end up with an ongoing revenue stream once I break even on the deal.

I'd guess that with 10 copies, we'd be looking at ~$35/copy, which is about as high a price as I'd be willing to pay if I was reading this email instead of writing it. So, are there 10 people out there who'd like a reasonably current version of the Internals book, or is there someone else who'd like to drive?

-- Al Lehotsky
Re: Question about perl while bootstrapping gcc
This is normal unix behavior (unless you have some kind of shell that I'm unfamiliar with.) When you use & to create a subjob, it is still attached to your terminal session. Take a look at the at(1) or batch(1) commands if you really want to execute a command and log out while it's still running.

-----Original Message-----
From: Dominique Dhumieres domi...@lps.ens.fr
Sent: Apr 16, 2010 2:10 PM
To: gcc@gcc.gnu.org
Subject: Question about perl while bootstrapping gcc

Hi!

I usually build gcc with a command line such as

    make -j2 somelogfile

I recently found that if I log out, the build fails with

    perl: no user 501

Is this a bug or a feature? In the former case I'll open a PR. In the latter, is it documented somewhere that you should not log out while building gcc? If yes, is it possible to have a pointer?

TIA

Dominique
Re: Is it possible to port GCC backend to a architecture with very limited hard registers?
Almost certainly you will run into severe problems in the reload phase. You might also profitably study the ip2k port. This is an ALU machine, but it does have multiple address registers.

-----Original Message-----
From: redriver jiang jiang.redri...@gmail.com
Sent: Mar 17, 2010 8:55 AM
To: gcc@gcc.gnu.org
Subject: Is it possible to port GCC backend to a architecture with very limited hard registers?

Hi all,

Right now I am attempting to port the GCC backend to an MCU with very limited hard registers: only one 8 bit ACC reg, one 16 bit base reg for addressing, and one status reg. I searched the GCC backend ports, and it seems 68HC1X has a similar situation, but it uses many RAM-simulated registers. I wonder if it is possible to provide these 3 limited registers to the GCC backend, and let all 16 bit (HImode) and 32 bit (SImode) operands be spilled to the stack.

Thanks!

Redriver
Re: How to control code segments ?
Look at the implementation of the IP2K compiler and linker. It uses a segmented paged architecture just like the machine you are describing. In essence what we did was implement linker relaxation to deal with this.

When we called any function, we emitted the appropriate long-call by setting the page register and jumping to the location on that page. In the linker, we implemented relaxation code that looked to see if we were changing to the SAME page, and if so deleted the instruction changing the PAGE and did a local jump to the destination.

Now, because a function could cross a page boundary (we only had 4kb pages and 16 bit instructions), all our branches were done this way (if I recall correctly). It's a little tedious, but not too technically demanding a solution.

Al Lehotsky

On Nov 30, 2008, at 2:06 PM, Dong Phuong wrote:

I'm porting for a microcontroller which has segmented memory. The memory is divided into many pages, each page is 16K. And I'm going to use 256 pages for code. But these 256 pages are not contiguous in physical memory, so when I want to jump to a function, I have to know what the segment address of this function is, then set the CSP with this value, and jump to it.

So what I want to know is: if I'm in a function, is there any way for me to know what code segment I'm located in? If I know this, when I have to jump to another function, I can decide whether this function is in the same segment as the function that I'm located in, and then can decide if I have to change the CSP.

And when I compile a very long program with many methods, is there any way for GCC to realize that the code has exceeded 16K and has to use a new segment? Or must the user explicitly declare this in the C source program?

If you know any hints or any document about this, please show me.

Thank you very much.
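The relaxation decision described in the reply can be sketched in a few lines of C. This is a toy model, not the IP2K linker code: TOY_PAGE_SIZE, toy_page_of and toy_can_drop_page_set are hypothetical names, and the page size is simply assumed to be 4096 address units.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096u   /* assumed page size in address units */

/* Page number that contains a given code address. */
static uint32_t toy_page_of(uint32_t addr)
{
    return addr / TOY_PAGE_SIZE;
}

/* Relaxation decision: every long call first sets the page register.
   If the call site and the target are on the same page, the
   page-setting instruction is redundant and can be deleted. */
static bool toy_can_drop_page_set(uint32_t call_site, uint32_t target)
{
    return toy_page_of(call_site) == toy_page_of(target);
}
```

A real relaxation pass would presumably need to repeat this check until nothing changes, since deleting an instruction shifts the addresses of everything after it and can move a call site or target across a page boundary.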
Re: gcc compiler for pdp10
Martin,

I did a port of GCC to the Analog Devices SHARC chip. I ended up supporting 3 kinds of pointers for this chip (two for address spaces and one for byte pointers - the chip itself is only word addressable, although words can be from 16 to 48 bits in size depending on what memory is being accessed.)

I also worked on the Bliss-36 compiler at DEC, so I'm well acquainted with the PDP10 architecture. I don't have access to any 10/20 HW, but I'd be happy to act as a reviewer/advisor to your changes.

Al Lehotsky

On Apr 18, 2008, at 20:21, Martin Chaney wrote:

Hi,

I am the proprietor of a gcc compiler for the PDP10 architecture. (This is a compiler previously worked on by Lars Brinkhoff, who left XKL some while before I joined XKL. It's possible some of you may have been familiar with him or the compiler from that time.) The compiler is currently in a state where it is synched with both the 4.3 and 4.4 branches, and it passes the testsuite tests (with the exception of some I've flagged as expected failures for the pdp10).

My employer is happy to release my work on the gcc compiler back to the gcc community and I've sent in a request for the necessary forms.

The PDP10 architecture is unusual in various ways that distinguish it from the mainstream architectures supported by the gcc compiler, and this has made the development of this compiler a significant task. Undoubtedly I've made customizations in inappropriate ways. I'm seeking contacts with people who might be able to advise me on how to clean up my implementation to reduce the amount of #ifdef __PDP10_H__ I've sprinkled liberally throughout the source. Also, if it's possible to get simple changes made that prevent breaking my PDP10 version and that are otherwise innocuous, that would be wonderful. For example, the PDP10 word size is 36 bits; fairly recently people have taken to writing code that assumes word size is a power of 2 even when it's straightforward to write in a manner that doesn't make that assumption.

Considering the large number of files customized to get the PDP10 compiler working, I'm not sure whether it's possible to get it to build directly from the gcc trunk, but it would be nice to work toward that goal.

Some other things which distinguish the PDP10 architecture from assumptions in the gcc code base include: its variety of formats of pointers, only one of which can be viewed as an integer and that one is capable of referencing only word aligned data; a functional difference between signed and unsigned integers; and peculiarities to the use of PDP10 byte arrays which are very difficult to describe.

Any help or advice would be appreciated.

Martin Chaney
XKL, LLC
Re: GCC Port (gcc backend) for Microchip PICMicro microcontroller
On Apr 11, 2006, at 03:46, Colm O' Flaherty wrote:

I'm not quite sure I follow you.. if it's possible to dedicate a register to act as the data-stack pointer, and implement it that way, why would I want to keep the SP as a virtual register? I'm not being antagonistic when I say that.. I'm just trying to understand what you're trying to tell me..

Sorry, thought you were indicating that you didn't WANT a data stack :-) Now I understand that your chip just doesn't provide hardware support for stacks.

BTW, another port I did (to a RISCy architecture that was the core for a high-speed multiprotocol router; never submitted to the FSF, and the company's now belly-up) provides a lower bound on how simple an addressing scheme you can deal with. This machine had 512 directly addressable memory locations and 4 registers that could be indirected through (kinda like the way the PDP-8 worked). With 3 registers available for reload (1 was reserved for the SP), you could pretty much compile all the GCC test suite that didn't need more than 512 words of memory. [Oh, BTW, chars were 32 bits on this machine also]

Will check out the ip2k port again.. the last time I looked, I was blinded by the assumption that if the usual stack macros were defined in a straightforward fashion, that the target actually supported (or implemented) a stack...

It ain't necessarily so.

you might be able to keep the SP as a virtual register and make sure that code generation never tries to actually use it
Re: GCC Port (gcc backend) for Microchip PICMicro microcontroller
Again, the GCC3 distribution has a port to the IP2K microcontroller. It has a hardware call stack, but the data stack is implemented entirely in software. You will have to dedicate a register to act as the data-stack pointer.

I suppose if you limit yourself to writing functions with NO stack-local data you might be able to keep the SP as a virtual register and make sure that code generation never tries to actually use it. You will also be severely limited in the ability to pass parameters if you only allow register parameters with no parameter saving. At this point, why bother writing a C compiler?

On Apr 10, 2006, at 03:54, Colm O' Flaherty wrote:

> Does anyone have any ideas about what gcc support is like for targets with no data stack? The 14 bit cores (16F) mostly have a 2-8 level hardware stack, which is not part of the program or data memory, and is not addressable. There is no data stack. I'm hoping that there is an existing backend architecture where there is no stack, so that I can have a peep to see how the code fakes stack support, but so far, all the obvious candidates (the microcontrollers) seem to have a stack. Ideas, anyone? Colm
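
To make the "data stack implemented entirely in software" idea concrete, here is a minimal sketch in plain C. It is not taken from the ip2k port: the names soft_stack, soft_sp, soft_frame_enter and soft_frame_leave and the 64-byte size are all hypothetical, and the pointer variable merely stands in for whatever register a backend would dedicate to the job.

```c
#include <assert.h>
#include <stdint.h>

enum { SOFT_STACK_BYTES = 64 };               /* hypothetical size */

/* The data stack is just a region of ordinary data memory. */
static uint8_t soft_stack[SOFT_STACK_BYTES];

/* Stands in for the register dedicated to the data stack pointer.
   The stack grows downward from the end of the region. */
static uint8_t *soft_sp = soft_stack + SOFT_STACK_BYTES;

/* Prologue-style step: reserve `size` bytes for locals and return
   the base address of the new frame. */
uint8_t *soft_frame_enter(unsigned size)
{
    assert(size <= (unsigned)(soft_sp - soft_stack));
    soft_sp -= size;
    return soft_sp;
}

/* Epilogue-style step: release the `size` bytes reserved on entry. */
void soft_frame_leave(unsigned size)
{
    assert(size <= (unsigned)(soft_stack + SOFT_STACK_BYTES - soft_sp));
    soft_sp += size;
}

/* Number of bytes currently reserved, useful for sanity checks. */
unsigned soft_stack_used(void)
{
    return (unsigned)(soft_stack + SOFT_STACK_BYTES - soft_sp);
}
```

The hardware call stack still holds return addresses; only locals and spilled parameters would live in a region like this, which is why the pointer has to be kept in a register (or a fixed memory cell) that code generation never reuses for anything else.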
Re: GCC port for V8-uRISC (8 bit CPU)
I participated in a port to an 8-bit internet toaster 4 years ago (the Ubicom IP2k chip). It's distributed as part of the gcc-3.x releases, but has been dropped from the gcc-4.x distributions. The IP2k was a very restrictive environment, and it took a lot of work to get it to generate really tight code. I'd definitely suggest looking at gcc/config/ip2k to see how we did it.

-----Original Message-----
From: Nemanja Popov [EMAIL PROTECTED]
Sent: Apr 5, 2006 9:50 AM
To: gcc@gcc.gnu.org
Subject: GCC port for V8-uRISC (8 bit CPU)

Hi, Can somebody please explain to me whether it is reasonable and possible to port gcc (version 4.xx) to an 8-bit CPU architecture? I would appreciate a precise explanation of why it is possible or not. The CPU is V8-uRISC. V8-uRISC features are:

    8-bit ALU
    64K byte addressing capability
    Accumulator (R0)
    Seven 8-bit General Purpose Registers (R1-R7)
    Multiple register banks are easily implemented
    16-bit Program Counter and Stack Pointer

Thanks in advance for all information. Regards, Nemanja Popov
Crazy ICE from gcc 4.1.0
I've built a generic 4.1.0 for RH7.3 x86 linux (I did a make bootstrap). Compiling a rather large file, I get

    tmp.f_00.cxx:26432: error: unrecognizable insn:
    (insn 173 172 174 9 (set (reg:QI 122)
            (const_int 128 [0x80])) -1 (nil)
        (nil))
    tmp.f_00.cxx:26432: internal compiler error: in extract_insn, at recog.c:2020

which looks insane, because there's a perfectly good define_insn (cf. *movqi_1 in i386.md). I'm trying to reduce this to a reasonably sized test case (and I'm going to try debugging this in the recognizer), but I can't see why this instruction isn't matching the 2nd constraint alternative and just producing a movb r,#128

    (define_insn "*movqi_1"
      [(set (match_operand:QI 0 "nonimmediate_operand" "=q,q ,q ,r,r ,?r,m")
            (match_operand:QI 1 "general_operand" "q,qn,qm,q,rn,qm,qn"))]
      "GET_CODE (operands[0]) != MEM || GET_CODE (operands[1]) != MEM"
    {
      switch (get_attr_type (insn))
        {
        case TYPE_IMOVX:
          gcc_assert (ANY_QI_REG_P (operands[1]) || GET_CODE (operands[1]) == MEM);
          return "movz{bl|x}\t{%1, %k0|%k0, %1}";
        default:
          if (get_attr_mode (insn) == MODE_SI)
            return "mov{l}\t{%k1, %k0|%k0, %k1}";
          else
            return "mov{b}\t{%1, %0|%0, %1}";
        }
    })
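
One thing that may be worth checking, offered as an assumption rather than a diagnosis of this particular ICE: GCC expects a CONST_INT to be stored sign-extended for the mode it is used in, so the 8-bit pattern 0x80 in QImode would normally appear as (const_int -128) rather than (const_int 128). The sketch below is plain C, not GCC source, and the helper names are hypothetical. It only illustrates the arithmetic of that canonical form.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low `width` bits of `value` (1 <= width <= 63).
   This mirrors the idea of truncating a constant to a mode; it is
   not the GCC implementation. */
int64_t sign_extend_low_bits(int64_t value, unsigned width)
{
    assert(width >= 1 && width <= 63);
    uint64_t sign = UINT64_C(1) << (width - 1);   /* e.g. 0x80 for 8 bits */
    uint64_t mask = (sign << 1) - 1;              /* e.g. 0xff for 8 bits */
    uint64_t low = (uint64_t)value & mask;
    return (int64_t)(low ^ sign) - (int64_t)sign;
}

/* Nonzero when `value` is already in sign-extended form for `width` bits. */
int is_canonical_for_width(int64_t value, unsigned width)
{
    return sign_extend_low_bits(value, width) == value;
}
```

With width 8, the value 128 is not in canonical form while -128 is, so a QImode constant that reaches the recognizer as 128 would point at whatever code created it without truncating it to the mode.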
Re: Bug in PPC inline assembly?
On Jul 17, 2005, at 19:15, Stefan wrote:

> I have some problems with using inline PowerPC assembly in GCC (4.0.1). Consider the following code:
>
>     void save_fp_register(double* buffer) { asm("stfd F0, 0(%0)" : : "r" (buffer) ); }

Try using 'b' for the constraint. That selects an address base register, as opposed to 'r', which is any of the general registers (including R0).
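
The reason 'b' matters is that in the base position of a PowerPC load/store address, register 0 is read as the literal value 0 rather than the contents of R0, so an 'r' operand that happens to land in R0 produces a wild store. A minimal sketch of the suggestion follows. It is a variant of the quoted function, not a drop-in replacement: the name store_double is hypothetical, and it stores a value operand through the "f" constraint instead of a hard-coded F0 so the result can be checked. On anything other than PowerPC it falls back to a plain C store.

```c
#include <assert.h>
#include <stddef.h>

/* Store `value` at `*buffer`.  On PowerPC the store is done with
   stfd, and the address is passed with the "b" constraint so the
   compiler never picks R0 as the base register. */
void store_double(double *buffer, double value)
{
    assert(buffer != NULL);
#if defined(__powerpc__) || defined(__PPC__)
    __asm__ volatile ("stfd %1, 0(%2)"
                      : "=m" (*buffer)              /* tells GCC memory is written */
                      : "f" (value), "b" (buffer)); /* FP register, base register */
#else
    *buffer = value;                                /* portable fallback */
#endif
}
```

The "=m" output is not used in the template; it is there only so the compiler knows the asm writes to *buffer and does not reorder or drop later loads of it.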