GCC-4.3.0 fails to compile SPECint-2006 with control speculation on itanium processor
Hi,

I am working with gcc-4.3.0 on Red Hat ES 4. When I use the compiler to build the SPECint-2006 benchmarks, none of them gets through make with the compiler option -msched-control-spec (enable control speculation on IA-64). Here is part of the error log:

# Error 400.perlbench: Error with make!
# Error 401.bzip2: Error with make!
# Error 403.gcc: Error with make!
# Error 429.mcf: Error with make!
# Error 445.gobmk: Error with make!
# Error 456.hmmer: Error with make!
# Error 458.sjeng: Error with make!
# Error 462.libquantum: Error with make!
# Error 464.h264ref: Error with make!
# Error 471.omnetpp: Error with make!
# Error 473.astar: Error with make!
# Error 483.xalancbmk: Error with make!

Any help? Thanks
Fwd: GCC-4.3.0 fails to compile SPECint-2006 with control speculation on itanium processor
-- Forwarded message --
From: 吴曦 [EMAIL PROTECTED]
Date: 2008/4/11
Subject: Re: GCC-4.3.0 fails to compile SPECint-2006 with control speculation on itanium processor
To: Eljay Love-Jensen [EMAIL PROTECTED]

2008/4/11 Eljay Love-Jensen [EMAIL PROTECTED]:
> Hi 吴曦,
>
> What version of GNU Make are you using?
>
> make --version
>
> Is it at least GNU Make 3.80?
>
> --Eljay

[EMAIL PROTECTED] benchspec]$ make --version
GNU Make 3.80
Copyright (C) 2002 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Fwd: GCC-4.3.0 fails to compile SPECint-2006 with control speculation on itanium processor
-- Forwarded message --
From: 吴曦 [EMAIL PROTECTED]
Date: 2008/4/11
Subject: Re: GCC-4.3.0 fails to compile SPECint-2006 with control speculation on itanium processor
To: Eljay Love-Jensen [EMAIL PROTECTED]

I turned on the verbose mode of SPEC, and the compiler really does fail to compile the code, with something like an internal compiler error. It seems that this support is rather immature.
Re: Scheduling problem - A more detailed explain
2007/10/11, Jim Wilson [EMAIL PROTECTED]:

Thanks for your helpful hints! I am sorry for such a late reply. I figured out this problem yesterday :-).

> Do we know for sure that the scheduler is failing here? Have you looked at -da RTL dumps to verify which pass is performing the incorrect optimization?

I used the method you mention above to track down the problem. The scheduling code that GCC uses is correct, but there were some errors in the order of my instrumentation insn list. So...

> Currently, gcc only emits these pr reg group save/restores in the prologue and epilogue, and we have scheduling barriers after/before the prologue/epilogue, so it is possible that there is a latent problem here which has gone unnoticed simply because it is impossible to reproduce with unmodified FSF gcc sources.

Previously, I also suspected a latent problem. However, I read the code you mentioned and found that it handles the pr reg group save/restore correctly, and I finally found that the failure was due to my own carelessness in the instrumentation code. Sorry for that :-).
Re: Scheduling problem - A more detailed explain
> rws_access_reg should be handling this correctly. It uses HARD_REGNO_NREGS to get the number of regs referred to by a reg rtl. So it should return 64 in this case, and then it will iterate over all 64 PR regs when checking for a dependency.

I have found HARD_REGNO_NREGS in ia64.h:

    #define HARD_REGNO_NREGS(REGNO, MODE) \
      ((REGNO) == PR_REG (0) && (MODE) == DImode ? 64 \

As you stated above, it returns 64 for DImode pr0.

> We already have support for these move instructions. See the movdi_internal pattern. Since there are 64 1-bit PR registers, we use a DImode reference to pr0 to represent the entire set of PR registers. Is this the RTL that you are using? Or do you have your own representation? If different, what RTL are you using?

I generate this mov instruction like this:

    gen_movdi (gen_rtx_REG (DImode, PR_REG (0)), X);

where X is a general register. I also dumped the RTL list and found the generated insns, which I think are correct:

    (insn 522 551 523 2 (set (reg:DI 8 r8) (reg:DI 256 p0)) -1 (nil) (nil))

and

    (insn 537 377 535 2 (set (reg:DI 256 p0) (reg:DI 8 r8)) -1 (nil) (nil))

but the scheduling really produces wrong code :-(

Besides, I am using GCC-4.1.1, and I found that rws_access_reg handles the dependencies for these mov instructions:

    static int
    rws_access_reg (rtx reg, struct reg_flags flags, int pred)
    {
      int regno = REGNO (reg);
      int n = HARD_REGNO_NREGS (REGNO (reg), GET_MODE (reg));

      if (n == 1)
        return rws_access_regno (regno, flags, pred);
      else
        {
          int need_barrier = 0;
          while (--n >= 0)
            need_barrier |= rws_access_regno (regno + n, flags, pred);
          return need_barrier;
        }
    }

Well... is there anything I missed or forgot to do? Thanks
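To make the per-register iteration concrete, here is a small standalone model of the loop in rws_access_reg. It is only a sketch under stated assumptions: pr_group_state, access_one_pr and access_pr_range are hypothetical names, not GCC functions, and the rule modeled (an access needs a stop bit if the register was already written in the current instruction group) is a simplification of what ia64.c actually tracks.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_PR 64   /* IA-64 has 64 one-bit predicate registers */

/* Hypothetical stand-in for the write-tracking state kept in ia64.c:
   bit i is set when predicate register i has been written in the
   current instruction group.  */
typedef struct { uint64_t written; } pr_group_state;

/* Model of a single-register check: report a barrier if the register
   was already written in this group, then record a write if asked.  */
bool access_one_pr(pr_group_state *st, int regno, bool is_write)
{
    uint64_t bit = UINT64_C(1) << regno;
    bool need_barrier = (st->written & bit) != 0;
    if (is_write)
        st->written |= bit;
    return need_barrier;
}

/* Model of the multi-register case: a reference that covers nregs
   registers starting at regno is checked one register at a time and
   the individual results are OR-ed together.  */
bool access_pr_range(pr_group_state *st, int regno, int nregs, bool is_write)
{
    bool need_barrier = false;
    for (int n = nregs - 1; n >= 0; n--)
        need_barrier |= access_one_pr(st, regno + n, is_write);
    return need_barrier;
}
```

In this model, a write to p15 followed in the same group by a 64-register access starting at p0 reports a barrier, and a 64-register write followed by a read of any single predicate does too, which is the behaviour Jim describes for the DImode reference to pr0.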
Question on GGC
Hi,

I have several global variables of type rtx. They are used in flow.c, ia64.c and final.c. As stated in the internals documentation on types, I added a GTY(()) marker after the keyword 'extern', for example:

    extern GTY(()) rtx a;

These 'extern' declarations are added in regs.h, which is included in flow.c, ia64.c and final.c.

However, I initialize 'a' in ia64_compute_frame, which is defined in ia64.c, and found that 'a' is incorrectly collected by ggc_collect. (I watched the memory location allocated for 'a' and found that it is collected by GGC.)

Is there anything I forgot to do? Any help is truly appreciated.

Thanks :-)
Re: Question on GGC
2007/9/27, Zdenek Dvorak [EMAIL PROTECTED]:
> Hello,
>
>> I have several global variables which are of type rtx. They are used in flow.c, ia64.c and final.c. As stated in the internals documentation on types, I added a GTY(()) marker after the keyword 'extern', for example:
>>
>>     extern GTY(()) rtx a;
>>
>> These 'extern' declarations are added in regs.h, which is included in flow.c, ia64.c and final.c.
>>
>> However, I initialize 'a' in ia64_compute_frame, which is defined in ia64.c, and found that 'a' is incorrectly collected by ggc_collect. (I watched the memory location allocated for 'a' and found that it is collected by GGC.) Is there anything I forgot to do?
>
> you need to add regs.h to GTFILES in Makefile.in.
>
> Zdenek

Thanks. I found that GCC generates the file gtype-desc.c to handle these definitions. But how does GCC find the corresponding definition for 'extern GTY(()) rtx a'? When I define the variable in a new file, or declare it extern in a new header file, the build fails with the error 'the initializer is not constant' (I think the error arises because GCC cannot correlate the 'extern' declaration with the definition). Could you give more details on this problem, especially for the case where I want to define and declare this variable in new files?
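For readers following the thread, the declaration/definition split being discussed looks roughly like the sketch below. Several things here are assumptions made so that the fragment compiles on its own outside the GCC tree: the empty GTY macro (inside GCC the marker is consumed by the gengtype tool rather than by the C compiler), the opaque rtx typedef, and the variable name instr_scratch_reg, which is hypothetical. It does not explain the 'initializer is not constant' error reported above.

```c
#include <stddef.h>

/* Assumption for this sketch: GTY expands to nothing for the C compiler.
   Inside GCC the marker is read by gengtype when it scans the files
   listed in GTFILES.  */
#define GTY(x)

/* Stand-in for GCC's rtx type, kept opaque here.  */
typedef struct rtx_def *rtx;

/* Header side: the marked declaration.  For the variable to be treated
   as a garbage-collection root, the header that contains this line has
   to be listed in GTFILES, as Zdenek points out.  */
extern GTY(()) rtx instr_scratch_reg;   /* hypothetical variable name */

/* Source side: exactly one ordinary definition in one .c file.  The
   linker, not gengtype, ties the declaration to this definition.  */
rtx instr_scratch_reg;
```

The point of the sketch is only that the marked 'extern' and the definition are ordinary C as far as the compiler is concerned; the extra step is telling gengtype which file to scan.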
Re: Question on GGC
Sorry, I found it in gccint, thanks :-)
Re: support single predicate set instructions in GCC-4.1.1
2007/9/26, Jim Wilson [EMAIL PROTECTED]:
> On Tue, 2007-09-25 at 15:13 +0800, 吴曦 wrote:
>> propagate_one_insn), I don't understand why GCC fails the computation of liveness if there is no optimization flag :-(.
>
> There is probably something else happening with -O that is recomputing some liveness or CFG info. For instance, the flow2 pass will call split_all_insns and cleanup_cfg, but only with -O. You could try selectively disabling other optimization passes to determine which one is necessary in order for your code to work.
>
> Actually, looking closer, I see several of them call update_life_info. regrename for instance has two update_life_info calls.
>
> Another possibility here is to try calling recompute_reg_usage instead of doing it yourself. Or maybe calling just update_life_info directly, if you need different flags set.
>
> FYI This stuff is all different on mainline since the dataflow merge. I'm assuming you are using gcc-4.2.x.
> --
> Jim Wilson, GNU Tools Support, http://www.specifix.com

Thanks, it turned out to be pass_stack_adjustments.
Re: support single predicate set instructions in GCC-4.1.1
2007/9/25, Jim Wilson [EMAIL PROTECTED]:
> 吴曦 wrote:
>> (define_insn "*shift_predicate_cmp"
>>   [(set (const_int 0)
>>         (and:BI (and:BI (match_operand:BI 1 "register_operand" "c")
>>                         (and:BI (match_operand:DI 2 "gr_reg_or_8bit_adjusted_operand" "rL")
>>                                 (match_operand:DI 3 "gr_register_operand" "r")))
>>                 (match_operand:BI 0 "register_operand" "c")))]
>>   ""
>>   "(%0) cmp.ne %1, p0 = %2, %3"
>>   [(set_attr "itanium_class" "icmp")])
>> it warns WAW and there should be stop ;; between these two instructions.
>
> It is the assembler that is giving the warning. The assembler knows that the %1 operand is modified by the instruction, but the compiler does not, because the %1 operand is not a SET_DEST operand. Your SET_DEST is (const_int 0) which is useless info and incorrect. You need to make sure that the RTL is an accurate description of what the instruction does.
>
> Besides the problem with the missing SET_DEST, there is also the problem that you are using AND operands for a compare, which won't work. AND and NE are not interchangeable operations. Consider what happens if you compare 0x1 with 0x1. cmp.ne returns false. However, AND returns 0x1, which when truncated from DImode to BImode is still 0x1, i.e. true. So the RTL does not perform the same operation as the instruction you emitted. This could confuse the optimizer.
>
> GCC internals assume that predicate registers are always allocated in pairs, and that the second one is always the inverse of the first one. Defining a special pattern that only modifies one predicate register probably isn't gaining you much. If you are doing this before register allocation, then you are still using 2 predicate registers, as the register allocator will always give you 2 even if you only use one. Worst case, if this pattern is exposed to the optimizer, then the optimizer may make changes that break your assumptions. It might simplify a following instruction by using the second predicate reg for instance, which then fails at run-time because you didn't actually set the second predicate reg.
>
> If you are only using this in sequences that the optimizer can't rewrite, then you should be OK.
> --
> Jim Wilson, GNU Tools Support, http://www.specifix.com

Thanks so much for your helpful hints. I think I need to explain in more detail why I need this kind of instruction. But before that, there is another problem with the liveness calculation (it occurs when I use my new GCC to compile some source with no optimization flag).

Roughly speaking, my work is to instrument sensitive instructions in order to do information flow tracking. I do this after register allocation and just before the second scheduling phase (I need to intercept all memory accesses, so I chose to do the work after register allocation). Instead of reserving a certain number of registers in the backend for the instrumentation, I chose to allocate registers for it. As register allocation is already done, I need to compute liveness information manually for each sensitive insn, to get the set of registers that I can use without any save or restore. To do this, I borrowed code from the function propagate_block, which is defined in flow.c. More specifically, the code is:

    struct propagate_block_info *pbi;
    int changed, flags;
    rtx insn, prev;
    bitmap insn_live_in, insn_live_out;
    bitmap bb_live_in, bb_live_out;
    basic_block cur_bb;

    flags = PROP_DEATH_NOTES;
    flags &= ~(PROP_SCAN_DEAD_CODE | PROP_SCAN_DEAD_STORES | PROP_KILL_DEAD_CODE);

    insn_live_in = BITMAP_ALLOC (NULL);
    insn_live_out = BITMAP_ALLOC (NULL);

    FOR_EACH_BB (cur_bb)
      {
        bb_live_in = cur_bb->il.rtl->global_live_at_start;
        bb_live_out = cur_bb->il.rtl->global_live_at_end;
        bitmap_copy (insn_live_out, bb_live_out);
        bitmap_copy (insn_live_in, insn_live_out);
        pbi = init_propagate_block_info (cur_bb, insn_live_in, NULL, NULL, flags);

        if (flags & PROP_REG_INFO)
          {
            unsigned i;
            reg_set_iterator rsi;

            /* Process the regs live at the end of the block.
               Mark them as not local to any one basic block.  */
            EXECUTE_IF_SET_IN_REG_SET (insn_live_in, 0, i, rsi)
              REG_BASIC_BLOCK (i) = REG_BLOCK_GLOBAL;
          }

        changed = 0;
        for (insn = BB_END (cur_bb); ; insn = prev)
          {
            bitmap_clear (shift_usable);

            /* If this is a call to `setjmp' et al, warn if any
               non-volatile datum is live.  */
            if ((flags & PROP_REG_INFO)
                && CALL_P (insn)
                && find_reg_note (insn,
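Jim's remark in this message that AND and NE are not interchangeable can be checked with a few lines of plain C. This is a sketch, not GCC code: cmp_ne and and_low_bit are hypothetical helpers, and narrowing a DImode value to BImode is modeled as keeping the low bit, as in his 0x1 example.

```c
#include <stdbool.h>
#include <stdint.h>

/* What cmp.ne computes for its first target predicate: a != b.  */
bool cmp_ne(int64_t a, int64_t b)
{
    return a != b;
}

/* What an AND of the two operands yields once the 64-bit result is
   narrowed to a single bit (modeled here as the low bit).  */
bool and_low_bit(int64_t a, int64_t b)
{
    return ((a & b) & 1) != 0;
}
```

For the operands 0x1 and 0x1, cmp_ne gives false while and_low_bit gives true; for 0x2 and 0x0 it is the other way round. An RTL pattern written with AND therefore describes a different operation from the cmp.ne instruction that the template emits.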
support tnat instruction on IA-64. error occurs in bundling. help
Hi,

I am working on IA-64 and GCC-4.1.1. I modified ia64.md to support the tnat instruction. More specifically, I added the following define_insn:

    (define_insn "shift_tnat"
      [(set (match_operand:BI 0 "register_operand" "=c")
            (unspec:BI [(match_operand:DI 1 "gr_register_operand" "r")]
                       UNSPEC_TNAT))]
      ""
      "tnat.nz %0, %I0 = %1"
      [(set_attr "itanium_class" "tnat")])

I added one line to the define_attr "type" "unknown,A,I,M,F,B,L,X,S", thus:

    ;; chk_s has an I and an M form; use type A for convenience.
    (define_attr "type" "unknown,A,I,M,F,B,L,X,S"
      (cond [(eq_attr "itanium_class" "ld,st,fld,fldp,stf,sem,nop_m") (const_string "M")
             (eq_attr "itanium_class" "rse_m,syst_m,syst_m0") (const_string "M")
             (eq_attr "itanium_class" "frar_m,toar_m,frfr,tofr") (const_string "M")
             (eq_attr "itanium_class" "lfetch") (const_string "M")
             (eq_attr "itanium_class" "chk_s,ialu,icmp,ilog,mmalua") (const_string "A")
             (eq_attr "itanium_class" "fmisc,fmac,fcmp,xmpy") (const_string "F")
             (eq_attr "itanium_class" "fcvtfx,nop_f") (const_string "F")
             (eq_attr "itanium_class" "tnat") (const_string "I")
             ;; <-- added line: the tnat instruction is issued on an integer unit
             (eq_attr "itanium_class" "frar_i,toar_i,frbr,tobr") (const_string "I")
             (eq_attr "itanium_class" "frpr,topr,ishf,xtd,tbit") (const_string "I")
             (eq_attr "itanium_class" "mmmul,mmshf,mmshfi,nop_i") (const_string "I")
             (eq_attr "itanium_class" "br,scall,nop_b") (const_string "B")
             (eq_attr "itanium_class" "stop_bit") (const_string "S")
             (eq_attr "itanium_class" "nop_x") (const_string "X")
             (eq_attr "itanium_class" "long_i") (const_string "L")]
            (const_string "unknown")))

and one value to the attribute itanium_class, thus:

    (define_attr "itanium_class" "unknown,ignore,stop_bit,br,fcmp,fcvtfx,fld,
            fldp,fmac,fmisc,frar_i,frar_m,frbr,frfr,frpr,ialu,icmp,ilog,ishf,
            ld,chk_s,tnat,long_i,mmalua,mmmul,mmshf,mmshfi,rse_m,scall,sem,stf,
            st,syst_m0, syst_m,tbit,toar_i,toar_m,tobr,tofr,topr,xmpy,xtd,nop,
            nop_b,nop_f,nop_i,nop_m,nop_x,lfetch,pre_cycle"
      (const_string "unknown"))

I also modified ia64.c to support UNSPEC_TNAT, in the function rtx_needs_barrier:

    case UNSPEC_FR_SPILL:
    case UNSPEC_FR_RESTORE:
    case UNSPEC_GETF_EXP:
    case UNSPEC_SETF_EXP:
    case UNSPEC_ADDP4:
    case UNSPEC_FR_SQRT_RECIP_APPROX:
    case UNSPEC_TNAT: /* support tnat instruction */
      if (XINT (x, 1) == UNSPEC_TNAT)
        {
          print_rtl_single (stderr, x);
          fflush (stderr);
        }
      need_barrier = rtx_needs_barrier (XVECEXP (x, 0, 0), flags, pred);
      break;

However, when I use the new GCC to compile the following function

    long ga[20] = {0, };
    int gb[20] = {0, };
    char gc[20] = {0, };
    short gd[20] = {0, };

    void test_leaf_function()
    {
      fprintf(stderr, "in function test_leaf_function\n");
      if (ga[0] != 0)
        {
          ga[0] = 20;
          ga[0] = gd[1];
        }
      ga[0] = 100;
      ga[0] = 0;
      if (gb[0] != 5)
        ga[0] = gb[0];
      else
        ga[0] = gb[1];
      ga[3] = ga[2]+gc[1];
      gc[0] = 0;
      gd[0] = 0;
    }

it reports the error:

    error insn:
    (insn 163 185 312 0 giftlib_test.c:90 (set (reg:BI 263 p7)
            (unspec:BI [
                    (reg/f:DI 14 r14 [376])
                ] 32)) 300 {shift_tnat} (insn_list:REG_DEP_ANTI 187 (insn_list:REG_DEP_ANTI 178 (insn_list:REG_DEP_ANTI 179 (insn_list:REG_DEP_ANTI 181 (insn_list:REG_DEP_ANTI 186 (insn_list:REG_DEP_OUTPUT 176 (insn_list:REG_DEP_TRUE 77 (nil (nil))
    giftlib_test.c: In function foo
    giftlib_test.c:99: internal compiler error: in bundling, at config/ia64/ia64.c:7457
    Please submit a full bug report, with preprocessed source if appropriate.
    See <URL:http://gcc.gnu.org/bugs.html> for instructions.

I followed the error to ia64.c:7457, where an assertion fails:

    /* Move the position backward in the window.  Group barrier has
       no slot.  Asm insn takes all bundle.  */
    if (INSN_CODE (insn) != CODE_FOR_insn_group_barrier
        && GET_CODE (PATTERN (insn)) != ASM_INPUT
        && asm_noperands (PATTERN (insn)) < 0)
      pos--;
    /* Long insn takes 2 slots.  */
    if
Re: support tnat instruction on IA-64. error occurs in bundling. help
2007/9/26, Jim Wilson [EMAIL PROTECTED]:
> 吴曦 wrote:
>> [(set_attr "itanium_class" "tnat")])
>
> The itanium_class names are based on info from the Itanium Processor Microprocessor Reference by the way.
>
> I believe the problem is that you didn't add info to the DFA scheduler descriptions in the itanium1.md and itanium2.md files for this new instruction class. Normally the DFA scheduler info is optional. However, for itanium, we also use the scheduler for bundling, and hence proper DFA scheduler info for each instruction class is required.
>
> It appears that the tnat instruction schedules and bundles the same as the tbit instruction, so just use the existing tbit class instead of trying to add a new one. The docs are a bit unclear here though, since some places mention tbit and tnat, and other places just mention tbit. For your purposes, this isn't important.
>
> Modifying the DFA scheduler descriptions is complicated. It is best to avoid that if you can. Specifying that tnat is an I type instruction isn't enough for bundling purposes, since a lot of instructions have further restrictions. In this case, for instance, tnat can only go into an I0 slot, not an I1 slot. This detail is handled in the DFA scheduler descriptions.
> --
> Jim Wilson, GNU Tools Support, http://www.specifix.com

Many thanks. I discovered this problem after I sent the first mail, and I found that itanium1.md and itanium2.md describe the pipeline hazards, but they are really complex... :-(. Is there any guide or documentation on this? Thanks.

Anyway, I have changed the tnat class to tbit, and it seems to work now. Thanks again
support single predicate set instructions in GCC-4.1.1
Hi,

I am working on the Itanium architecture and GCC-4.1.1. I modified the machine description file ia64.md to support a single predicate set instruction such as:

    (%0) cmp.ne %1, p0 = %2, %3

Here %0 and %1 are predicates, %2 is a register or immediate, and %3 is a register operand. More specifically, I added the following define_insn:

    (define_insn "*shift_predicate_cmp"
      [(set (const_int 0)
            (and:BI (and:BI (match_operand:BI 1 "register_operand" "c")
                            (and:BI (match_operand:DI 2 "gr_reg_or_8bit_adjusted_operand" "rL")
                                    (match_operand:DI 3 "gr_register_operand" "r")))
                    (match_operand:BI 0 "register_operand" "c")))]
      ""
      "(%0) cmp.ne %1, p0 = %2, %3"
      [(set_attr "itanium_class" "icmp")])

I made this define_insn anonymous because I only need this type of instruction occasionally, and I can generate the pattern manually. The generation function is:

    rtx
    gen_shift_predicate_cmp (rtx op0, rtx op1, rtx op2, rtx op3)
    {
      return gen_rtx_SET (BImode, CONST0_RTX (BImode),
                          gen_rtx_AND (BImode,
                                       gen_rtx_AND (BImode, op1,
                                                    gen_rtx_AND (BImode, op2, op3)),
                                       op0));
    }

After adding this support, I recompiled gcc and inserted some instructions of this kind into the insn list. The generation and matching work fine. BUT the generation of the insn group barrier (';;' on the Itanium architecture) does not work properly, and gcc generates code like:

    .loc 1 118 0
    (p0) cmp.ne p15, p0 = 0, r30
    .loc 1 121 0
    (p0) cmp.ne p15, p0 = 0, r30

The assembler warns about a WAW dependency, and there should be a stop ';;' between these two instructions.

Intuitively, I think I should modify some part of GCC so that it generates the correct ';;' for the new type of define_insn, but I don't know exactly where to make this modification :-(.

Any help is truly appreciated! Thanks very much
About allocating registers for instrumentation
Hi,

I am working on gcc-4.1.1 and the Itanium architecture. So far I have finished instrumenting ld and st instructions before the second scheduling pass by reserving two global registers in the backend. However, in order to improve performance (e.g. to allow better scheduling), I would like to allocate two registers for each instrumentation instead of using the reserved ones. To identify which registers I can use for each ld and st instruction, I follow this idea:

For each insn, I compute its live-in and live-out sets starting from the basic block. As we can get the live-in set of the basic block, then, for INSN(N) in the basic block:

(1) live-in[INSN(N)] = live-out[INSN(N-1)]
(2) live-out[INSN(N)] = (live-in[INSN(N)] U set) - (REG_DEAD U REG_UNUSED)

where set is the set of registers set by the insn, and REG_DEAD and REG_UNUSED can be obtained from the insn notes.

Then R - (live-in[INSN(N)] U live-out[INSN(N)]) is the set of registers I can use to instrument INSN(N). (Here R is a set of registers that I specify, for example all the caller-saved global general registers.)

Am I right? If there is anything I have misunderstood, please point it out, thanks!

Further, how do I identify the set in (2)? I have found that many of the insns just before the second scheduling pass contain only one set. If this holds for all insns, I think I can use single_set to get it. Are there any exceptions?

Thanks again
Wu
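The two equations in this message can be written down directly with one bit per register. The sketch below is a plain transcription of the rule as stated in the mail, under stated assumptions: regset, insn_info and free_regs_per_insn are hypothetical names, registers are limited to 64 so that a set fits in one word, and the per-insn facts that GCC would take from the insn pattern and its REG_DEAD / REG_UNUSED notes are supplied as plain data. It does not settle whether the rule is sufficient inside GCC.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t regset;   /* bit i set means register i is in the set */

/* Per-insn facts for the sketch (hypothetical struct).  */
typedef struct {
    regset set;      /* registers written by the insn */
    regset dead;     /* registers carrying a REG_DEAD note */
    regset unused;   /* registers carrying a REG_UNUSED note */
} insn_info;

/* Walk the block forward.  For insn i:
     live_in  = live_out of the previous insn (block live-in for i == 0)
     live_out = (live_in | set) minus (dead | unused)
     free     = candidates minus (live_in | live_out)
   free_out must have room for n entries.  */
void free_regs_per_insn(regset bb_live_in, const insn_info *insns, size_t n,
                        regset candidates, regset *free_out)
{
    regset live_in = bb_live_in;
    for (size_t i = 0; i < n; i++) {
        regset live_out = (live_in | insns[i].set)
                          & ~(insns[i].dead | insns[i].unused);
        free_out[i] = candidates & ~(live_in | live_out);
        live_in = live_out;
    }
}
```

As a small worked case, take r1 live on entry, a first insn that sets r2 and kills r1, and a second insn that sets r3 and kills r2, with candidates r1 to r4. The first insn may then use r3 and r4, and the second may use r1 and r4.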
Re: How to make use of instruction scheduling to improve performance?
2007/7/29, 吴曦 [EMAIL PROTECTED]:
> 28 Jul 2007 12:16:51 -0700, Ian Lance Taylor [EMAIL PROTECTED]:
>> 吴曦 [EMAIL PROTECTED] writes:
>>> 28 Jul 2007 09:04:01 -0700, Ian Lance Taylor [EMAIL PROTECTED]:
>>>> 吴曦 [EMAIL PROTECTED] writes:
>>>>> there are some questions after I read the source code today.
>>>>> 1st. if I add the instrumentation before 2nd scheduling; will gcc emit an insn which will be output as a ld instruction later? If this could happen, some ld instruction may not be instrumented...
>>>>
>>>> No, gcc won't introduce any new memory load or store instructions after the prologue and epilogue instructions are threaded. It may
>>>
>>> when are prologue and epilogue instructions threaded? (after register allocation? besides, what is the exact meaning of prologue and epilogue instructions are threaded? Would you mind explaining in more detail? thx :-))
>>
>> If you look in gcc/passes.c you will see the list of passes. The prologue and epilogue instructions are threaded in pass_thread_prologue_and_epilogue. This happens after register
>
> Sorry, I didn't find that pass in gcc 4.1.1. Was this pass added in a newer gcc? thx.
>
>> allocation. It means that the prologue and epilogue instructions are
>
> As you have indicated, this pass happens after register allocation. I want to allocate registers rather than dedicate registers to the instrumentation calculation; are there any hints on how to do this?
>
>> added to the RTL, so that the second scheduling pass can see them.
>>
>>>> still move them around or eliminate them, though.
>>>
>>> emmm, I need to move/remove my instrumentation if necessary...
>>
>> Yes. This is true by definition, since you want to instrument before the second scheduling pass. The scheduler can and will move load and store instructions. You need to set up the dependencies so that your instrumentation will still occur at the right time.
>>
>>>>> 2nd. to identify ld/st instruction (memory access op), I want to modify gen_rtx_SET, the method is that, if I find SRC or DST is a memory operand in gen_rtx_SET, then add instrumentation code before and after the insn to emit. Will this method work? Besides, if some false positives occur, how to correct them (I don't have some very clear idea.)
>>>>
>>>> Modifying gen_rtx_SET is probably not the right way to go. That is
>>>
>>> Then, what about modifying machine description file? Add define_expand for the define_insn which will output ld/st instruction (this define_expand can insert instrumentation insns. Of course, I need to identify the operands to the define_expand contains a memory operand and a reg operand.)
>>
>> That will work in some sense, but if a load or store instruction is eliminated you are quite likely to still have the instrumentation instructions lying around.
>>
>> Ian
>
> Thanks for your hints.

rest_of_handle_flow2 calls thread_prologue_and_epilogue_insns, maybe I need to move to a newer version of gcc
Re: How to make use of instruction scheduling to improve performance?
2007/7/28, Ramana Radhakrishnan [EMAIL PROTECTED]:
> Hi,
>
> On 7/28/07, 吴曦 [EMAIL PROTECTED] wrote:
>>>>>> I am working on gcc 4.1.1 and itanium2 architecture. I instrumented each ld and st instruction in final_scan_insn() by looking at the insn template (These instrumentations are used to do some security checks). These instrumentations incur high performance overhead when running specint benchmarks. However, these instrumentations contain high dependencies between instructions so that I want to use instruction scheduling to improve the performance. In the current implementation, the instrumentations are emitted as assembly instructions (not insns). What should I do to make use of the instruction scheduler?
>>>>>
>>>>> If I understand your description, you are adding instrumentation code, and you want to expose that code to the scheduler. What you need to do in that case is to add the code as RTL instructions before the scheduling pass runs. You will need to figure out the RTL which will do what you want. Then you will need to insert it around the instructions which you want to instrument. You will probably want to
>>>>
>>>> Before the second scheduling pass, how to identify that one insn will be output as a load instruction (or store instruction)? In the final, i use get_insn_template() to do this matching. Can I use the same method before the second scheduling pass? If not, would you mind giving some hints? thx
>>>
>>> Please send followups to the mailing list, not just to me. Thanks.
>>>
>>> You should just match on the RTL. I don't know enough about the Itanium to tell you precisely what to look for. But, for example, you might look for
>>>
>>>     s = single_set (insn);
>>>     if (s != NULL && (MEM_P (SET_SRC (s)) || MEM_P (SET_DEST (s))))
>>>       ...
>>>
>>> Ian
>>
>> Thanks. I observe that the 2nd instruction scheduling happens after the local and global allocation. However, in my instrumentation, I need several registers to do computation, can I allocate registers to do computation in the instrumentation code just before the 2nd instruction scheduling? If so, would you mind giving some hints on the interfaces that I could make use of.
>
> Generally you should be able to create new temporaries for such calculations before register allocation / reload. Otherwise you might have to resort to reserving a couple of registers in your ABI for such computations if you wanted these generated after reload (you could have a split that did that after reload but where in the function do you want to insert the instrumentation code?)
>
> From what you are indicating - there isn't enough detail about where in the function body you are inserting such instrumentation code -

Thanks. As I have indicated, I want to add instrumentation for each ld and st instruction in a function on Itanium. (In my current implementation, I also instrument cmp and mv instructions.) For example, for a ld instruction in the original program:

    ld rX=[rY]

I want to instrument it as

    instrumentation prologue
    ld rX=[rY]
    instrumentation epilogue

Currently, to identify such a ld instruction, I put my instrumentation in final and use get_insn_template() to see what instruction the insn will be output as.

To summarize, as I want to expose my instrumentation to instruction scheduling, the following work needs to be done:

1. identify that an insn will be output as a ld instruction
2. allocate registers for the instrumentation calculation (in my current implementation, I use dedicated registers for this)
3. emit the prepared instrumentation insns

> If you are doing such instrumentation in the prologue or epilogue of a function, you could choose to use gen_reg_rtx to obtain a temporary register.
>
> So typically obtain a temporary register in the following manner
>
>     rtx tmp_reg = gen_reg_rtx (machinemode);
>
> Use the tmp_reg in whatever instruction you want to generate using the corresponding register as one of the operands. For these you might want to use the corresponding gen_*** named functions.
>
> cheers
> Ramana
>
>> Besides, what happens if I move the insertion of instrumentation before register allocation, or even before the 1st scheduling pass, can I identify load/store instructions that early?
>
> --
> Ramana Radhakrishnan

Thanks for your hints.
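The matching rule suggested in this thread (look at a single SET and check whether its source or destination is a memory reference) can be illustrated without GCC's headers. Everything in the sketch below is a hypothetical stand-in: operand_kind, simple_set and classify_access are not GCC types or functions, and real code would work on rtx values with single_set, SET_SRC, SET_DEST and MEM_P instead.

```c
#include <stddef.h>

/* Hypothetical, heavily simplified model of the operands of one SET.  */
typedef enum { OP_REG, OP_MEM, OP_CONST } operand_kind;

typedef struct {
    operand_kind dest;   /* plays the role of SET_DEST */
    operand_kind src;    /* plays the role of SET_SRC */
} simple_set;

typedef enum { ACCESS_NONE, ACCESS_LOAD, ACCESS_STORE } mem_access;

/* Classify an insn from its single SET.  A NULL argument models an insn
   that is not a single SET; such insns are reported as ACCESS_NONE here
   and would need separate handling in real code.  */
mem_access classify_access(const simple_set *s)
{
    if (s == NULL)
        return ACCESS_NONE;
    if (s->src == OP_MEM)
        return ACCESS_LOAD;     /* memory read: candidate ld */
    if (s->dest == OP_MEM)
        return ACCESS_STORE;    /* memory write: candidate st */
    return ACCESS_NONE;
}
```

A pass that inserts instrumentation before the second scheduling pass could use a test of this shape to decide which insns to wrap, with the caveat from the thread that insns which are not a single SET are not covered by it.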
Re: How to make use of instruction scheduling to improve performance?
2007/7/28, 吴曦 [EMAIL PROTECTED]: 2007/7/28, Ramana Radhakrishnan [EMAIL PROTECTED]: Hi, On 7/28/07, 吴曦 [EMAIL PROTECTED] wrote:

I am working on gcc 4.1.1 and the Itanium2 architecture. I instrumented each ld and st instruction in final_scan_insn() by looking at the insn template (the instrumentation performs some security checks). The instrumentation incurs a high performance overhead when running the SPECint benchmarks. Because the inserted instructions depend heavily on one another, I want to use instruction scheduling to improve the performance. In the current implementation, the instrumentation is emitted as assembly instructions (not insns). What should I do to make use of the instruction scheduler?

If I understand your description, you are adding instrumentation code, and you want to expose that code to the scheduler. What you need to do in that case is to add the code as RTL instructions before the scheduling pass runs. You will need to figure out the RTL which will do what you want. Then you will need to insert it around the instructions which you want to instrument. You will probably want to

Before the second scheduling pass, how can I identify that an insn will be output as a load (or store) instruction? In final, I use get_insn_template() to do this matching. Can I use the same method before the second scheduling pass? If not, would you mind giving some hints? Thanks.

Please send followups to the mailing list, not just to me. Thanks. You should just match on the RTL. I don't know enough about the Itanium to tell you precisely what to look for. But, for example, you might look for

    s = single_set (PATTERN (insn));
    if (s != NULL && (MEM_P (SET_SRC (s)) || MEM_P (SET_DEST (s)))) ...

Ian

Thanks. I observe that the second instruction scheduling pass happens after local and global register allocation. However, my instrumentation needs several registers for its computation. Can I allocate registers for the instrumentation code just before the second scheduling pass? If so, would you mind giving some hints on the interfaces that I could use?

Generally you should be able to create new temporaries for such calculations before register allocation / reload. Otherwise you might have to resort to reserving a couple of registers in your ABI for such computations if you wanted these generated after reload (you could have a split that did that after reload, but where in the function do you want to insert the instrumentation code?). From what you are indicating, there isn't enough detail about where in the function body you are inserting such instrumentation code.

Thanks. As I have indicated, I want to add instrumentation for each ld and st instruction in a function on Itanium. (In my current implementation, I also instrument cmp and mv instructions.) For example, for a ld instruction in the original program:

    ld rX=[rY]

I want to instrument it as:

    instrumentation prologue
    ld rX=[rY]
    instrumentation epilogue

Currently, to identify such a ld instruction, I put my instrumentation in final and use get_insn_template() to see what instruction the insn will be output as. To summarize, since I want to expose my instrumentation to instruction scheduling, the following work should be done:

1. identify that an insn will be output as a ld instruction
2. allocate registers for the instrumentation calculation (in my current implementation, I use dedicated registers for this)
3. emit the prepared instrumentation insns

If you are doing such instrumentation in the prologue or epilogue of a function, you could choose to use gen_reg_rtx to obtain a temporary register. So typically obtain a temporary register in the following manner:

    rtx tmp_reg = gen_reg_rtx (machinemode);

Use tmp_reg as one of the operands in whatever instruction you want to generate. For these you might want to use the corresponding gen_*** named functions.

cheers Ramana

Besides, what happens if I move the insertion of instrumentation before register allocation, or even before the first scheduling pass? Can I identify load/store instructions that early?

-- Ramana
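The RTL check suggested in this thread (take the insn's single SET and see whether its source or destination is a MEM) can be illustrated outside the compiler. The following is only a toy model in plain C, not GCC code: toy_insn, toy_set, operand_kind and classify_access are hypothetical names standing in for GCC's insn rtx, single_set, MEM_P, SET_SRC and SET_DEST, and real RTL has many more cases than this sketch covers.

```c
#include <stddef.h>

/* Toy stand-ins for GCC's RTL; none of these types are GCC's own. */
enum operand_kind { OPERAND_REG, OPERAND_MEM, OPERAND_CONST };

/* Models a SET: dest <- src. */
struct toy_set {
    enum operand_kind dest;
    enum operand_kind src;
};

/* Models an insn; single_set is NULL when the insn is not one plain SET. */
struct toy_insn {
    const struct toy_set *single_set;
};

enum access_kind { ACCESS_NONE, ACCESS_LOAD, ACCESS_STORE };

/* Classify an insn the way the thread suggests: look only at the RTL
   shape, not at the assembly template chosen later in final.
   A memory-to-memory SET is reported as a load in this sketch. */
enum access_kind classify_access(const struct toy_insn *insn)
{
    const struct toy_set *s = insn->single_set;

    if (s == NULL)
        return ACCESS_NONE;      /* not a single SET: nothing to instrument */
    if (s->src == OPERAND_MEM)
        return ACCESS_LOAD;      /* reads memory */
    if (s->dest == OPERAND_MEM)
        return ACCESS_STORE;     /* writes memory */
    return ACCESS_NONE;          /* register-to-register or constant move */
}
```

In a real pass, the same kind of test would run over each insn before the second scheduling pass, and the instrumentation insns would be emitted around every insn classified as a load or store.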
Re: How to make use of instruction scheduling to improve performance?
28 Jul 2007 09:04:01 -0700, Ian Lance Taylor [EMAIL PROTECTED]: 吴曦 [EMAIL PROTECTED] writes:

There are some questions after I read the source code today. First, if I add the instrumentation before the second scheduling pass, will gcc later emit an insn that is output as a ld instruction? If this could happen, some ld instructions may not be instrumented...

No, gcc won't introduce any new memory load or store instructions after the prologue and epilogue instructions are threaded. It may still move them around or eliminate them, though.

When are the prologue and epilogue instructions threaded? (After register allocation?) Besides, what exactly does it mean that the prologue and epilogue instructions are threaded? Would you mind explaining in more detail? Thanks :-)

Hmm, so I need to move/remove my instrumentation if necessary...

Second, to identify ld/st instructions (memory access ops), I want to modify gen_rtx_SET: if I find that SRC or DST is a memory operand in gen_rtx_SET, then add instrumentation code before and after the insn to emit. Will this method work? Besides, if some false positives occur, how can I correct them? (I don't have a very clear idea.)

Modifying gen_rtx_SET is probably not the right way to go. That is used in many places throughout the RTL passes. Not all of those places are going to be able to cope with the new instructions you want to add. Ian

Then what about modifying the machine description file? Add a define_expand for the define_insn which will output the ld/st instruction (this define_expand can insert instrumentation insns; of course, I need to check that the operands of the define_expand contain a memory operand and a reg operand).

Thanks for your hints again :-)
How to make use of instruction scheduling to improve performance?
I am working on gcc 4.1.1 and the Itanium2 architecture. I instrumented each ld and st instruction in final_scan_insn() by looking at the insn template (the instrumentation performs some security checks). The instrumentation incurs a high performance overhead when running the SPECint benchmarks. Because the inserted instructions depend heavily on one another, I want to use instruction scheduling to improve the performance. In the current implementation, the instrumentation is emitted as assembly instructions (not insns). What should I do to make use of the instruction scheduler? Any help is truly appreciated! Thanks.
Re: How to make use of instruction scheduling to improve performance?
I am working on gcc 4.1.1 and the Itanium2 architecture. I instrumented each ld and st instruction in final_scan_insn() by looking at the insn template (the instrumentation performs some security checks). The instrumentation incurs a high performance overhead when running the SPECint benchmarks. Because the inserted instructions depend heavily on one another, I want to use instruction scheduling to improve the performance. In the current implementation, the instrumentation is emitted as assembly instructions (not insns). What should I do to make use of the instruction scheduler?

If I understand your description, you are adding instrumentation code, and you want to expose that code to the scheduler. What you need to do in that case is to add the code as RTL instructions before the scheduling pass runs. You will need to figure out the RTL which will do what you want. Then you will need to insert it around the instructions which you want to instrument. You will probably want to

Before the second scheduling pass, how can I identify that an insn will be output as a load (or store) instruction? In final, I use get_insn_template() to do this matching. Can I use the same method before the second scheduling pass? If not, would you mind giving some hints? Thanks.

Please send followups to the mailing list, not just to me. Thanks. You should just match on the RTL. I don't know enough about the Itanium to tell you precisely what to look for. But, for example, you might look for

    s = single_set (PATTERN (insn));
    if (s != NULL && (MEM_P (SET_SRC (s)) || MEM_P (SET_DEST (s)))) ...

Ian

Thanks. I observe that the second instruction scheduling pass happens after local and global register allocation. However, my instrumentation needs several registers for its computation. Can I allocate registers for the instrumentation code just before the second scheduling pass? If so, would you mind giving some hints on the interfaces that I could use? Besides, what happens if I move the insertion of instrumentation before register allocation, or even before the first scheduling pass? Can I identify load/store instructions that early?
Any hints on this problem? Thanks!
Hi, I am working on gcc-4.1.1 and the Itanium architecture. Today I tried to add a function call before each ld instruction. The method I use is to modify final_scan_insn() in final.c: before calling get_insn_template, I add code to check whether the insn matches a template that will emit a ld instruction, then I use emit_library_call to emit new insns and output them by calling final_scan_insn() again.

The modified gcc builds successfully, and when I use it to compile a program, I observe that it intercepts each ld instruction and adds the desired function call before it. But when I run the program compiled by the hacked gcc, it crashes with a segmentation fault. Using gdb, I found the cause: the original code does ld r14=[r14], where r14 contains the correct address, but my inserted function call, say FOO, sets r14 to 0, so when the program returns from FOO and loads from r14, it crashes. Here is a very simple concrete example to illustrate the situation:

~~~
old code:
main:
    ...
    ld r14=[r14]
    ...
~~~
~~~
new code:
FOO:
    ...
    mov r14=0
    ...
main:
    ...
    br.call FOO
    ld r14=[r14] /* CRASH! */
    ...
~~~

Now my question becomes clear: how can I make my inserted function call not affect the original state of the program? Furthermore, if I add more instructions (not only a function call), how can I preserve that state? Is there a general way to do this? Any hints on this problem will be *truly* appreciated. Thanks!

Best Regards
Andy.Wu
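The failure described above comes down to an inserted call that overwrites state the surrounding code still needs. As a rough illustration only, here is a toy model in plain C (not GCC or IA-64 code): reg_file, clobbering_hook and call_preserving_state are hypothetical names, and the "registers" are just an array. It shows the save/call/restore shape that a wrapper around the instrumentation call would need, assuming r14 is a register the callee is allowed to overwrite.

```c
#include <string.h>

#define NUM_REGS 32

/* Toy register file; index 14 plays the role of r14 in the example. */
struct reg_file {
    long r[NUM_REGS];
};

typedef void (*hook_fn)(struct reg_file *);

/* An instrumentation routine that, like FOO, overwrites r14. */
void clobbering_hook(struct reg_file *regs)
{
    regs->r[14] = 0;
}

/* Call the hook, but snapshot the registers first and put them back
   afterwards, so the code after the call sees the state it expects. */
void call_preserving_state(struct reg_file *regs, hook_fn hook)
{
    struct reg_file saved;

    memcpy(&saved, regs, sizeof saved);   /* save state before the call */
    hook(regs);                           /* hook may clobber anything */
    memcpy(regs, &saved, sizeof saved);   /* restore state after the call */
}
```

In a compiler, the equivalent is either to have the inserted routine save and restore every register it touches, or to insert the call early enough that register allocation knows the call is there.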
Re: Any hints on this problem? Thanks!
Make sure that the called function restores the original state of the program before it returns. Andreas.

Thanks. I know the goal is to restore the original state before the inserted function returns. But how? Is there any way to tell gcc, "Hey, you should restore the original state before that function returns"? I would like hints on how to do this and on the existing interfaces in gcc for it :-).
Re: Any hints on this problem? Thanks!
Another solution is to add the instrumentation earlier, and use expand_call.

Thanks for your hints. Does that mean doing the instrumentation at the RTL expand level? I have tried the following method: add a define_expand in ia64.md whose template is the same as the one which will emit a ld instruction, like this:

~~
;; expand the ld operation with check code if user turns on
;; fld-checking
(define_expand "gift_load_symptr_low"
  [(set (match_operand:DI 0 "register_operand" "=r")
        (lo_sum:DI (match_operand:DI 1 "register_operand" "r")
                   (match_operand 2 "got_symbolic_operand" "s")))]
  ""
{
  if (flag_ld_checking)
    {
      printf ("gift_load_symptr_low emits checking function call\n");
      emit_library_call (gen_rtx_SYMBOL_REF (Pmode, \"gift_check_bitmap\"), 0, VOIDmode, 0);
      emit_insn (gen_rtx_SET (VOIDmode, operands[0],
                              gen_rtx_LO_SUM (DImode, operands[1], operands[2])));
    }
  DONE;
})
~~

But when I use the newly built compiler to compile my program, nothing is matched to expand such a ld instruction ...
Re: error: unable to generate reloads for..., any hints?
Thanks. But what does the following mean: "Sometimes an insn can match more than one instruction pattern. Then the pattern that appears first in the machine description is the one used." (section 14.10 of the gcc internals manual, p. 259)?

08 Feb 2007 00:09:21 -0800, Ian Lance Taylor [EMAIL PROTECTED]: 吴曦 [EMAIL PROTECTED] writes:

I observe that there is a ld instruction in the 3rd alternative, so I add a new define_insn before it in the hope that it will be matched first.

It doesn't work that way. Your new instruction will wind up matching all move instructions. Reload will crash because the constraints don't work. Instead just change the existing movqi_internal insn. Don't try to write a new one. Change the existing insn to use C code which checks which_alternative instead of the @ list it uses now. Ian
error: unable to generate reloads for..., any hints?
Hi, I am working on gcc 4.1.1 and the Itanium architecture. I want to modify the machine description in ia64.md to add some checks before each ld instruction. The following is the original define_insn:

    (define_insn "*movqi_internal"
      [(set (match_operand:QI 0 "destination_operand" "=r,r,r, m, r,*f,*f")
            (match_operand:QI 1 "move_operand" "rO,J,m,rO,*f,rO,*f"))]
      "ia64_move_ok (operands[0], operands[1])"
      "@
       mov %0 = %r1
       addl %0 = %1, r0
       ld1%O1 %0 = %1%P1
       st1%Q0 %0 = %r1%P0
       getf.sig %0 = %1
       setf.sig %0 = %r1
       mov %0 = %1"
      [(set_attr "itanium_class" "ialu,ialu,ld,st,frfr,tofr,fmisc")])

I observe that there is a ld instruction in the 3rd alternative, so I added a new define_insn before it in the hope that it will be matched first:

    (define_insn "*ld_movqi_internal"
      [(set (match_operand:QI 0 "destination_operand" "=r")
            (match_operand:QI 1 "move_operand" "m"))]
      "ia64_move_ok (operands[0], operands[1]) && flag_check_ld"
    {
      printf ("define_insn ld_movqi_internal\n");
      return "ld1%O1 %0 = %1%P1";
    }
      [(set_attr "itanium_class" "ld")])

I keep everything the same as the 3rd alternative of the original define_insn, except that I use a C statement to return the desired output template. However, when I use the newly built gcc to compile the following program, it crashes:

    #include <stdio.h>
    char characters[8192]={'a',};
    int main() { char c = characters[0]; printf("Hello World! c:%c\n", c); }

The error reported is:

    hi.c:9: error: unable to generate reloads for:
    (insn 10 9 12 1 (set (mem/c/i:QI (reg/f:DI 111 loc79) [0 c+0 S1 A128])
            (reg:QI 14 r14 [orig:342 characters ] [342])) 3 {*gift_movqi_internal_ld} (nil)
        (expr_list:REG_DEAD (reg:QI 14 r14 [orig:342 characters ] [342])
            (nil)))
    hi.c:9: internal compiler error: in find_reloads, at reload.c:3738

On IA64, the first pseudo register number is 334, so register 111 and register 14 are both hardware registers. I looked at find_reloads in reload.c and found the following code fragment and comment:

    /* The operands don't meet the constraints.
       goal_alternative describes the alternative that we could reach by
       reloading the fewest operands.  Reload so as to fit it.  */
    if (best == MAX_RECOG_OPERANDS * 2 + 600)
      {
        /* No alternative works with reloads??  */
        if (insn_code_number >= 0)
          fatal_insn ("unable to generate reloads for:", insn);
        ...

So, what is going on here? In particular, what is find_reloads trying to accomplish, and why does it go wrong here? I would appreciate any help on this question, thx!

Best Regards
--andy.wu
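One way to picture the failure, consistent with the reply in this thread that the new pattern "will wind up matching all move instructions", is to separate the two checks involved: predicates decide which pattern an insn is matched to, and constraints are only examined afterwards. The following is a heavily simplified toy model in plain C, not GCC's recog or reload: all names (toy_pattern, toy_recog, toy_constraints_ok) are hypothetical, and real reload can also fix up operands by copying them, which this sketch ignores.

```c
#include <stddef.h>

enum operand_kind { KIND_REG = 1, KIND_MEM = 2 };

/* One constraint alternative: the exact operand kinds it accepts. */
struct toy_alternative {
    enum operand_kind dest;
    enum operand_kind src;
};

struct toy_pattern {
    const char *name;
    unsigned dest_predicate;   /* bitmask of kinds the predicate accepts */
    unsigned src_predicate;
    const struct toy_alternative *alts;
    size_t n_alts;
};

/* Recognition looks only at predicates; the first pattern that accepts
   the operands wins, as in the manual sentence quoted in the thread. */
const struct toy_pattern *toy_recog(const struct toy_pattern *pats, size_t n,
                                    enum operand_kind dest,
                                    enum operand_kind src)
{
    size_t i;

    for (i = 0; i < n; i++)
        if ((pats[i].dest_predicate & dest) && (pats[i].src_predicate & src))
            return &pats[i];
    return NULL;
}

/* Constraint check: does any alternative fit the operands as they are? */
int toy_constraints_ok(const struct toy_pattern *p,
                       enum operand_kind dest, enum operand_kind src)
{
    size_t i;

    for (i = 0; i < p->n_alts; i++)
        if (p->alts[i].dest == dest && p->alts[i].src == src)
            return 1;
    return 0;
}

/* The added pattern: broad predicates, but only a reg <- mem alternative. */
static const struct toy_alternative ld_only_alts[] = {
    { KIND_REG, KIND_MEM }
};

/* The original move pattern: same predicates, several alternatives. */
static const struct toy_alternative move_alts[] = {
    { KIND_REG, KIND_REG }, { KIND_REG, KIND_MEM }, { KIND_MEM, KIND_REG }
};

const struct toy_pattern toy_patterns[] = {
    { "ld_only",      KIND_REG | KIND_MEM, KIND_REG | KIND_MEM, ld_only_alts, 1 },
    { "general_move", KIND_REG | KIND_MEM, KIND_REG | KIND_MEM, move_alts,    3 }
};
```

With this table, a store (memory destination, register source) is recognized as "ld_only" because that pattern comes first and its predicates accept the operands, yet none of its alternatives fit. That mirrors the insn in the error message, which is a store matched to the load-only pattern.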
Re: Some hints on solving this problem?
Thanks for the hints. I had already noticed that the insn list is matched against the RTL templates to emit assembly code. However, I found 3 md files, ia64.md, itanium.md, and itanium2.md, and each file is very big... Would you mind giving some hints on the differences between them? In particular, I am working on the Itanium2 architecture. (I have found many insn templates in ia64.md, but what about itanium.md and itanium2.md?)

On 07-2-4, Paul Yuan [EMAIL PROTECTED] wrote:

1) Modify final() in final.c to emit some code before ld and st before outputting the assembly.
2) Modify the MD file. Find the template which generates ld or st, and add some code before ld and st.

On 2/3/07, 吴曦 [EMAIL PROTECTED] wrote:

Hi, I am working on gcc 4.1.1 and the Itanium2 architecture. I want to use gcc to emit some code before each ld and st instruction (I know that a dynamic binary translator like PIN may be more suitable for this task, but I am studying gcc and want to use it to achieve this goal). But after several days of study, I find that the back-end of gcc is too complex... :-( So, what is the best level in the back-end to accomplish this task? I would appreciate any help I can get on this problem! thx!

-- Paul Yuan www.yingbo.com
Some hints on solving this problem?
Hi, I am working on gcc 4.1.1 and the Itanium2 architecture. I want to use gcc to emit some code before each ld and st instruction (I know that a dynamic binary translator like PIN may be more suitable for this task, but I am studying gcc and want to use it to achieve this goal). But after several days of study, I find that the back-end of gcc is too complex... :-( So, what is the best level in the back-end to accomplish this task? I would appreciate any help I can get on this problem! thx!
Level to do such a modification...
Hi, I am working on gcc 4.0.0. I want to use gcc to intercept each call to read, and taint the data read in. For example, transform

    read(fd, buf, size)

to

    read(fd, buf, size)
    if(is_socket(fd)) taint(buf, size)

So, what is the most suitable level to do this modification in gcc? My own thought is in finish_function, before calling c_genericize, as I discovered that in the C front end there is no GENERIC tree... c_genericize directly calls gimplify_function_tree.
Re: Level to do such a modification...
I know Valgrind. It is an emulator, but we are restricted from using an emulator. :-(

2007/1/24, Nicholas Nethercote [EMAIL PROTECTED]: On Wed, 24 Jan 2007, 吴曦 wrote:

I am working on gcc 4.0.0. I want to use gcc to intercept each call to read, and taint the data read in. For example, transform

    read(fd, buf, size)

to

    read(fd, buf, size)
    if(is_socket(fd)) taint(buf, size)

So, what is the most suitable level to do this modification in gcc? My own thought is in finish_function, before calling c_genericize, as I discovered that in the C front end there is no GENERIC tree... c_genericize directly calls gimplify_function_tree.

Are you sure you want to do this in GCC? You might find it easier to use a dynamic binary instrumentation framework such as Valgrind or Pin to do this kind of thing. Nick
Re: Level to do such a modification...
Anyway, the program is supervised... Would you mind giving some advice on the compiler-based approach? After recompilation, I could finish this modification.

2007/1/24, Nicholas Nethercote [EMAIL PROTECTED]: On Wed, 24 Jan 2007, 吴曦 wrote:

I know Valgrind. It is an emulator, but we are restricted from using an emulator. :-(

Well, for some definition of emulator. Nick
passing arguments in emit_library_call
Hi, I want to use emit_library_call to output a library call to printf. The question is how to pass a format string argument. Also, the comment on emit_library_call says: "The rtx values should have been passed through protect_from_queue already." What should I do to pass the rtx values through protect_from_queue? Is there any doc or example to refer to?
Re: passing arguments in emit_library_call
Sorry for that, I am using gcc 3.4.0. Thanks for the hints on passing the format string argument.

On 07 Jan 2007 20:25:29 -0800, Ian Lance Taylor [EMAIL PROTECTED] wrote: 吴曦 [EMAIL PROTECTED] writes:

I want to use emit_library_call to output a library call to printf. The question is how to pass a format string argument?

See, e.g., how STRING_CST is handled in expand_expr_real_1.

Also, in the comment of emit_library_call mentions: The rtx values should have been passed through protect_from_queue already. then, what should I do to pass the rtx values through protect_from_queue? Is there any doc or example to refer to?

protect_from_queue is no longer used. On the other hand, you didn't mention which version you are using, and I don't see that comment in the current sources. So if you are using a version which still uses protect_from_queue, just grep for it in the source code. Ian
How to dedicate a register for special purpose in gcc?
Hi, how can I dedicate a register to a special purpose? That is, the dedicated register should appear only in my own inserted code and never be allocated in the rest of the code. I have read some of the documentation (gccint) about register usage but still have no idea. I would *really* appreciate any help I can get on this issue! Xi Wu