On Fri, Jan 12, 2018 at 9:47 AM, Jan Hubicka <hubi...@ucw.cz> wrote: >> gcc/ >> >> * config/i386/i386-opts.h (indirect_branch): New. >> * config/i386/i386-protos.h (ix86_output_indirect_jmp): Likewise. >> * config/i386/i386.c (ix86_using_red_zone): Disallow red-zone >> with local indirect jump when converting indirect call and jump. >> (ix86_set_indirect_branch_type): New. >> (ix86_set_current_function): Call ix86_set_indirect_branch_type. >> (indirectlabelno): New. >> (indirect_thunk_needed): Likewise. >> (indirect_thunk_bnd_needed): Likewise. >> (indirect_thunks_used): Likewise. >> (indirect_thunks_bnd_used): Likewise. >> (INDIRECT_LABEL): Likewise. >> (indirect_thunk_name): Likewise. >> (output_indirect_thunk): Likewise. >> (output_indirect_thunk_function): Likewise. >> (ix86_output_indirect_branch): Likewise. >> (ix86_output_indirect_jmp): Likewise. >> (ix86_code_end): Call output_indirect_thunk_function if needed. >> (ix86_output_call_insn): Call ix86_output_indirect_branch if >> needed. >> (ix86_handle_fndecl_attribute): Handle indirect_branch. >> (ix86_attribute_table): Add indirect_branch. >> * config/i386/i386.h (machine_function): Add indirect_branch_type >> and has_local_indirect_jump. >> * config/i386/i386.md (indirect_jump): Set has_local_indirect_jump >> to true. >> (tablejump): Likewise. >> (*indirect_jump): Use ix86_output_indirect_jmp. >> (*tablejump_1): Likewise. >> (simple_return_indirect_internal): Likewise. >> * config/i386/i386.opt (mindirect-branch=): New option. >> (indirect_branch): New. >> (keep): Likewise. >> (thunk): Likewise. >> (thunk-inline): Likewise. >> (thunk-extern): Likewise. >> * doc/extend.texi: Document indirect_branch function attribute. >> * doc/invoke.texi: Document -mindirect-branch= option. >> >> diff --git a/gcc/config/i386/i386-opts.h b/gcc/config/i386/i386-opts.h >> index f245c1573cf..f14cbeee7a1 100644 >> --- a/gcc/config/i386/i386-opts.h >> +++ b/gcc/config/i386/i386-opts.h >> @@ -106,4 +106,12 @@ enum prefer_vector_width { >> PVW_AVX512 >> }; >> >> +enum indirect_branch { >> + indirect_branch_unset = 0, >> + indirect_branch_keep, >> + indirect_branch_thunk, >> + indirect_branch_thunk_inline, >> + indirect_branch_thunk_extern >> +}; > I think it would make sense to simply place the body of your introduction > email > here as a comment explaining what enum indirect_branhc does.
Will do. >> -/* Return true if a red-zone is in use. */ >> +/* Return true if a red-zone is in use. We can't use red-zone when >> + there are local indirect jumps, like "indirect_jump" or "tablejump", >> + which jumps to another place in the function, since "call" in the >> + indirect thunk pushes the return address onto stack, destroying >> + red-zone. */ > > Technically we can use red-zone if we reserve space for the call address, > right? > Perhaps mention it as TODO, that should even not be too hard to implement > but is probably not being that important code quality wise in practice. Will do, >> bool >> ix86_using_red_zone (void) >> { >> - return TARGET_RED_ZONE && !TARGET_64BIT_MS_ABI; >> + return (TARGET_RED_ZONE >> + && !TARGET_64BIT_MS_ABI >> + && (!cfun->machine->has_local_indirect_jump >> + || cfun->machine->indirect_branch_type == indirect_branch_keep)); >> } >> >> /* Return a string that documents the current -m options. The caller is >> @@ -5797,6 +5804,37 @@ ix86_set_func_type (tree fndecl) >> } >> } >> >> @@ -10639,6 +10681,191 @@ ix86_setup_frame_addresses (void) >> # endif >> #endif >> >> +static int indirectlabelno; >> +static bool indirect_thunk_needed = false; >> +static bool indirect_thunk_bnd_needed = false; >> + >> +static int indirect_thunks_used; >> +static int indirect_thunks_bnd_used; > > Add comments for variables. Will do. >> + >> +#ifndef INDIRECT_LABEL >> +# define INDIRECT_LABEL "LIND" >> +#endif >> + >> +/* Fills in the label name that should be used for the indirect thunk. */ >> + >> +static void >> +indirect_thunk_name (char name[32], int regno, bool need_bnd_p) >> +{ >> + if (USE_HIDDEN_LINKONCE) >> + { >> + const char *bnd = need_bnd_p ? "_bnd" : ""; >> + if (regno >= 0) >> + { >> + const char *reg_prefix; >> + if (LEGACY_INT_REGNO_P (regno)) >> + reg_prefix = TARGET_64BIT ? "r" : "e"; >> + else >> + reg_prefix = ""; >> + sprintf (name, "__x86_indirect_thunk%s_%s%s", >> + bnd, reg_prefix, reg_names[regno]); >> + } >> + else >> + sprintf (name, "__x86_indirect_thunk%s", bnd); >> + } >> + else >> + { >> + if (regno >= 0) >> + { >> + if (need_bnd_p) >> + ASM_GENERATE_INTERNAL_LABEL (name, "LITBR", regno); >> + else >> + ASM_GENERATE_INTERNAL_LABEL (name, "LITR", regno); >> + } >> + else >> + { >> + if (need_bnd_p) >> + ASM_GENERATE_INTERNAL_LABEL (name, "LITB", 0); >> + else >> + ASM_GENERATE_INTERNAL_LABEL (name, "LIT", 0); >> + } >> + } >> +} >> + >> +/* Output a call and return thunk for indirect branch. If BND_P is >> + true, the BND prefix is needed. If REGNO != -1, the function >> + address is in REGNO. Otherwise, the function address is on the >> + top of stack. */ > > Add here comment which gives example of full thunk body so it is easier > to see what is going on here. Sure. >> + >> +static void >> +output_indirect_thunk (bool need_bnd_p, int regno) >> +{ >> + char indirectlabel1[32]; >> + char indirectlabel2[32]; >> + >> + ASM_GENERATE_INTERNAL_LABEL (indirectlabel1, INDIRECT_LABEL, >> + indirectlabelno++); >> + ASM_GENERATE_INTERNAL_LABEL (indirectlabel2, INDIRECT_LABEL, >> + indirectlabelno++); >> + >> + /* Call */ >> + if (need_bnd_p) >> + fputs ("\tbnd call\t", asm_out_file); >> + else >> + fputs ("\tcall\t", asm_out_file); >> + assemble_name_raw (asm_out_file, indirectlabel2); >> + fputc ('\n', asm_out_file); >> + >> + ASM_OUTPUT_INTERNAL_LABEL (asm_out_file, indirectlabel1); >> + >> + /* Pause . */ >> + fprintf (asm_out_file, "\tpause\n"); >> + >> + /* Jump. */ >> + fputs ("\tjmp\t", asm_out_file); >> + assemble_name_raw (asm_out_file, indirectlabel1); >> + fputc ('\n', asm_out_file); >> + >> + ASM_OUTPUT_INTERNAL_LABEL (asm_out_file, indirectlabel2); >> + >> + if (regno >= 0) >> + { >> + /* MOV. */ >> + rtx xops[2]; >> + xops[0] = gen_rtx_MEM (word_mode, stack_pointer_rtx); >> + xops[1] = gen_rtx_REG (word_mode, regno); >> + output_asm_insn ("mov\t{%1, %0|%0, %1}", xops); >> + } >> + else >> + { >> + /* LEA. */ >> + rtx xops[2]; >> + xops[0] = stack_pointer_rtx; >> + xops[1] = plus_constant (Pmode, stack_pointer_rtx, UNITS_PER_WORD); >> + output_asm_insn ("lea\t{%E1, %0|%0, %E1}", xops); >> + } >> + >> + if (need_bnd_p) >> + fputs ("\tbnd ret\n", asm_out_file); >> + else >> + fputs ("\tret\n", asm_out_file); >> +} >> + >> +/* Output a funtion with a call and return thunk for indirect branch. >> + If BND_P is true, the BND prefix is needed. If REGNO != -1, the >> + function address is in REGNO. Otherwise, the function address is >> + on the top of stack. */ >> + >> +static void >> +output_indirect_thunk_function (bool need_bnd_p, int regno) >> +{ >> + char name[32]; >> + tree decl; >> + >> + /* Create __x86_indirect_thunk/__x86_indirect_thunk_bnd. */ >> + indirect_thunk_name (name, regno, need_bnd_p); >> + decl = build_decl (BUILTINS_LOCATION, FUNCTION_DECL, >> + get_identifier (name), >> + build_function_type_list (void_type_node, NULL_TREE)); >> + DECL_RESULT (decl) = build_decl (BUILTINS_LOCATION, RESULT_DECL, >> + NULL_TREE, void_type_node); >> + TREE_PUBLIC (decl) = 1; >> + TREE_STATIC (decl) = 1; >> + DECL_IGNORED_P (decl) = 1; > > DECL_ARTIFICIAL as well? This is done exactly the same way as PIC thunk. I don't think we should change it here. > >> + >> +#if TARGET_MACHO >> + if (TARGET_MACHO) >> + { >> + switch_to_section (darwin_sections[picbase_thunk_section]); >> + fputs ("\t.weak_definition\t", asm_out_file); >> + assemble_name (asm_out_file, name); >> + fputs ("\n\t.private_extern\t", asm_out_file); >> + assemble_name (asm_out_file, name); >> + putc ('\n', asm_out_file); >> + ASM_OUTPUT_LABEL (asm_out_file, name); >> + DECL_WEAK (decl) = 1; >> + } >> + else >> +#endif >> + if (USE_HIDDEN_LINKONCE) >> + { >> + cgraph_node::create (decl)->set_comdat_group (DECL_ASSEMBLER_NAME >> (decl)); >> + >> + targetm.asm_out.unique_section (decl, 0); >> + switch_to_section (get_named_section (decl, NULL, 0)); >> + >> + targetm.asm_out.globalize_label (asm_out_file, name); >> + fputs ("\t.hidden\t", asm_out_file); >> + assemble_name (asm_out_file, name); >> + putc ('\n', asm_out_file); >> + ASM_DECLARE_FUNCTION_NAME (asm_out_file, name, decl); >> + } >> + else >> + { >> + switch_to_section (text_section); >> + ASM_OUTPUT_LABEL (asm_out_file, name); >> + } >> + > > Why do you need to output asm visibility directives by hand when you create > symbol for the function anyway? This is done exactly the same way as PIC thunk. I don't think we should change it here. > I would expect that you can just use similar code as cgraph_node::expand_thunk > when calls output_mi_thunk and get this done in a way that is independent of > target assembler. I took a look. I don't see an easy to do it. I'd like to keep it exactly the same as PIC thunk. And we change both thunks together later if needed. >> >> +/* Output indirect branch via a call and return thunk. CALL_OP is >> + the branch target. XASM is the assembly template for CALL_OP. >> + Branch is a tail call if SIBCALL_P is true. */ > > Again please add example of code sequences output here to make code easier to > follow. Will do. >> + >> +static void >> +ix86_output_indirect_branch (rtx call_op, const char *xasm, >> + bool sibcall_p) >> +{ >> + char thunk_name_buf[32]; >> + char *thunk_name; >> + char push_buf[64]; >> + bool need_bnd_p = ix86_bnd_prefixed_insn_p (current_output_insn); >> + int regno; >> + >> + if (REG_P (call_op)) >> + regno = REGNO (call_op); >> + else >> + regno = -1; >> + >> + if (cfun->machine->indirect_branch_type >> + != indirect_branch_thunk_inline) >> + { >> + if (cfun->machine->indirect_branch_type == indirect_branch_thunk) >> + { >> + if (regno >= 0) >> + { >> + int i = regno; >> + if (i >= FIRST_REX_INT_REG) >> + i -= (FIRST_REX_INT_REG - LAST_INT_REG - 1); >> + if (need_bnd_p) >> + indirect_thunks_bnd_used |= 1 << i; >> + else >> + indirect_thunks_used |= 1 << i; >> + } >> + else >> + { >> + if (need_bnd_p) >> + indirect_thunk_bnd_needed = true; >> + else >> + indirect_thunk_needed = true; >> + } >> + } >> + indirect_thunk_name (thunk_name_buf, regno, need_bnd_p); >> + thunk_name = thunk_name_buf; >> + } >> + else >> + thunk_name = NULL; >> + >> + snprintf (push_buf, sizeof (push_buf), "push{%c}\t%s", >> + TARGET_64BIT ? 'q' : 'l', xasm); >> + >> + if (sibcall_p) >> + { >> + if (regno < 0) >> + output_asm_insn (push_buf, &call_op); >> + if (thunk_name != NULL) >> + { >> + if (need_bnd_p) >> + fprintf (asm_out_file, "\tbnd jmp\t%s\n", thunk_name); >> + else >> + fprintf (asm_out_file, "\tjmp\t%s\n", thunk_name); >> + } >> + else >> + output_indirect_thunk (need_bnd_p, regno); >> + } >> + else >> + { >> + if (regno >= 0 && thunk_name != NULL) >> + { >> + if (need_bnd_p) >> + fprintf (asm_out_file, "\tbnd call\t%s\n", thunk_name); >> + else >> + fprintf (asm_out_file, "\tcall\t%s\n", thunk_name); >> + return; >> + } >> + >> + char indirectlabel1[32]; >> + char indirectlabel2[32]; >> + >> + ASM_GENERATE_INTERNAL_LABEL (indirectlabel1, >> + INDIRECT_LABEL, >> + indirectlabelno++); >> + ASM_GENERATE_INTERNAL_LABEL (indirectlabel2, >> + INDIRECT_LABEL, >> + indirectlabelno++); >> + >> + /* Jump. */ >> + if (need_bnd_p) >> + fputs ("\tbnd jmp\t", asm_out_file); >> + else >> + fputs ("\tjmp\t", asm_out_file); >> + assemble_name_raw (asm_out_file, indirectlabel2); >> + fputc ('\n', asm_out_file); >> + >> + ASM_OUTPUT_INTERNAL_LABEL (asm_out_file, indirectlabel1); >> + >> + if (MEM_P (call_op)) >> + { >> + struct ix86_address parts; >> + rtx addr = XEXP (call_op, 0); >> + if (ix86_decompose_address (addr, &parts) >> + && parts.base == stack_pointer_rtx) >> + { >> + /* Since call will adjust stack by -UNITS_PER_WORD, >> + we must convert "disp(stack, index, scale)" to >> + "disp+UNITS_PER_WORD(stack, index, scale)". */ >> + if (parts.index) >> + { >> + addr = gen_rtx_MULT (Pmode, parts.index, >> + GEN_INT (parts.scale)); >> + addr = gen_rtx_PLUS (Pmode, stack_pointer_rtx, >> + addr); >> + } >> + else >> + addr = stack_pointer_rtx; >> + >> + rtx disp; >> + if (parts.disp != NULL_RTX) >> + disp = plus_constant (Pmode, parts.disp, >> + UNITS_PER_WORD); >> + else >> + disp = GEN_INT (UNITS_PER_WORD); >> + >> + addr = gen_rtx_PLUS (Pmode, addr, disp); >> + call_op = gen_rtx_MEM (GET_MODE (call_op), addr); >> + } >> + } >> + >> + if (regno < 0) >> + output_asm_insn (push_buf, &call_op); >> + >> + if (thunk_name != NULL) >> + { >> + if (need_bnd_p) >> + fprintf (asm_out_file, "\tbnd jmp\t%s\n", thunk_name); >> + else >> + fprintf (asm_out_file, "\tjmp\t%s\n", thunk_name); >> + } >> + else >> + output_indirect_thunk (need_bnd_p, regno); >> + >> + ASM_OUTPUT_INTERNAL_LABEL (asm_out_file, indirectlabel2); >> + >> + /* Call. */ >> + if (need_bnd_p) >> + fputs ("\tbnd call\t", asm_out_file); >> + else >> + fputs ("\tcall\t", asm_out_file); >> + assemble_name_raw (asm_out_file, indirectlabel1); >> + fputc ('\n', asm_out_file); >> + } >> +} > > It is really not very pretty that the whole sequence is injected into insn > stream > as a single blob. How opaque it is? Does it need to be patched at dynamic > link time? It must be very opaque to optimizers. No, we don't patch at load-time. > I suppose it may make sense to split the insn and at least explicitly > represent > the fact that we (sometimes) push the target to stack. Did you mean "split the function"? I will break it into ix86_output_indirect_branch_via_reg and ix86_output_indirect_branch_via_push. > Why the memory variant exists at first place? When the function address is in memory, we can push it onto stack to save a register. Also it is needed to cover "call foo" to "call [foo@GOT]" conversion. >> (define_insn "*indirect_jump" >> [(set (pc) (match_operand:W 0 "indirect_branch_operand" "rBw"))] >> "" >> - "%!jmp\t%A0" >> + "* return ix86_output_indirect_jmp (operands[0], false);" >> [(set_attr "type" "ibr") >> (set_attr "length_immediate" "0") >> (set_attr "maybe_prefix_bnd" "1")]) > > I think you also want to update type to "many" when we do more than just > indirect branch. Did you mean "multi"? I will change to "multi". > We do not care much about this, but it feels wrong to have attributes off > reality. >> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi >> index f3e4a63ab46..ddb6035be96 100644 >> --- a/gcc/doc/extend.texi >> +++ b/gcc/doc/extend.texi >> @@ -5754,6 +5754,16 @@ Specify which floating-point unit to use. You must >> specify the >> @code{target("fpmath=sse+387")} because the comma would separate >> different options. >> >> +@item indirect_branch("@var{choice}") >> +@cindex @code{indirect_branch} function attribute, x86 >> +On x86 targets, the @code{indirect_branch} attribute causes the compiler >> +to convert indirect call and jump with @var{choice}. @samp{keep} >> +keeps indirect call and jump unmodified. @samp{thunk} converts indirect >> +call and jump to call and return thunk. @samp{thunk-inline} converts >> +indirect call and jump to inlined call and return thunk. >> +@samp{thunk-extern} converts indirect call and jump to external call >> +and return thunk provided in a separate object file. > > Please expand the documentation in a way that random user who is not aware of > the > issue will understand that those are security features that come at a cost. > >> +@opindex -mindirect-branch >> +Convert indirect call and jump with @var{choice}. The default is >> +@samp{keep}, which keeps indirect call and jump unmodified. >> +@samp{thunk} converts indirect call and jump to call and return thunk. >> +@samp{thunk-inline} converts indirect call and jump to inlined call >> +and return thunk. @samp{thunk-extern} converts indirect call and jump >> +to external call and return thunk provided in a separate object file. >> +You can control this behavior for a specific function by using the >> +function attribute @code{indirect_branch}. @xref{Function Attributes}. > > Similarly here. I will update documentation with user guide info after Intel white paper is published. > Rest of the patch seems OK. We may want incrementally represent more of the > indirect jump/call seqeuence in RTL, but at this point probably keeping things > simple and localized is not a bad idea. This can be done incrementally. > > Please make updated patch and I would like to give others chance to comment > today. > > Honza -- H.J.