On 02/02/16 19:33, Ulrich Weigand wrote:
Marcin Kościelnicki wrote:

Here we go.  I've also removed the "see below", since I don't really
see anything below...

The "see below" refers to this code (which I agree isn't really obvious):

   if (TARGET_TPF_PROFILING)
     {
       /* Generate a BAS instruction to serve as a function
          entry intercept to facilitate the use of tracing
          algorithms located at the branch target.  */
       emit_insn (gen_prologue_tpf ());

What is not explicitly called out here is that this tracing function
actually refers to some hard registers, in particular r14, and assumes
they still have the original contents as at function entry.

That is why the prolog code avoid using r14 as temporary if the TPF
tracing mechanism is in use.  Now I think this doesn't apply to r12,
so this part of your patch should still be fine.  (In addition, TPF
is not going to support split stacks --or indeed the Go language--
anyway, so it doesn't really matter all that much.)

Very well, I'll improve the comment.


I do have two other issues; sorry for bringing those up again although
they've been discussed up in the past, but I still think we can find
some improvements here ...

The first is the question Andreas brought up, why we need the extra
set of insns introduced by s390_reorg.  I think this may really have
been necessary for the ESA case where data elements had to be intermixed
into code at a specific location.  But since we no longer support ESA,
we now just have a data block that can be placed anywhere.  For example,
we could just have an insn (at any point in the prolog stream) that
simply emits the full data block during final output, along the lines of
(note: needs to be updated for SImode vs. DImode.):

(define_insn "split_stack_data"
   [(unspec_volatile [(match_operand 0 "bras_sym_operand" "X")
                      (match_operand 1 "bras_sym_operand" "X")
                      (match_operand 2 "consttable_operand" "X")
                      (match_operand 3 "consttable_operand" "X")]
                     UNSPECV_SPLIT_STACK_DATA)]
   ""
{
   switch_to_section (targetm.asm_out.function_rodata_section
                       (current_function_decl));

   output_asm_insn (\".align 3", operands);
   (*targetm.asm_out.internal_label) (asm_out_file, \"L\",
                                      CODE_LABEL_NUMBER (operands[0]));
   output_asm_insn (\".quad %2\", operands);
   output_asm_insn (\".quad %3\", operands);
   output_asm_insn (\".quad %1-%0\", operands);

   switch_to_section (current_function_section ());
   return "";
}
   [(set_attr "length" "0")])

Or possibly even cleaner, we can simply define the data block at the
tree level as if it were an initialized global variable of a certain
struct type, and just leave it to common code to emit it as usual.

Then we just have the code bits, but I don't really see much
difference between the split_stack_call and split_stack_sibcall
patterns (apart from the data block), so if code flow is OK with
the former insns, it should be OK with the latter too ..

[ Or else, if there *are* code flow issues, the other alternative
would be to emit the full call sequence, code and data, from a
single insn pattern during final output.  This might have the extra
benefit that the assembler sequence is fully fixed, and thus easier
to detect in the linker.  ]

Getting rid of the extra transformation in s390_reorg would not
just remove a bunch of code from the back-end (always good!),
it would also speed up compile time a bit.

When I wasn't using reorg, I had problems with gcc deleting the label in .rodata, since it wasn't used by any jump instruction. I guess having a whole-block instruction that emits the label on its own should solve the issue, though - let's try that.


The second issue I'm still not sure about is the magic nop marker
for frameless functions.  In an earlier mail you wrote:

Both currently supported
architectures always emit split-stack code on every function.

At least for rs6000 this doesn't appear to be true; in
rs6000_expand_split_stack_prologue we have:

   if (!info->push_p)
     return;

so it does nothing for frameless routines.

Now on i386 we do indeed generate code for frameless routines;
in fact, the *same* full stack check is generated as for any
other routine.  Now I'm wondering: is there are reason why
this check would be necessary (and there's simply a bug in
the rs6000 implementation)?  Then we obviously should do the
same on s390.

Try that on powerpc64(le):

$ cat a.c
#include <stdio.h>

void f(void) {
}

typedef void (*fptr)(void);

fptr g(void);

int main() {
        printf("%p\n", g());
}

$ cat b.c
void f(void);

typedef void (*fptr)(void);

fptr g(void) {
        return f;
}

$ gcc -O3 -fsplit-stack -c b.c
$ gcc -O3 -c a.c
$ gcc a.o b.o -fuse-ld=gold

I don't have a recent enough gcc for powerpc, but from what I've seen in the code, this should explode with a linker error.

Of course, mixing split-stack and non-split-stack code when function pointers are involved is sketchy anyway, so what's one more bug...

That said, for s390, we can avoid the above problem by checking the relocation in gold now that ESA paths are gone - for direct function calls (the only ones we care about), we should be seeing a relocation in brasl. So I'll remove the nopmark thing and add proper recognition in gold.


On the other hand, if rs6000 works fine *without* any code
in frameless routines, why wouldn't that work for s390 too?

Emitting a nop (that is always executed) still looks weird to me.


Bye,
Ulrich


Reply via email to