Re: [Tinycc-devel] [PATCH] When handling '.' operator, cope with VT_LLOCAL
Hi, On Fri, 20 Feb 2015, Edmund Grimley Evans wrote: Does this look all right? Do you have a testcase? I'll agree that there's possibly something fishy, but the change doesn't look 100% right. ... looksy looksy ... Hmm, maybe I trace your path how you found this :) When trying to use VT_LLOCAL to replace VT_REF, when the member in question is not an array, right? ;) Because then without your patch this example generates wrong code under win64: struct S {int a; int x[7];}; int f (int a, int b, int c, int d, struct S s) { return s.a; } So, something is wrong indeed. The problem I have with your patch: @@ -4024,6 +4024,8 @@ ST_FUNC void unary(void) vtop-type.t |= qualifiers; /* an array is never an lvalue */ if (!(vtop-type.t VT_ARRAY)) { +if ((vtop-r (VT_VALMASK | VT_LVAL)) == (VT_LOCAL | VT_LVAL)) +vtop-r += VT_LLOCAL - VT_LOCAL; vtop-r |= lvalue_type(vtop-type.t); #ifdef CONFIG_TCC_BCHECK /* if bound checking, the referenced pointer must be checked */ is: this patches only one place where lvalue_type is called (i.e. where something on the stack is lvalueized again). Why are the others trivially correct and don't need such handling? Also, conceptually, I might have a problem with this: it fixes up things after they went wrong. For instance, why isn't the problem in gen_op? I mean, one invariant of gen_op (when called with non-optimizable args) is that it leaves a non-lvalue on the stack, so the code in unary() is correct as is, for those cases. So, wouldn't it be better if the lvalue-to-rvalue conversion would happen also with gen_op(+,0)? Ciao, Michael. ___ Tinycc-devel mailing list Tinycc-devel@nongnu.org https://lists.nongnu.org/mailman/listinfo/tinycc-devel
Re: [Tinycc-devel] [PATCH] deprecating VT_REF
Hi, On Sat, 21 Feb 2015, Edmund Grimley Evans wrote: Michael Matz matz@frakked.de: I don't like this. tccgen.c should become _less_ dependend on the TARGET defines, not more. Hence either VT_REF has a purpose and it might make sense to use it in more backends, or it hasn't and should be removed also from x86_64. I agree, but inserting those #ifdefs is merely making the architecture-specificity explicit, thereby perhaps encouraging someone to do something about it. Without the #ifdefs, but with VT_REF only used by the x86_64 back end, you have hidden target-specificity, which is more dangerous. I disagree about dangerous; it's only a problem for those targets, and presumably for them the generic handling of VT_REF is correct. If you mean dangerous as in others might be tempted to use it in their target, even though the generic handling isn't correct for them, then, well, life is tough; it's not enough reason IMHO to uglify common code. Neither is trying to make others interested in removing the thing. But apart from that philosophical argument, I claim that the concept of VT_REF is not target specific at all. It does have things in common with VT_LLOCAL, but is not _quite_ the same currently, as we now found out in the other mails. This not-quite-equality might indeed be bugs in how VT_LLOCAL is handled, e.g. VT_LLOCAL might be understood as an optimization to VT_REF (even though the latter was introduced later), the optimization being that lvalueness/referenceness can be changed by a simple bitflip, instead of generating code. So, yes, ideally VT_LLOCAL and VT_REF should be merged. Though the question still is into which one. VT_LLOCAL is tricky, as the docu said :) And it currently contains bugs, when used for some things. _If_ the issues with VT_LLOCAL can be fixed, then I'd be in favor of removing VT_REF. After such fixing goes in. AFAIU VT_REF is added when an argument is passed in a caller allocated buffer whose address itself is passed in a register (i.e. by-reference-passing). Do you mean on the stack? Usually, but not necessarily. For a function like int f (int a; struct large b) { ... } there are two ways to get to 'b': 1) it's saved on the stack by caller, the address will be right below the slot for 'a', or the first slot, if 'a' gets a register. So addressof(b) will be something like sp+offset. 2) it's stored in some buffer (not necessarily stack) and the address of it is explicitely passed in the argument that would take the place of 'b' were it of pointer type. Now addressof(b) is not necessarily some offset from sp, but rather must be loaded (either from a register, or the stack slot that addressof value got) Yes, VT_LLOCAL can nearly express the concept 2. VT_REF as well (just that it works right now). On arm64 the caller sometimes allocates a buffer then passes a pointer to that buffer on the stack, and I'm handling that with VT_LLOCAL, Okay, that's the same as win64 then, which means you could have also used VT_REF. (I'm not saying you should have, just stating the fact.) which is already widely used. (I checked that a lot of VT_LLOCALs are generated when you run the test suite on i386, for example.) Yes, VT_LLOCAL is used often. But it's used for only one thing currently (i.e. the number of concepts expressed by it is not very large): when an lvalue contained in a register is spilled, it becomes a stack slot with VT_LLOCAL. That makes sense, because lvalue in a register means the register contains the address of that lvalue (and then so does the stack slot). But for instance it's not totally clear what VT_LLOCAL means on something that never was a register (e.g. because it is not of right type). Also, is it ever okay to have VT_LLOCAL without VT_LVAL? (There should be nothing wrong about this, but does the code always handle this?) I think perhaps it's time to try to figure out what the three flags (VT_LLOCAL, VT_LVAL and VT_LOCAL) really mean, which combinations make sense (under which circumstances) and which operations do what to the flags. If we had clear rules we probably can make VT_LLOCAL do exactly what's necessary in generic code, to make it safely and generally usable. Right now the whole VT_LLOCAL handling seems quite ad-hoc (and that results exactly in the omissions you saw with the '.' operator). I'd definitely prefer it if someone could remove VT_REF altogether. Alternatively, someone could patch the documentation to explain exactly what VT_REF means, why it's needed, and how it differs from VT_LLOCAL. If neither of those patches is forthcoming I'd prefer something like my patch to be committed to just leaving it as it is. Well, we're still discussing, so not all is lost yet :) I'd actually hope to have such description for VT_LLOCAL in the end, and remove VT_REF after the current threads are finished. Ciao, Michael.
Re: [Tinycc-devel] [PATCH] When handling '.' operator, cope with VT_LLOCAL
Hi, On Sun, 22 Feb 2015, Edmund Grimley Evans wrote: The only disadvantage that I can think of is that certain expressions which would cause GCC to warn that value computed is not used would not get optimised. I think the conversion to rvalue might not actually have to emit code, in which case unused stuff would be as optimized as before (as far as tcc is optimizing :)). I can't see how to do that. Hmm, perhaps it's not possible. I thought via manipulating the VT_foo flags in vtop-r. Since I probably still have a test case to hand, should I propose an alternative patch that fixes my problem by modifying gen_op? Yeah, I think so. So here's an alternative patch that fixes the problem for me: diff --git a/tccgen.c b/tccgen.c index ae07563..9dfeb2b 100644 --- a/tccgen.c +++ b/tccgen.c @@ -1827,6 +1827,8 @@ ST_FUNC void gen_op(int op) vtop-type.t = t; } } +if (vtop-r VT_LVAL) +gv(is_float(vtop-type.t VT_BTYPE) ? RC_FLOAT : RC_INT); } #ifndef TCC_TARGET_ARM It probably doesn't change the generated code significantly, though I should perhaps warn that apart from the cases I alluded to earlier, where a value computed is not used, there may also be a few cases where converting the value to an rvalue earlier than it is needed hurts the register allocation, as in something like f(a + 0, b + 0, c + 0, d + 0, e + 0, f + 0). I wouldn't worry about it, though. I've just noticed that the patch above doesn't have the comment I wrote, something like: +// Make sure that we have converted to an rvalue: Do you like this new patch? I need to fix this problem somehow for arm64. Now that I also have a working aarch64 tcc, do you have a testcase for the problem? The testsuite itself is clean. Ciao, Michael. ___ Tinycc-devel mailing list Tinycc-devel@nongnu.org https://lists.nongnu.org/mailman/listinfo/tinycc-devel
Re: [Tinycc-devel] [PATCH] deprecating VT_REF
Hi, On Fri, 20 Feb 2015, Edmund Grimley Evans wrote: VT_REF is not mentioned in the documentation and seems to be used only for x86_64. Also, it seems to be the same thing as VT_LLOCAL, really. This could be a first step towards removing it altogether. Commit message: Make it explicit that VT_REF is used only for TCC_TARGET_X86_64. tcc.h: Make the definition conditional on TCC_TARGET_X86_64. tccgen.c: Since the VT_REF bit is set only in x86_64-gen.c we can make clearing it and testing for it conditional. I don't like this. tccgen.c should become _less_ dependend on the TARGET defines, not more. Hence either VT_REF has a purpose and it might make sense to use it in more backends, or it hasn't and should be removed also from x86_64. And _if_ it has a purpose only on x86-64 (which indeed seems unlikely) then the common code should still handle it unconditionally. FWIW it's currently only used when TCC_TARGET_PE, i.e. win64. It was introduced by grischka in 2010 for some win64 va_arg handling of structure types (8d107d9ff) and fixed a bit later with f115c123. AFAIU VT_REF is added when an argument is passed in a caller allocated buffer whose address itself is passed in a register (i.e. by-reference-passing). So the register contains not the argument itself, but the address of it. That's not a too unusual ABI (x86-64 linux, though, uses passing on stack, not by reference), as it can sometimes avoid a memcpy when calling such functions. And yes, that seems to be VT_LLOCAL. grischka, can you try the below patch on win64? If it works we can remove VT_REF altogether. I verified it generates the same code with a cross-to-win64 tcc on functions where it should be active, like: struct S {int x[7];}; int f ( #ifndef IN_REGS int a, int b, int c, int d, #endif struct S s) { return s.x[3]; } (with -DIN_REGS the address is passed in rcx, without it's passed on stack). Ciao, Michael. -- diff --git a/x86_64-gen.c b/x86_64-gen.c index c8fed85..463cf8d 100644 --- a/x86_64-gen.c +++ b/x86_64-gen.c @@ -855,7 +855,7 @@ void gfunc_prolog(CType *func_type) if (reg_param_index REGN) { gen_modrm64(0x89, arg_regs[reg_param_index], VT_LOCAL, NULL, addr); } -sym_push(sym-v ~SYM_FIELD, type, VT_LOCAL | VT_LVAL | VT_REF, addr); +sym_push(sym-v ~SYM_FIELD, type, VT_LVAL | VT_LLOCAL, addr); } else { if (reg_param_index REGN) { /* save arguments passed by register */ ___ Tinycc-devel mailing list Tinycc-devel@nongnu.org https://lists.nongnu.org/mailman/listinfo/tinycc-devel
Re: [Tinycc-devel] [PATCH] When handling '.' operator, cope with VT_LLOCAL
Michael Matz matz@frakked.de: Do you have a testcase? I made the change after something in abitest failed with the arm64 back end, but with a newer version than the one I sent to the list. is: this patches only one place where lvalue_type is called (i.e. where something on the stack is lvalueized again). The function lvalue_type isn't just used to lvalueise things, and there may be places where the stack is lvaluised without calling lvalue_type ... Why are the others trivially correct and don't need such handling? I expect there are many bits of code that are wrong but which happen never to receive the values that they can't handle. How many parts of tccgen.c can correctly handle a VT_REF, do you think? :-) Also, conceptually, I might have a problem with this: it fixes up things after they went wrong. For instance, why isn't the problem in gen_op? I mean, one invariant of gen_op (when called with non-optimizable args) is that it leaves a non-lvalue on the stack, so the code in unary() is correct as is, for those cases. So, wouldn't it be better if the lvalue-to-rvalue conversion would happen also with gen_op(+,0)? The advantage of making gen_op always convert to an rvalue is that other code would have to worry about fewer cases. The only disadvantage that I can think of is that certain expressions which would cause GCC to warn that value computed is not used would not get optimised. So, if you're suggesting that gen_op should be modified to always convert to an rvalue, then I think I agree. Shall we do that instead? Since I probably still have a test case to hand, should I propose an alternative patch that fixes my problem by modifying gen_op? Edmund ___ Tinycc-devel mailing list Tinycc-devel@nongnu.org https://lists.nongnu.org/mailman/listinfo/tinycc-devel
Re: [Tinycc-devel] [PATCH] deprecating VT_REF
Michael Matz matz@frakked.de: I don't like this. tccgen.c should become _less_ dependend on the TARGET defines, not more. Hence either VT_REF has a purpose and it might make sense to use it in more backends, or it hasn't and should be removed also from x86_64. I agree, but inserting those #ifdefs is merely making the architecture-specificity explicit, thereby perhaps encouraging someone to do something about it. Without the #ifdefs, but with VT_REF only used by the x86_64 back end, you have hidden target-specificity, which is more dangerous. AFAIU VT_REF is added when an argument is passed in a caller allocated buffer whose address itself is passed in a register (i.e. by-reference-passing). Do you mean on the stack? On arm64 the caller sometimes allocates a buffer then passes a pointer to that buffer on the stack, and I'm handling that with VT_LLOCAL, which is already widely used. (I checked that a lot of VT_LLOCALs are generated when you run the test suite on i386, for example.) I'd definitely prefer it if someone could remove VT_REF altogether. Alternatively, someone could patch the documentation to explain exactly what VT_REF means, why it's needed, and how it differs from VT_LLOCAL. If neither of those patches is forthcoming I'd prefer something like my patch to be committed to just leaving it as it is. Edmund ___ Tinycc-devel mailing list Tinycc-devel@nongnu.org https://lists.nongnu.org/mailman/listinfo/tinycc-devel
Re: [Tinycc-devel] [PATCH] TCC arm64 back end
Hi, On Thu, 19 Feb 2015, Edmund Grimley Evans wrote: It's not finished, but a lot of things seem to work, and there's a problem with the linker that perhaps someone could help me with. See README.arm64 for details. That's quite cool. Below is a patch on top of your's (slightly amended to resolve the conflicts with top-of-mob of your recent commits) which fixes -run. Obviously quite more stuff is missing, most of which will be necessary for generating real shared libs (or even psABI compliant exes). And the codegen should sometimes use different strategies. One thing I noticed is for instance that gen_addr uses non-GOT relocs for accessing the address of a symbol. That's generally not correct, at least for weak symbols one has to go via a GOT entry (or some in-.text slot with a ABS64 reloc). I've hacked around this in the patch, but eventually such things need cleanup. And I pulled my hair out again when tracing the different paths the linker can go through in different modes, and how the relocs and symbol values change over the course of compilation. One of those days ... :-) Ciao, Michael. - From c2274760caed058d53a94e161c34ea7ea49bfdca Mon Sep 17 00:00:00 2001 From: Michael Matz m...@suse.de Date: Sun, 22 Feb 2015 05:59:06 +0100 Subject: [PATCH] aarch64: Fix -run. This adds some more support for properly transfering some offsets over the different stages of a relocations life. Still not at all psABI compliant and DSOs can't yet be generated. But it runs the testsuite in qemu-arm64. --- tccelf.c | 62 +--- tccrun.c |2 +- tests/Makefile|2 +- tests/tests2/Makefile | 20 --- 4 files changed, 55 insertions(+), 31 deletions(-) diff --git a/tccelf.c b/tccelf.c index 4f89224..2b07f60 100644 --- a/tccelf.c +++ b/tccelf.c @@ -451,7 +451,7 @@ ST_FUNC void relocate_syms(TCCState *s1, int do_resolve) if (addr) { sym-st_value = (addr_t)addr; #ifdef DEBUG_RELOC - printf (relocate_sym: %s - 0x%x\n, name, sym-st_value); + printf (relocate_sym: %s - 0x%lx\n, name, sym-st_value); #endif goto found; } @@ -797,8 +797,21 @@ ST_FUNC void relocate_section(TCCState *s1, Section *s) break; case R_AARCH64_JUMP26: case R_AARCH64_CALL26: + /* This check must match the one in build_got_entries, testing + if we really need a PLT slot. */ + if (sym-st_shndx == SHN_UNDEF) + /* We've put the PLT slot offset into r_addend when generating + it, and that's what we must use as relocation value (adjusted + by section offset of course). */ + val = s1-plt-sh_addr + rel-r_addend; +#ifdef DEBUG_RELOC + printf (reloc %d @ 0x%lx: val=0x%lx name=%s\n, type, addr, val, + (char *) symtab_section-link-data + sym-st_name); +#endif if (((val - addr) + ((uint64_t)1 27)) ~(uint64_t)0xffc) -tcc_error(R_AARCH64_(JUMP|CALL)26 relocation failed); + { +tcc_error(R_AARCH64_(JUMP|CALL)26 relocation failed (val=%lx, addr=%lx), addr, val); + } *(uint32_t *)ptr = 0x1400 | (type == R_AARCH64_CALL26) 31 | ((val - addr) 2 0x3ff); break; @@ -818,7 +831,17 @@ ST_FUNC void relocate_section(TCCState *s1, Section *s) 0xff8) 7; break; case R_AARCH64_COPY: - break; +break; +case R_AARCH64_GLOB_DAT: +case R_AARCH64_JUMP_SLOT: +/* They don't need addend */ +#ifdef DEBUG_RELOC + printf (reloc %d @ 0x%lx: val=0x%lx name=%s\n, type, addr, + val - rel-r_addend, + (char *) symtab_section-link-data + sym-st_name); +#endif +*(addr_t *)ptr = val - rel-r_addend; +break; default: fprintf(stderr, FIXME: handle reloc type %x at %x [%p] to %x\n, type, (unsigned)addr, ptr, (unsigned)val); @@ -1211,6 +1234,7 @@ static unsigned long put_got_entry(TCCState *s1, plt = s1-plt; if (plt-data_offset == 0) section_ptr_add(plt, 32); +symattr-plt_offset = plt-data_offset; p = section_ptr_add(plt, 16); put32(p, s1-got-data_offset); put32(p + 4, (uint64_t)s1-got-data_offset 32); @@ -1372,6 +1396,23 @@ ST_FUNC void build_got_entries(TCCState *s1) put_got_entry(s1, reloc_type, sym-st_size, sym-st_info, sym_index); break; + + case R_AARCH64_JUMP26: + case R_AARCH64_CALL26: +if (!s1-got) +build_got(s1); +sym_index =
Re: [Tinycc-devel] [PATCH] use RELA relocations properly for R_DATA_PTR
Hello Edmund, On Tue, 17 Feb 2015, Edmund Grimley Evans wrote: With TCC the addend is zero! Apparently TCC is putting the offset is in the data section: readelf -x .data t1.gcc.o 0x readelf -x .data t1.tcc.o 0x 1000 Now this isn't how RELA relocations are normally described as working, though I'm not sure that it's wrong. It's strictly wrong, but harmless, because, as you noticed, all linkers OR in the value at the reloc place. It's one of those things that always wants fixing but never really reach the top of TODO lists :) As it reached yours, all the better. I haven't found a document that says you have to ignore what value is in the place with a RELA relocation, Actually, the x86-64 psABI is explicit. The formulas for calculating reloc values only contain one 'A' (addend) and specify that the addend is in r_addend of the reloc, so the addend at the place must be ignored. But alas, real world is more forgiving :) Proposed commit message: Use RELA relocations properly for R_DATA_PTR on x86_64. libtcc.c: Add greloca, a generalisation of greloc that takes an addend. tcc.h: Add greloca and put_elf_reloca. tccelf.c: Add put_elf_reloca, a generalisation of put_elf_reloc. tccgen.c: On x86_64, use greloca instead of greloc in init_putv. The patch is fine, please commit. I have one remark: -if (vtop-r VT_SYM) { +#ifdef TCC_TARGET_X86_64 +if (vtop-r VT_SYM) +greloca(sec, vtop-sym, c, R_DATA_PTR, vtop-c.ptr_offset); +else +*(addr_t *)ptr |= (vtop-c.ptr_offset bit_mask) bit_pos; +#else +if (vtop-r VT_SYM) greloc(sec, vtop-sym, c, R_DATA_PTR); -} *(addr_t *)ptr |= (vtop-c.ptr_offset bit_mask) bit_pos; +#endif Strictly speaking the addend should be (vtop-c.ptr_offset bit_mask) bit_pos; This should make no difference for VT_SYM vtops, but perhaps an assert would be better (bit_pos == 0). OTOH this routine has other checking problems (e.g. it doesn't give an error if the value doesn't fit e.g. a VT_BYTE place), so adding just one assert for this case seems inconsistent. Oh, and one other thing: if you could include (smallish) patches inline at the end of mail, it makes quoting parts of them for review easier; of course only if your mailer doesn't mangle patches added inline. Ciao, Michael. ___ Tinycc-devel mailing list Tinycc-devel@nongnu.org https://lists.nongnu.org/mailman/listinfo/tinycc-devel
Re: [Tinycc-devel] Some tcc questions
Hi, On Thu, 19 Feb 2015, Naper Hamza wrote: Hello , I wanna know if tcc have a global config , where we can change some stuff , if not why not implementing one ? That's not how compilers traditionally work, it would be a layering violation. What options are active for a compilation is usually determined by the build system using the compiler (and given to the compiler as command line options or environment variables). Having a config file would be very surprising to most compiler users. What stuff to change globally do you have in mind? Also I wanna know if a official tcc organization is available on GitHub , No. which means also if a official tcc repo is available too . One doesn't imply the other. The official tcc repo is at {http,git}://repo.or.cz/tinycc.git Ciao, Michael.___ Tinycc-devel mailing list Tinycc-devel@nongnu.org https://lists.nongnu.org/mailman/listinfo/tinycc-devel
Re: [Tinycc-devel] Fallout from commit 01c041923474750a236da02561f0f8835
Hi, On Fri, 20 Feb 2015, Edmund Grimley Evans wrote: extern void f(void); g() { void (*ptr)(void) = f; ptr(); } Here there will be a GOT slot allocated for 'f'. The initialization of 'ptr' will load from that GOT slot. So even though direct calls to 'f' can (and will be, when 'f' is defined) fully resolved without a PLT slot, If f is in a library it may turn out not to be in range for a direct call. Right. But defined in tcc sense means defined in the .o/.c files taking part in the compilation, not in some shared library. I.e. symbols that don't have SHN_UNDEF as st_shndx. They are always reachable (as long as the programs text segment doesn't become too large, i.e. the usual small model when sizeof(.text) must be something 1(some-bitsize)). Probably I'd have to use that exact debian-based setup under qemu (my chroot is based on some openSUSE version), but I don't know a simple recipe for how. Any help appreciated. You could try taking debootstrap from the Debian archive, unpacking it manually, and running it as root thus: .../debootstrap --arch armhf unstable .../chroots/arm64-unstable http://ftp.debian.org/debian I've done that before for creating a Debian chroot on a non-Debian Linux machine. It's also what I use for creating a chroot for a different architecture, such as arm64 on i386 (with QEMU), or armhf on arm64 (without QEMU). Yeah, something wasn't quite right with my qemu when I tried this last time in September. Now it works. Ciao, Michael. ___ Tinycc-devel mailing list Tinycc-devel@nongnu.org https://lists.nongnu.org/mailman/listinfo/tinycc-devel