On 14 May 2013 13:01, Claudio Fontana <claudio.font...@huawei.com> wrote: > On 13.05.2013 20:28, Peter Maydell wrote: >> On 13 May 2013 14:33, Claudio Fontana <claudio.font...@huawei.com> wrote: >>> >>> add preliminary support for TCG target aarch64. >> >> Thanks for this patch. Some comments below. >> >>> Signed-off-by: Claudio Fontana <claudio.font...@huawei.com> >>> --- >>> include/exec/exec-all.h | 5 +- >>> tcg/aarch64/tcg-target.c | 1084 >>> ++++++++++++++++++++++++++++++++++++++++++++++
Incidentally a 1000 line patch is pretty tedious to review (even if most of it is ok then you have to go back over the whole thing again when some small part changes), so it might be worth splitting it up a little if there's a reasonable split possible. >> This list seems to not have all the registers in it. >> You can put the registers used for AREG0 and the temp >> reg in here -- TCG will correctly not use them because >> (a) AREG0 is allocated as a fixed register and (b) >> the temp is put in the reserved-regs list in tcg_target_init. >> >> It should be OK to use X16 and X17 as well, right? > > I see, I can add AREG0 (X19) and temp (X8) to the list then. > > I got cold feet about using X16 and X17 when I > experienced their use by possibly libgthread and other > system libraries, and due to their definitions as IP0 > and IP1 ("can be used by call veneers and PLT code"). If they can be used as call veneers then it must be safe to use them inside a function (as callee saves registers) because our caller can't be expecting them to be preserved. > But if you are sure they are safe to use I can add > them to the set as temporary registers. > I skipped X18 because of its definition as the > "platform register". > If you think that's a groundless fear, I can add that > to the list as well. I guess for cross-OS portability we should avoid it (and so you should add it to the reserved_regs set in tcg_target_init()). I don't think it matters whether it appears in this array or not if it's reserved. tcg/arm puts SP in the reg-alloc-order array, for example, even though if we ever used it we'd die horribly. >>> +#ifdef CONFIG_SOFTMMU >>> + mem_index = args[2]; >>> + s_bits = opc & 3; >>> + >>> + /* Should generate something like the following: >>> + * shr x8, addr_reg, #TARGET_PAGE_BITS >>> + * and x0, x8, #(CPU_TLB_SIZE - 1) @ Assumption: CPU_TLB_BITS <= 8 >>> + * add x0, env, x0 lsl #CPU_TLB_ENTRY_BITS >>> + */ >> >> The comment says this, but you don't actually seem to have >> the code to do it? >> >> And there definitely needs to be a test somewhere in >> your generated code for "did the TLB hit or miss?" > > Yes, it's basically a TODO comment, adapted from the ARM TCG target. > Right now we always go through the C helpers, which is of course slower. In the ARM TCG target case, the comment is describing what the following code actually generates (and the bit which isn't implemented is marked as "not implemented yet"). In your case the comment is neither describing what you actually emit nor what you ought in an ideal world to emit. >>> + case INDEX_op_rotl_i64: ext = 1; >>> + case INDEX_op_rotl_i32: /* same as rotate right by (32 - m) */ >>> + if (const_args[2]) /* ROR / EXTR Wd, Wm, Wm, 32 - m */ >>> + tcg_out_rotl(s, ext, args[0], args[1], args[2]); >>> + else { /* no RSB in aarch64 unfortunately. */ >>> + /* XXX UNTESTED */ >>> + tcg_out_movi32(s, ext, TCG_REG_X8, ext ? 64 : 32); >>> + tcg_out_arith(s, ARITH_SUB, ext, TCG_REG_X8, TCG_REG_X8, >>> args[2]); >>> + tcg_out_shiftrot_reg(s, SRR_ROR, ext, args[0], args[1], >>> TCG_REG_X8); >> >> I think you should either test this, or remove it [rot >> support is optional so you could put it back in a later >> patch]. > > Yes, I agree. I could not find an image which triggered that > code path for register rotation amounts. Try PPC : rlwmn will generate a rotl (as will other insns). >> tcg_set_frame() should be called in the prologue generation >> function, not here. Also, please don't use temp_buf, it is >> going to go away shortly, as per this patch: >> http://lists.gnu.org/archive/html/qemu-devel/2013-04/msg03859.html >> > > I understand that you want to get rid of temp_buf; > that would mean if I understand correctly using the stack > for that end; I feel a bit uneasy about the mechanics though, > and the stability of the result: > could we add this to the TODO list for a successive change, > so that we can have a working version now and get a successive > version to a whole new round of testing? Sorry, no. At the moment no in-tree TCG target uses temp_buf, and it's very likely that a patch to remove it will land before your TCG target is added, at which point your code will no longer compile. (This is also part of a general principle where we don't let progressive code cleanups go 'backwards' by admitting new code which still uses the old deprecated mechanisms.) >>> +{ >>> +#if QEMU_GNUC_PREREQ(4, 1) >>> + __builtin___clear_cache((char *)start, (char *)stop); >>> +#else >>> + /* XXX should provide alternative with IC <ic_op>, Xt */ >>> +#error "need GNUC >= 4.1, alternative not implemented yet." >>> +#endif >> >> I think we can just assume a GCC new enough to support >> __builtin___clear_cache(). Nobody's going to be compiling >> aarch64 code with a gcc that old, because they didn't >> support the architecture at all. You can drop the #if/#else >> completely. > > Ok. Should we not support non-GCC compilers though? We do (clang, for instance) -- they just have to support the same set of builtins as gcc. > It should not be terrible to emit the IC instruction, I really don't like handwritten assembly if it can possibly be avoided -- it's a maintenance pain. This is exactly what the gcc builtins are for, and we use them fairly extensively. The 32 bit ARM code only has the hand-written version because it predates gcc versions which support this particular builtin. thanks -- PMM