Re: [Qemu-devel] hw/nand hw/onenand and read-only
On 18.10.11 16:38 , ext Markus Armbruster arm...@redhat.com wrote: $ qemu-system-arm -drive if=none,file=tmp.qcow2,readonly,id=foo -device nand,drive=foo -kernel /dev/null qemu: hardware error: nand_device_init: Unsupported NAND block size. [...] Note that I didn't bother supplying a drive with sensible size. If I did, I guess I'd get nand coupled to a read-only drive. Easy enough for you to try :) Indeed, thanks ;) While I'm at it, I'll also fix the NAND init function to return -1 instead of aborting. I'll send patches for hw/nand and hw/onenand shortly. I'll simply make them reject read-only drives however both device models already support running without a drive as well by using a memory buffer instead so it would also be possible to make them use a read-only drive in a way that initial NAND/OneNAND contents would be read from the drive but any changes would not be written back to the drive and would be lost when QEMU is killed. Cheers, Juha
Re: [Qemu-devel] hw/nand hw/onenand and read-only
On 18.10.11 11:23 , ext Markus Armbruster arm...@redhat.com wrote: These guys have been converted to qdev relatively recently. I wonder what happens when they use a drive defined with -drive if=none,readonly. If that's not a valid configuration, we better reject read-only drives, like ide_init_drive() does. I'm not an expert with QEMU command line options; how do you pass such a drive to a NAND device? With if=mtd I get the following which I guess is expected result: $ qemu-system-arm -M beagle -drive if=mtd,file=nand.img,readonly qemu: readonly flag not supported for drive with this interface Regards, Juha
Re: [Qemu-devel] [PATCH 0/3] usb-musb: make qdev-aware
How to you test musb? Unfortunately I don't have any test cases which actively use the musb, so I settle for testing an n810 image (and a beagle image in my omap3 tree) and confirming that the init part of things still works ok. (I'm not entirely happy with this but init is really all we're changing with these patches so we should be ok...) Riku/Juha -- do you have any musb test images/command lines? With an existing n810 image I guess you can test USB networking by adding -usb -net user,vlan=0 -net nic,model=usb,vlan=0 -usbdevice net -redir tcp:2022::22 to qemu command line parameters. In the guest, launch X terminal and command sudo gainroot followed by udhcpc. You should now be able to ssh to the guest from the host with ssh -p 2022 root@localhost. Would this be sufficient for your needs? Cheers, Juha
Re: [Qemu-devel] OSX build issues
On 17.03.11 13:09 , ext Tristan Gingold ging...@adacore.com wrote: On Mar 17, 2011, at 11:28 AM, François Revol wrote: From the content of the functions called it's either one of the added fds which cause problem on select() (but why ?), or likely some signal masks which interfere with some internal thread in the process. I sampled the process and took some screenshots without and with the #if 0 hack: http://revolf.free.fr/osx/shots/shot_qemu_init_main_loop_if0.png Things work but oddly what is supposed to be an internal dispatcher thread ends up executing qemu code. The main thread includes lots of calls from arbitrary addresses indicating it receives some signals. http://revolf.free.fr/osx/shots/shot_qemu_init_main_loop_if1.png Things stale, and there are 2 more threads that wait, and the main thread seems quite stuck in select(). Yes, I have to investigate that... From what I've found out so far is that the sigwait_compat() function in compatfd.c will block all signals with sigprocmask() call and then wait for a single signal in sigwaitinfo() (forgot what it is). Sounds a bit odd to me but perhaps there's a reason for that? I tried changing the sigprocmask() call so that it only blocks the same signals that it will later wait for and things start rolling again. Regards, Juha
[Qemu-devel] Re: OMAP3 bug: disassembler disagreed with translator
On Feb 18, 2011, at 14:08, ext Антон Кочков wrote: Here also modified omap3_boot.c file. I'm dont know why it is successfully load bootrom at 0x40014000, but cant do this for 0x14000 - it always fill by zeroes. I'm only need copy memory from already loaded rom image from 0x40014000 to 0x00014000 and nothing more. Our OMAP3 boot emulation is a very simplified model. For example, the boot ROM is never mapped to Q0 region, only Q1 region -- the boot ROM execution is started at 0x40014000 rather than 0x00014000. There is no memory mapped at Q0 unless you attach something to the GPMC before reset. Regards, Juha
Re: Về: [Qemu-devel] How to fix NEON of cor text-a9 on qemu
On Sep 22, 2010, at 19:56, ext Pham Van Thiet wrote: I try to find them on meego.gitorious.org/qemu-maemo but I didn't find them. Can you send the links to me? Just clone the git repository and build qemu. Since it is more a development branch than a staging branch against upstream all relevant changes are not ordered in clean, separate patches that you could apply over qemu upstream. Regards, Juha
Re: [Qemu-devel] How to fix NEON of cortext-a9 on qemu
On Sep 21, 2010, at 19:22, ext Pham Van Thiet wrote: example for it please send it for me. Or if you have a patch file for NEON instruction on qemu please send it for me? You could have a try with the MeeGo QEMU branch (meego.gitorious.org/qemu-maemo). We have a bunch of ARM patches on top of the upstream version, some of which are NEON related. Regards, Juha
Re: [Qemu-devel] Can not find the function body of omap3_lcd_panel_draw_fn_32
On Apr 23, 2010, at 15:17, ext steven_c...@htc.commailto:steven_c...@htc.com wrote: I am tracing the code and I can not find the body of omap3_lcd_panel_draw_fn_32. I did print out the address and search the map file, that address did not match any function. Can some one give me a hint?? That is because it is not a function but an array of function pointers. The array is defined in omap3_lcd_panel_template.h but you will not find it in mainline QEMU which I guess you are not using anyway since you are looking for this symbol. Regards, Juha
Re: [Qemu-devel] QEMU state of ARM NEON support.
On Mar 25, 2010, at 12:27, ext Dmitry Zhurikhin wrote: juha.riihim...@nokia.com wrote: - vqdmlsl yields different results for some specific input values, however I think QEMU is producing correct results. For example, 1431655765-(2*-32768*-32768)=-715827883 but your reference hardware results say it should be -715827882. From what I understand hardware gets such result because saturation occurs on the (2*-32768*-32768) step. The result is not 0x8000 as expected but because of overflow is set as the largest 32-bit signed integer 0x7FFF. It seems that QEMU just doesn't perform saturation (in this particular case at least). Yes I see, I overlooked the fact that -32768*-32768 changes the sign and so doubling the result of that overflows the signed 32-bit integer even if the result still fits in 32 bits. Thanks, I'll fix that as well. Regards, Juha
Re: [Qemu-devel] QEMU support for ARM security features / TrustZone
On Jan 22, 2010, at 14:01, ext leic...@informatik.uni-frankfurt.de wrote: I found that QEMU emulates ARM Cortex-8 which in hardware have Trustzone support. It is unclear to me if this is the case in the version that QEMU emulates. ARM TrustZone features are currently not emulated. Regards, Juha
Re: [Qemu-devel] [PATCH 09/11] Cocoa: Shutdown when window is closed
On Dec 13, 2009, at 04:55, ext Andreas Färber wrote: The application is not very useful once the guest window is closed. QEMU is not a document-based application; terminating it automatically saves the user another action and resembles SDL behavior. Signed-off-by: Andreas Färber andreas.faer...@web.de Cc: Mike Kronenberg mike.kronenb...@kronenberg.org Cc: Alexander Graf a...@csgraf.de --- cocoa.m |7 +++ 1 files changed, 7 insertions(+), 0 deletions(-) diff --git a/cocoa.m b/cocoa.m index 70c249b..09ed3cd 100644 --- a/cocoa.m +++ b/cocoa.m @@ -712,6 +712,7 @@ static int cocoa_keycode_to_qemu(int keycode) - (void)toggleFullScreen:(id)sender; - (void)showQEMUDoc:(id)sender; - (void)showQEMUTec:(id)sender; +- (void)windowWillClose:(NSNotification *)notification; @end @implementation QemuCocoaAppController @@ -737,6 +738,7 @@ static int cocoa_keycode_to_qemu(int keycode) fprintf(stderr, (cocoa) can't create window\n); exit(1); } +[normalWindow setDelegate:self]; [normalWindow setAcceptsMouseMovedEvents:YES]; [normalWindow setTitle:[NSString stringWithFormat:@QEMU]]; [normalWindow setContentView:cocoaView]; @@ -835,6 +837,11 @@ static int cocoa_keycode_to_qemu(int keycode) [[NSWorkspace sharedWorkspace] openFile:[NSString stringWithFormat:@%@/../doc/qemu/qemu-tech.html, [[NSBundle mainBundle] resourcePath]] withApplication:@Help Viewer]; } + +- (void)windowWillClose:(NSNotification *)notification +{ + [NSApp terminate:self]; +} @end Another way to achieve the same thing is to instead of the above changes just introduce a new method for the QemuCocoaAppController like: - (BOOL)applicationShouldTerminateAfterLastWindowClosed:(NSApplication *)theApplication { return YES; } Regards, Juha
Re: [Qemu-devel] [PATCH] target-arm: tcg temp variable usage cleanup
On Nov 1, 2009, at 02:08, ext Laurent Desnogues wrote: On Thu, Oct 29, 2009 at 3:01 PM, juha.riihim...@nokia.com wrote: From: Juha Riihimäki juha.riihim...@nokia.com TCG temporary variable handling in target-arm/translate.c is currently somewhat inconsistent; some functions allocate new temporaries that the calling function is expected to free and some other functions free temporaries that are passed in as parameters. This patch will remove all such instances in the code and make the lifespan of the temporaries more clearly visible as they are always allocated and freed within one function. The only exception to this are the global temporaries allocated in the beginning of the gen_intermediate_code_internal function. [...] I tested your patch by running translate + TCG code gen for all of the opcodes in the range e000-. For the NEON instructions I had to add correct undefined detection to let my program process the range (OTOH I didn't bother fixing the wrong decoding and/or codegen, I was just doing sanity check on your patch). Next step is to also do that for Thumb2. And then run some real programs. Thanks for your work so far. I fixed the things you pointed out, but I'll hold submitting a new version of the patch until you have had time to do more testing. I tested with the n810 system emulation, it was working fine. Regards, Juha
Re: [Qemu-devel] [PATCH 00/19 v2] Add virtio-net/tap support for partial csums and GSO
On Oct 22, 2009, at 19:43, ext Mark McLoughlin wrote: Hey, Over a year ago we added some code to qemu-kvm.git which takes advantage of the recent tun/tap IFF_VNET_HDR feature in order to allow virtio-net to send and receive packets with partial checksums and segmentation offloaded: http://article.gmane.org/gmane.comp.emulators.kvm.devel/20440 This allows us to pass larger packets and packets with partial checkums between the guest and the host, greatly increasing the achievable bandwidth. Unfortunately, that implementation was quite hacky as it made some assumptions that would break if e.g. you added another network client to a vlan where the feature had enabled. Now that we have the -netdev parameter, we can more safely pair the NIC and backend, allowing us to negatiate features like this. What follows is a somewhat cleaned up version of the code from qemu-kvm.git. Further cleanups are probably possible, but I think this much is mergeable. Some points of discussion: I think this patch set has broken the QEMU build on OS X. The reason is that there are some definitions in tap-linux.h which only gets included in net.c if __linux__ is defined but the other parts of net.c which utilize definitions from tap-linux.h get compiled in nevertheless. It compiles if all definitions (except for the extra includes) from tap-linux.h are included for OS X as well. I've attached below a sample of the compiler output. Regards, Juha --- [...] CCnet.o net.c: In function ‘tap_receive_iov’: net.c:1391: error: variable ‘hdr’ has initializer but incomplete type net.c:1391: warning: excess elements in struct initializer net.c:1391: warning: (near initialization for ‘hdr’) net.c:1391: error: storage size of ‘hdr’ isn’t known net.c:1391: warning: unused variable ‘hdr’ net.c: In function ‘tap_receive_raw’: net.c:1409: error: variable ‘hdr’ has initializer but incomplete type net.c:1409: warning: excess elements in struct initializer net.c:1409: warning: (near initialization for ‘hdr’) net.c:1409: error: storage size of ‘hdr’ isn’t known net.c:1409: warning: unused variable ‘hdr’ net.c: In function ‘tap_send’: net.c:1484: error: invalid application of ‘sizeof’ to incomplete type ‘struct virtio_net_hdr’ net.c:1485: error: invalid application of ‘sizeof’ to incomplete type ‘struct virtio_net_hdr’ net.c: In function ‘tap_set_sndbuf’: net.c:1511: error: ‘TUNSETSNDBUF’ undeclared (first use in this function) net.c:1511: error: (Each undeclared identifier is reported only once net.c:1511: error: for each function it appears in.) net.c: In function ‘tap_probe_vnet_hdr’: net.c:1552: error: ‘TUNGETIFF’ undeclared (first use in this function) net.c:1557: error: ‘IFF_VNET_HDR’ undeclared (first use in this function) net.c: In function ‘tap_set_offload’: net.c:1567: error: ‘TUN_F_CSUM’ undeclared (first use in this function) net.c:1569: error: ‘TUN_F_TSO4’ undeclared (first use in this function) net.c:1571: error: ‘TUN_F_TSO6’ undeclared (first use in this function) net.c:1573: error: ‘TUN_F_TSO_ECN’ undeclared (first use in this function) net.c:1575: error: ‘TUN_F_UFO’ undeclared (first use in this function) net.c:1578: error: ‘TUNSETOFFLOAD’ undeclared (first use in this function) net.c: In function ‘net_tap_fd_init’: net.c:1623: error: ‘TUN_F_CSUM’ undeclared (first use in this function) net.c:1623: error: ‘TUN_F_UFO’ undeclared (first use in this function) net.c:1624: error: ‘TUNSETOFFLOAD’ undeclared (first use in this function) make[1]: *** [net.o] Error 1 make: *** [build-all] Error 2
Re: [Qemu-devel] [PATCH v2 09/10] target-arm: optimize neon vld/vst ops
On Oct 26, 2009, at 23:05, ext Aurelien Jarno wrote: On Mon, Oct 26, 2009 at 10:11:07AM +0100, Laurent Desnogues wrote: On Mon, Oct 26, 2009 at 8:46 AM, juha.riihim...@nokia.com wrote: On Oct 25, 2009, at 16:01, ext Laurent Desnogues wrote: I don't really like the idea of having tcg_qemu_ld/st not factored in some place, as it makes memory access tracing extensions more intrusive. This brings us back to the problem having function freeing tmps. In that case, you could perhaps create a gen_st16_dont_free function as a temporary workaround? I could, but it is getting ugly =/ How about if I create another patch that moves the temporary variable (de)allocation outside the gen_ldx/ gen_stx functions and then refactor this patch on top of that? I'd personally like this, but I guess you'd better wait for some inputs from other QEMU dev's before doing it. Looks like the best option to me. Alrighty then, I did the patch against the latest git and it's rather large... but seems to have broken nothing at least in my testing. The patch will remove all implicit tcg temp variable allocation and deallocation in target-arm/translate.c and make it the responsibility of the calling function. At the same time I also removed the new_tmp and dead_tmp functions completely because I see no point in only tracking some of the 32bit temp variables instead of everything. Personally I think the patch makes reading and understanding (and why not also writing) the file much easier. I do also have a version that has a compile time option of substituting the tcg temp variable alloc/ dealloc function calls with inline functions that track the usage but this is not included with the patch. I'll send the patch shortly, should be nice bed-time reading I guess ;) Please comment when you have free time to read it through... Cheers, Juha
Re: [Qemu-devel] [PATCH v2 05/10] target-arm: optimize arm load/store multiple ops
On Oct 27, 2009, at 11:00, ext Aurelien Jarno wrote: juha.riihim...@nokia.com a écrit : On Oct 27, 2009, at 10:39, ext Aurelien Jarno wrote: On Sat, Oct 24, 2009 at 03:19:04PM +0300, juha.riihim...@nokia.com wrote: From: Juha Riihimäki juha.riihim...@nokia.com RM load/store multiple instructions can be slightly optimized by loading the register offset constant into a variable outside the register loop and using the preloaded variable inside the loop instead of reloading the offset value to a temporary variable on each loop iteration. This causes less TCG ops to be generated for a ARM load/ store multiple instruction if there are more than one register accessed, otherwise the number of generated TCG ops is the same. Signed-off-by: Juha Riihimäki juha.riihim...@nokia.com Acked-by: Laurent Desnogues laurent.desnog...@gmail.com This patch breaks, the boot of an arm kernel, as tmp2 is used elsewhere within this code path. True, I just noticed that as well. This is because the resource leak patch was refactored to utilize tmp2 inside the loop as well. I just sent a new revision of this patch that uses tmp3 for th constant value. OTOH, while it reduce the number of TCG ops, that should not impact the generated host asm code, as most (all ?) targets are able to add a small constant value to a register in one instruction. This is true, but I still think it provides a small speed gain as there are less TCG ops to be processed when generating host code...? It means less TCG ops, but one more temp variable, therefore if there is a gain, I don't think it is something even measurable. OTOH it makes the code a bit more complex to read. I am not really opposed to this patch (and the other patches of the same kind), but I will need some more arguments to convince me. Shouldn't the amount of temp variables stay the same? tcg_gen_addi_i32 will internally allocate a temporary variable to hold the integer parameter. The only difference is that the temporary stays alive during the loop instead of being allocated/deallocated during each iteration. But I agree, the performance gain is probably quite small. Regards, Juha
Re: [Qemu-devel] [PATCH v2 09/10] target-arm: optimize neon vld/vst ops
On Oct 25, 2009, at 16:01, ext Laurent Desnogues wrote: I don't really like the idea of having tcg_qemu_ld/st not factored in some place, as it makes memory access tracing extensions more intrusive. This brings us back to the problem having function freeing tmps. In that case, you could perhaps create a gen_st16_dont_free function as a temporary workaround? I could, but it is getting ugly =/ How about if I create another patch that moves the temporary variable (de)allocation outside the gen_ldx/ gen_stx functions and then refactor this patch on top of that? Regards, Juha
Re: [Qemu-devel] [PATCH v2][RESEND] target-arm: cleanup internal resource leaks
On Oct 26, 2009, at 12:46, ext Laurent Desnogues wrote: @@ -5511,8 +5539,9 @@ static int disas_neon_data_insn(CPUState * env, DisasContext *s, uint32_t insn) tcg_gen_movi_i32(tmp, 0); } tmp2 = neon_load_reg(rm, 0); -gen_helper_neon_tbl(tmp2, tmp2, tmp, tcg_const_i32 (rn), -tcg_const_i32(n)); +tmp4 = tcg_const_i32(rn); +tmp5 = tcg_const_i32(n); +gen_helper_neon_tbl(tmp2, tmp2, tmp, tmp4, tmp5); dead_tmp(tmp); if (insn (1 6)) { tmp = neon_load_reg(rd, 1); @@ -5521,8 +5550,9 @@ static int disas_neon_data_insn(CPUState * env, DisasContext *s, uint32_t insn) tcg_gen_movi_i32(tmp, 0); } tmp3 = neon_load_reg(rm, 1); -gen_helper_neon_tbl(tmp3, tmp3, tmp, tcg_const_i32 (rn), -tcg_const_i32(n)); +gen_helper_neon_tbl(tmp3, tmp3, tmp, tmp4, tmp5); +dead_tmp(tmp5); +dead_tmp(tmp4); I missed this mistake when I acked the patch: you're using dead_tmp instead of tcg_temp_free... One more reason to drop dead_tmp :-) Indeed. I'll fix it, didn't notice it either because I'm using other new_xxx/dead_xxx helpers myself to catch resource leaks. I should work around that to get more similar code and eliminate this kind of issues. The bug will cause false resource leak reports, the actual freeing of the temporary is done correctly. Since this patch seems to have already been applied, I'll just provide a patch that fixes this specific error in the mainline instead of reposting a fixed version of the original patch. Regards, Juha
Re: [Qemu-devel] [PATCH] Add support for multiple simultaneously used keyboard devices.
On Oct 24, 2009, at 18:09, ext Filip Navara wrote: The support for multiple keyboard devices is essential for emulating embedded boards where multiple input devices are present (eg. keypad and rotary encoder) which are implemented using separate QEMU devices. Nice to see someone else needs this as well. I just did a very similar implementation a short while ago. Regards, Juha
Re: [Qemu-devel] [PATCH 10/12] target-arm: replace tcg_gen_rori_i32 by tcg_gen_rotri_i32
On Oct 23, 2009, at 18:25, ext Aurelien Jarno wrote: On Wed, Oct 21, 2009 at 12:18:08PM +0200, juha.riihim...@nokia.com wrote: Use native rotation if possible instead of a simulated one. I have another patch in my local tree that handle more cases: Great, I'll drop it from my patch series then. Cheers, Juha
[Qemu-devel] [PATCH v2] target-arm: cleanup internal resource leaks
Revised patch for getting rid of tcg temporary variable leaks in target-arm/translate.c. This version also includes the leak patch for gen_set_cpsr macro, now converted as a static inline function, which I sent earlier as a separate patch on top of this patch. This patch (the attached version) should apply cleanly against the latest mainline git. Signed-off-by: Juha Riihimäki juha.riihim...@nokia.com --- diff --git a/target-arm/translate.c b/target-arm/translate.c index e56082b..813f661 100644 --- a/target-arm/translate.c +++ b/target-arm/translate.c @@ -184,7 +184,12 @@ static void store_reg(DisasContext *s, int reg, TCGv var) #define gen_uxtb16(var) gen_helper_uxtb16(var, var) -#define gen_set_cpsr(var, mask) gen_helper_cpsr_write(var, tcg_const_i32(mask)) +static inline void gen_set_cpsr(TCGv var, uint32_t mask) +{ +TCGv tmp_mask = tcg_const_i32(mask); +gen_helper_cpsr_write(var, tmp_mask); +tcg_temp_free_i32(tmp_mask); +} /* Set NZCV flags from the high 4 bits of var. */ #define gen_set_nzcv(var) gen_set_cpsr(var, CPSR_NZCV) @@ -287,6 +292,7 @@ static TCGv_i64 gen_mulu_i64_i32(TCGv a, TCGv b) tcg_gen_extu_i32_i64(tmp2, b); dead_tmp(b); tcg_gen_mul_i64(tmp1, tmp1, tmp2); +tcg_temp_free_i64(tmp2); return tmp1; } @@ -300,6 +306,7 @@ static TCGv_i64 gen_muls_i64_i32(TCGv a, TCGv b) tcg_gen_ext_i32_i64(tmp2, b); dead_tmp(b); tcg_gen_mul_i64(tmp1, tmp1, tmp2); +tcg_temp_free_i64(tmp2); return tmp1; } @@ -312,9 +319,11 @@ static void gen_mull(TCGv a, TCGv b) tcg_gen_extu_i32_i64(tmp1, a); tcg_gen_extu_i32_i64(tmp2, b); tcg_gen_mul_i64(tmp1, tmp1, tmp2); +tcg_temp_free_i64(tmp2); tcg_gen_trunc_i64_i32(a, tmp1); tcg_gen_shri_i64(tmp1, tmp1, 32); tcg_gen_trunc_i64_i32(b, tmp1); +tcg_temp_free_i64(tmp1); } /* Signed 32x32-64 multiply. */ @@ -326,9 +335,11 @@ static void gen_imull(TCGv a, TCGv b) tcg_gen_ext_i32_i64(tmp1, a); tcg_gen_ext_i32_i64(tmp2, b); tcg_gen_mul_i64(tmp1, tmp1, tmp2); +tcg_temp_free_i64(tmp2); tcg_gen_trunc_i64_i32(a, tmp1); tcg_gen_shri_i64(tmp1, tmp1, 32); tcg_gen_trunc_i64_i32(b, tmp1); +tcg_temp_free_i64(tmp1); } /* Swap low and high halfwords. */ @@ -542,11 +553,13 @@ static void gen_arm_parallel_addsub(int op1, int op2, TCGv a, TCGv b) tmp = tcg_temp_new_ptr(); tcg_gen_addi_ptr(tmp, cpu_env, offsetof(CPUState, GE)); PAS_OP(s) +tcg_temp_free_ptr(tmp); break; case 5: tmp = tcg_temp_new_ptr(); tcg_gen_addi_ptr(tmp, cpu_env, offsetof(CPUState, GE)); PAS_OP(u) +tcg_temp_free_ptr(tmp); break; #undef gen_pas_helper #define gen_pas_helper(name) glue(gen_helper_,name)(a, a, b) @@ -587,11 +600,13 @@ static void gen_thumb2_parallel_addsub(int op1, int op2, TCGv a, TCGv b) tmp = tcg_temp_new_ptr(); tcg_gen_addi_ptr(tmp, cpu_env, offsetof(CPUState, GE)); PAS_OP(s) +tcg_temp_free_ptr(tmp); break; case 4: tmp = tcg_temp_new_ptr(); tcg_gen_addi_ptr(tmp, cpu_env, offsetof(CPUState, GE)); PAS_OP(u) +tcg_temp_free_ptr(tmp); break; #undef gen_pas_helper #define gen_pas_helper(name) glue(gen_helper_,name)(a, a, b) @@ -995,10 +1010,12 @@ static inline void gen_vfp_tosiz(int dp) #define VFP_GEN_FIX(name) \ static inline void gen_vfp_##name(int dp, int shift) \ { \ +TCGv tmp_shift = tcg_const_i32(shift); \ if (dp) \ -gen_helper_vfp_##name##d(cpu_F0d, cpu_F0d, tcg_const_i32 (shift), cpu_env);\ +gen_helper_vfp_##name##d(cpu_F0d, cpu_F0d, tmp_shift, cpu_env);\ else \ -gen_helper_vfp_##name##s(cpu_F0s, cpu_F0s, tcg_const_i32 (shift), cpu_env);\ +gen_helper_vfp_##name##s(cpu_F0s, cpu_F0s, tmp_shift, cpu_env);\ +tcg_temp_free_i32(tmp_shift); \ } VFP_GEN_FIX(tosh) VFP_GEN_FIX(tosl) @@ -2399,7 +2416,7 @@ static int disas_dsp_insn(CPUState *env, DisasContext *s, uint32_t insn) instruction is not defined. */ static int disas_cp_insn(CPUState *env, DisasContext *s, uint32_t insn) { -TCGv tmp; +TCGv tmp, tmp2; uint32_t rd = (insn 12) 0xf; uint32_t cp = (insn 8) 0xf; if (IS_USER(s)) { @@ -2411,14 +2428,18 @@ static int disas_cp_insn(CPUState *env, DisasContext *s, uint32_t insn) return 1; gen_set_pc_im(s-pc); tmp = new_tmp(); -gen_helper_get_cp(tmp, cpu_env, tcg_const_i32(insn)); +tmp2 = tcg_const_i32(insn); +gen_helper_get_cp(tmp, cpu_env, tmp2); +tcg_temp_free(tmp2); store_reg(s, rd, tmp); } else { if (!env-cp[cp].cp_write) return 1; gen_set_pc_im(s-pc); tmp = load_reg(s, rd); -gen_helper_set_cp(cpu_env, tcg_const_i32(insn), tmp); +tmp2 = tcg_const_i32(insn); +
Re: [Qemu-devel] [PATCH 07/12] target-arm: fix neon vsri, vshl and vsli ops
On Oct 21, 2009, at 13:46, ext Laurent Desnogues wrote: @@ -4624,31 +4624,31 @@ static int disas_neon_data_insn(CPUState * env, DisasContext *s, uint32_t insn) switch (size) { case 0: if (op == 4) -imm = 0xff -shift; +imm2 = 0xff -shift; else -imm = (uint8_t)(0xff shift); -imm |= imm 8; -imm |= imm 16; +imm2 = (uint8_t)(0xff shift); +imm2 |= imm2 8; +imm2 |= imm2 16; break; case 1: if (op == 4) -imm = 0x -shift; +imm2 = 0x -shift; else -imm = (uint16_t)(0x shift); -imm |= imm 16; +imm2 = (uint16_t)(0x shift); +imm2 |= imm2 16; break; case 2: if (op == 4) -imm = 0xu -shift; +imm2 = 0xu -shift; else -imm = 0xu shift; +imm2 = 0xu shift; break; default: abort(); } tmp2 = neon_load_reg(rd, pass); -tcg_gen_andi_i32(tmp, tmp, imm); -tcg_gen_andi_i32(tmp2, tmp2, ~imm); +tcg_gen_andi_i32(tmp, tmp, imm2); +tcg_gen_andi_i32(tmp2, tmp2, ~imm2); tcg_gen_or_i32(tmp, tmp, tmp2); dead_tmp(tmp2); } I mostly agree with this, except there's a mistake already present in the original code: for a size of 2 the shift amount can be 32 (look at VSRI in the ARM Architecture Reference Manual). Note in this case, shift will be -32. True. However, I'm not sure if this causes incorrect behavior or not. I'm not a NEON expert but how I understand the VSRI instruction is that it will shift the element value before it is inserted in the destination vector, therefore with element size 2 and shift 32 the result would be always zero and I guess that would still apply if the shift was -32: does it matter in which direction you shift if you anyway shift all the bits out? Regards, Juha
Re: [Qemu-devel] [PATCH] target-arm: cleanup internal resource leaks
On Oct 22, 2009, at 08:40, Riihimaki Juha (Nokia-D/Helsinki) wrote: @@ -4676,12 +4694,16 @@ static int disas_neon_data_insn(CPUState * env, DisasContext *s, uint32_t insn) gen_neon_narrow_satu(size - 1, tmp, cpu_V0); } if (pass == 0) { +dead_tmp(tmp2); This looks wrong if size == 3 since we have TCGV_UNUSED(tmp2). You're right. However, looking at the surrounding code a bit closer I began to wonder if it works correctly at all since tmp2 is used as a shift value if size 2 but it is also used to store the result of the first pass. Therefore the result of the first pass is used as a shift value during second pass... perhaps not correct? Wouldn't it make more sense to store the result of the first pass in the lower half of the destination neon register directly during the first pass? I see no point in keeping it in a temporary variable and only store both halves of the destination neon register during the second pass. Especially since there is no memory access involved, everything is done in registers. What do you think? To make it more clear what I'm after, look at the patch below that changes the code into what I think is correct functionality. The patch applies on top of the v2 resource leak patch which I just sent a short while ago. Cheers, Juha --- diff --git a/target-arm/translate.c b/target-arm/translate.c index 813f661..7598246 100644 --- a/target-arm/translate.c +++ b/target-arm/translate.c @@ -4696,18 +4696,12 @@ static int disas_neon_data_insn(CPUState * env, DisasContext *s, uint32_t insn) else gen_neon_narrow_satu(size - 1, tmp, cpu_V0); } -if (pass == 0) { -if (size != 3) { -dead_tmp(tmp2); -} -tmp2 = tmp; -} else { -neon_store_reg(rd, 0, tmp2); -neon_store_reg(rd, 1, tmp); -} +neon_store_reg(rd, pass, tmp); } /* for pass */ if (size == 3) { tcg_temp_free_i64(tmp64); +} else { +dead_tmp(tmp2); } } else if (op == 10) { /* VSHLL */
Re: [Qemu-devel] [PATCH 07/12] target-arm: fix neon vsri, vshl and vsli ops
On Oct 22, 2009, at 10:18, ext Laurent Desnogues wrote: On Thu, Oct 22, 2009 at 8:49 AM, juha.riihim...@nokia.com wrote: On Oct 21, 2009, at 13:46, ext Laurent Desnogues wrote: @@ -4624,31 +4624,31 @@ static int disas_neon_data_insn(CPUState * env, DisasContext *s, uint32_t insn) switch (size) { case 0: if (op == 4) -imm = 0xff -shift; +imm2 = 0xff -shift; else -imm = (uint8_t)(0xff shift); -imm |= imm 8; -imm |= imm 16; +imm2 = (uint8_t)(0xff shift); +imm2 |= imm2 8; +imm2 |= imm2 16; break; case 1: if (op == 4) -imm = 0x -shift; +imm2 = 0x -shift; else -imm = (uint16_t)(0x shift); -imm |= imm 16; +imm2 = (uint16_t)(0x shift); +imm2 |= imm2 16; break; case 2: if (op == 4) -imm = 0xu -shift; +imm2 = 0xu -shift; else -imm = 0xu shift; +imm2 = 0xu shift; break; default: abort(); } tmp2 = neon_load_reg(rd, pass); -tcg_gen_andi_i32(tmp, tmp, imm); -tcg_gen_andi_i32(tmp2, tmp2, ~imm); +tcg_gen_andi_i32(tmp, tmp, imm2); +tcg_gen_andi_i32(tmp2, tmp2, ~imm2); tcg_gen_or_i32(tmp, tmp, tmp2); dead_tmp(tmp2); } I mostly agree with this, except there's a mistake already present in the original code: for a size of 2 the shift amount can be 32 (look at VSRI in the ARM Architecture Reference Manual). Note in this case, shift will be -32. True. However, I'm not sure if this causes incorrect behavior or not. I'm not a NEON expert but how I understand the VSRI instruction is that it will shift the element value before it is inserted in the destination vector, therefore with element size 2 and shift 32 the result would be always zero and I guess that would still apply if the shift was -32: does it matter in which direction you shift if you anyway shift all the bits out? The problem is not in the NEON semantics but rather the C one: the behaviour of a shift by an amount greater than or equal to the bit-width of the type is undefined; in this case imm2 is 32-bit and the shift is 32-bit. What you want is imm2 = 0 if shift is -32, as you correctly guessed. Ah, I see. Doesn't the same issue apply for the 8bit and 16bit element sizes as well? I suppose it will be sufficient to check for a negative shift value and in that case force the mask value to zero. Regards, Juha
Re: [Qemu-devel] [PATCH v2] target-arm: cleanup internal resource leaks
On Oct 22, 2009, at 11:04, ext Gerd Hoffmann wrote: Hi, -#define gen_set_cpsr(var, mask) gen_helper_cpsr_write(var, tcg_const_i32(mask)) Your mailer mangles the patches. Yes, I know, I can't help it, I'm sorry. That's why I include the patch as an attachment as well -- the attachment should stay in working condition I think. I put the patches inline as well to make it possible to look at the patches without the need to separately open the attachment; whether you need to do that or not depends on the mail client, I suppose a client could also show text attachment inline. Regards, Juha
Re: [Qemu-devel] [PATCH 12/12] [RESEND] target-arm: fix neon shift helper functions
On Oct 22, 2009, at 10:57, ext Laurent Desnogues wrote: On Wed, Oct 21, 2009 at 1:01 PM, juha.riihim...@nokia.com wrote: Current code is broken at least on gcc 4.2, the result of a comparison -1 = sizeof(type) * 8 results true and causes wrong code path to be taken. The fix utilizes abs() function where applicable and otherwise adds a test to ensure both arguments are positive before making the aforementioned comparison. Why don't you instead cast sizeof to int? No reason, I'll make the v2 patch series use cast instead. It seems to work just as well. Juha
Re: [Qemu-devel] [PATCH 03/12] target-arm: add support for neon vld1.64 instruction
On Oct 22, 2009, at 11:39, ext Laurent Desnogues wrote: On Wed, Oct 21, 2009 at 12:17 PM, juha.riihim...@nokia.com wrote: Add support for neon vld1.64 instruction. From: Riku Voipio riku.voi...@iki.fi Signed-off-by: Juha Riihimäki juha.riihim...@nokia.com --- diff --git a/target-arm/translate.c b/target-arm/translate.c index 3ea9d51..d027572 100644 --- a/target-arm/translate.c +++ b/target-arm/translate.c @@ -795,6 +795,12 @@ static inline TCGv gen_ld32(TCGv addr, int index) tcg_gen_qemu_ld32u(tmp, addr, index); return tmp; } +static inline TCGv_i64 gen_ld64(TCGv addr, int index) +{ +TCGv_i64 tmp = tcg_temp_new_i64(); +tcg_gen_qemu_ld64(tmp, addr, index); +return tmp; +} static inline void gen_st8(TCGv val, TCGv addr, int index) { tcg_gen_qemu_st8(val, addr, index); @@ -810,6 +816,11 @@ static inline void gen_st32(TCGv val, TCGv addr, int index) tcg_gen_qemu_st32(val, addr, index); dead_tmp(val); } +static inline void gen_st64(TCGv_i64 val, TCGv addr, int index) +{ +tcg_gen_qemu_st64(val, addr, index); +tcg_temp_free_i64(val); +} static inline void gen_set_pc_im(uint32_t val) { @@ -3690,6 +3701,7 @@ static int disas_neon_ls_insn(CPUState * env, DisasContext *s, uint32_t insn) TCGv addr; TCGv tmp; TCGv tmp2; +TCGv_i64 tmp64; if (!vfp_enabled(env)) return 1; @@ -3702,7 +3714,7 @@ static int disas_neon_ls_insn(CPUState * env, DisasContext *s, uint32_t insn) /* Load store all elements. */ op = (insn 8) 0xf; size = (insn 6) 3; -if (op 10 || size == 3) +if (op 10) This is wrong: a size of 3 is limited to vld1.64 and vst1.64 which you don't enforce here. Apart from that, the rest looks OK. Thanks, and you're right of course. I'll add a check that will return 1 if size equals 3 and interleave or spacing is not 1. Cheers, Juha
Re: [Qemu-devel] [PATCH v2] target-arm: cleanup internal resource leaks
On Oct 22, 2009, at 12:10, ext Gerd Hoffmann wrote: On 10/22/09 10:35, juha.riihim...@nokia.com wrote: Hi, -#define gen_set_cpsr(var, mask) gen_helper_cpsr_write(var, tcg_const_i32(mask)) Your mailer mangles the patches. Yes, I know, I can't help it, I'm sorry. Makes it very difficult to extract the patches in a automated way ... Well, I tested that the formatting stays correct if I send the mail body HTML format instead of plain text. I suppose you would not like to read/parse HTML mails in this list, however, or...? Regards, Juha
Re: [Qemu-devel] [PATCH v2] target-arm: cleanup internal resource leaks
On Oct 22, 2009, at 13:08, ext Gerd Hoffmann wrote: On 10/22/09 11:55, juha.riihim...@nokia.com wrote: Your mailer mangles the patches. Yes, I know, I can't help it, I'm sorry. Makes it very difficult to extract the patches in a automated way ... Well, I tested that the formatting stays correct if I send the mail body HTML format instead of plain text. I suppose you would not like to read/parse HTML mails in this list, however, or...? Something git am can handle would be best. git send-email would be perfect if you can get that work. 'git send-email --smtp-server $internalmailserver.nokia.com' could do the trick. That will also send all mails in one thread so it is easy to figure that these mails belong together. Okay, thanks for the suggestion, I'll try that and resend the v2 patch, let's see what happens. Cheers, Juha
[Qemu-devel] [PATCH 00/12] target-arm: miscellaneous fixes
Hi, This patch series includes a number of smaller fixes and improvements to the ARM translator. The series should be applied in sequence as the modifications are all but one related to the same file, target-arm/ translate.c. The whole series should apply cleanly against latest git AFTER applying the resource leak patch which I sent on this list couple of days ago. Cheers, Juha
[Qemu-devel] [PATCH 03/12] target-arm: add support for neon vld1.64 instruction
Add support for neon vld1.64 instruction. From: Riku Voipio riku.voi...@iki.fi Signed-off-by: Juha Riihimäki juha.riihim...@nokia.com --- diff --git a/target-arm/translate.c b/target-arm/translate.c index 3ea9d51..d027572 100644 --- a/target-arm/translate.c +++ b/target-arm/translate.c @@ -795,6 +795,12 @@ static inline TCGv gen_ld32(TCGv addr, int index) tcg_gen_qemu_ld32u(tmp, addr, index); return tmp; } +static inline TCGv_i64 gen_ld64(TCGv addr, int index) +{ +TCGv_i64 tmp = tcg_temp_new_i64(); +tcg_gen_qemu_ld64(tmp, addr, index); +return tmp; +} static inline void gen_st8(TCGv val, TCGv addr, int index) { tcg_gen_qemu_st8(val, addr, index); @@ -810,6 +816,11 @@ static inline void gen_st32(TCGv val, TCGv addr, int index) tcg_gen_qemu_st32(val, addr, index); dead_tmp(val); } +static inline void gen_st64(TCGv_i64 val, TCGv addr, int index) +{ +tcg_gen_qemu_st64(val, addr, index); +tcg_temp_free_i64(val); +} static inline void gen_set_pc_im(uint32_t val) { @@ -3690,6 +3701,7 @@ static int disas_neon_ls_insn(CPUState * env, DisasContext *s, uint32_t insn) TCGv addr; TCGv tmp; TCGv tmp2; +TCGv_i64 tmp64; if (!vfp_enabled(env)) return 1; @@ -3702,7 +3714,7 @@ static int disas_neon_ls_insn(CPUState * env, DisasContext *s, uint32_t insn) /* Load store all elements. */ op = (insn 8) 0xf; size = (insn 6) 3; -if (op 10 || size == 3) +if (op 10) return 1; nregs = neon_ls_element_type[op].nregs; interleave = neon_ls_element_type[op].interleave; @@ -3716,61 +3728,74 @@ static int disas_neon_ls_insn(CPUState * env, DisasContext *s, uint32_t insn) load_reg_var(s, addr, rn); tcg_gen_addi_i32(addr, addr, 1 size); } -for (pass = 0; pass 2; pass++) { -if (size == 2) { -if (load) { -tmp = gen_ld32(addr, IS_USER(s)); -neon_store_reg(rd, pass, tmp); -} else { -tmp = neon_load_reg(rd, pass); -gen_st32(tmp, addr, IS_USER(s)); -} -tcg_gen_addi_i32(addr, addr, stride); -} else if (size == 1) { -if (load) { -tmp = gen_ld16u(addr, IS_USER(s)); -tcg_gen_addi_i32(addr, addr, stride); -tmp2 = gen_ld16u(addr, IS_USER(s)); -tcg_gen_addi_i32(addr, addr, stride); -gen_bfi(tmp, tmp, tmp2, 16, 0x); -dead_tmp(tmp2); -neon_store_reg(rd, pass, tmp); -} else { -tmp = neon_load_reg(rd, pass); -tmp2 = new_tmp(); -tcg_gen_shri_i32(tmp2, tmp, 16); -gen_st16(tmp, addr, IS_USER(s)); -tcg_gen_addi_i32(addr, addr, stride); -gen_st16(tmp2, addr, IS_USER(s)); +if (size == 3) { +if (load) { +tmp64 = gen_ld64(addr, IS_USER(s)); +neon_store_reg64(tmp64, rd); +tcg_temp_free_i64(tmp64); +} else { +tmp64 = tcg_temp_new_i64(); +neon_load_reg64(tmp64, rd); +gen_st64(tmp64, addr, IS_USER(s)); +} +tcg_gen_addi_i32(addr, addr, stride); +} else { +for (pass = 0; pass 2; pass++) { +if (size == 2) { +if (load) { +tmp = gen_ld32(addr, IS_USER(s)); +neon_store_reg(rd, pass, tmp); +} else { +tmp = neon_load_reg(rd, pass); +gen_st32(tmp, addr, IS_USER(s)); +} tcg_gen_addi_i32(addr, addr, stride); -} -} else /* size == 0 */ { -if (load) { -TCGV_UNUSED(tmp2); -for (n = 0; n 4; n++) { -tmp = gen_ld8u(addr, IS_USER(s)); +} else if (size == 1) { +if (load) { +tmp = gen_ld16u(addr, IS_USER(s)); +tcg_gen_addi_i32(addr, addr, stride); +tmp2 = gen_ld16u(addr, IS_USER(s)); +tcg_gen_addi_i32(addr, addr, stride); +gen_bfi(tmp, tmp, tmp2, 16, 0x); +dead_tmp(tmp2); +neon_store_reg(rd, pass, tmp); +} else { +
[Qemu-devel] [PATCH 02/12] target-arm: optimize thumb 32bit multiply
Current implementation of thumb mul instruction is implemented as a 32x32-64 multiply which then uses only 32 least significant bits of the result. Replace that with a simple 32x32-32 multiply. Signed-off-by: Juha Riihimäki juha.riihim...@nokia.com --- diff --git a/target-arm/translate.c b/target-arm/translate.c index bda105e..3ea9d51 100644 --- a/target-arm/translate.c +++ b/target-arm/translate.c @@ -310,22 +310,6 @@ static TCGv_i64 gen_muls_i64_i32(TCGv a, TCGv b) return tmp1; } -/* Unsigned 32x32-64 multiply. */ -static void gen_mull(TCGv a, TCGv b) -{ -TCGv_i64 tmp1 = tcg_temp_new_i64(); -TCGv_i64 tmp2 = tcg_temp_new_i64(); - -tcg_gen_extu_i32_i64(tmp1, a); -tcg_gen_extu_i32_i64(tmp2, b); -tcg_gen_mul_i64(tmp1, tmp1, tmp2); -tcg_temp_free_i64(tmp2); -tcg_gen_trunc_i64_i32(a, tmp1); -tcg_gen_shri_i64(tmp1, tmp1, 32); -tcg_gen_trunc_i64_i32(b, tmp1); -tcg_temp_free_i64(tmp1); -} - /* Signed 32x32-64 multiply. */ static void gen_imull(TCGv a, TCGv b) { @@ -8358,7 +8342,7 @@ static void disas_thumb_insn(CPUState *env, DisasContext *s) gen_logic_CC(tmp); break; case 0xd: /* mul */ -gen_mull(tmp, tmp2); +tcg_gen_mul_i32(tmp, tmp, tmp2); if (!s-condexec_mask) gen_logic_CC(tmp); break; translate.c.genmull.diff Description: translate.c.genmull.diff
[Qemu-devel] [PATCH 07/12] target-arm: fix neon vsri, vshl and vsli ops
Shift immediate value is incorrectly overwritten by a temporary variable in the processing of NEON vsri, vshl and vsli instructions. Signed-off-by: Juha Riihimäki juha.riihim...@nokia.com --- diff --git a/target-arm/translate.c b/target-arm/translate.c index 59bf7bc..c92ecc6 100644 --- a/target-arm/translate.c +++ b/target-arm/translate.c @@ -4094,7 +4094,7 @@ static int disas_neon_data_insn(CPUState * env, DisasContext *s, uint32_t insn) int pairwise; int u; int n; -uint32_t imm; +uint32_t imm, imm2; TCGv tmp; TCGv tmp2; TCGv tmp3; @@ -4624,31 +4624,31 @@ static int disas_neon_data_insn(CPUState * env, DisasContext *s, uint32_t insn) switch (size) { case 0: if (op == 4) -imm = 0xff -shift; +imm2 = 0xff -shift; else -imm = (uint8_t)(0xff shift); -imm |= imm 8; -imm |= imm 16; +imm2 = (uint8_t)(0xff shift); +imm2 |= imm2 8; +imm2 |= imm2 16; break; case 1: if (op == 4) -imm = 0x -shift; +imm2 = 0x -shift; else -imm = (uint16_t)(0x shift); -imm |= imm 16; +imm2 = (uint16_t)(0x shift); +imm2 |= imm2 16; break; case 2: if (op == 4) -imm = 0xu -shift; +imm2 = 0xu -shift; else -imm = 0xu shift; +imm2 = 0xu shift; break; default: abort(); } tmp2 = neon_load_reg(rd, pass); -tcg_gen_andi_i32(tmp, tmp, imm); -tcg_gen_andi_i32(tmp2, tmp2, ~imm); +tcg_gen_andi_i32(tmp, tmp, imm2); +tcg_gen_andi_i32(tmp2, tmp2, ~imm2); tcg_gen_or_i32(tmp, tmp, tmp2); dead_tmp(tmp2); } translate.c.neonimm.diff Description: translate.c.neonimm.diff
[Qemu-devel] [PATCH 10/12] target-arm: replace tcg_gen_rori_i32 by tcg_gen_rotri_i32
Use native rotation if possible instead of a simulated one. From: Filip Navara filip.nav...@gmail.com Signed-off-by: Juha Riihimäki juha.riihim...@nokia.com --- diff --git a/target-arm/translate.c b/target-arm/translate.c index 99a9ffd..1734fae 100644 --- a/target-arm/translate.c +++ b/target-arm/translate.c @@ -418,21 +418,6 @@ static inline void tcg_gen_bic_i32(TCGv dest, TCGv t0, TCGv t1) /* FIXME: Implement this natively. */ #define tcg_gen_abs_i32(t0, t1) gen_helper_abs(t0, t1) -/* FIXME: Implement this natively. */ -static void tcg_gen_rori_i32(TCGv t0, TCGv t1, int i) -{ -TCGv tmp; - -if (i == 0) -return; - -tmp = new_tmp(); -tcg_gen_shri_i32(tmp, t1, i); -tcg_gen_shli_i32(t1, t1, 32 - i); -tcg_gen_or_i32(t0, t1, tmp); -dead_tmp(tmp); -} - static void shifter_out_im(TCGv var, int shift) { TCGv tmp = new_tmp(); @@ -484,7 +469,7 @@ static inline void gen_arm_shift_im(TCGv var, int shiftop, int shift, int flags) if (shift != 0) { if (flags) shifter_out_im(var, shift - 1); -tcg_gen_rori_i32(var, var, shift); break; +tcg_gen_rotri_i32(var, var, shift); break; } else { TCGv tmp = load_cpu_field(CF); if (flags) @@ -6634,7 +6619,7 @@ static void disas_arm_insn(CPUState * env, DisasContext *s) /* ??? In many cases it's not neccessary to do a rotate, a shift is sufficient. */ if (shift != 0) -tcg_gen_rori_i32(tmp, tmp, shift * 8); +tcg_gen_rotri_i32(tmp, tmp, shift * 8); op1 = (insn 20) 7; switch (op1) { case 0: gen_sxtb16(tmp); break; @@ -7451,7 +7436,7 @@ static int disas_thumb2_insn(CPUState *env, DisasContext *s, uint16_t insn_hw1) /* ??? In many cases it's not neccessary to do a rotate, a shift is sufficient. */ if (shift != 0) -tcg_gen_rori_i32(tmp, tmp, shift * 8); +tcg_gen_rotri_i32(tmp, tmp, shift * 8); op = (insn 20) 7; switch (op) { case 0: gen_sxth(tmp); break; translate.c.rori.diff Description: translate.c.rori.diff
[Qemu-devel] [PATCH 01/12] target-arm: fix resource leak in gen_set_cpsr macro
Current implementation of the gen_set_cpsr macro creates a new temporary tcg variable through the tcg_const_i32 call but never marks it dead. Signed-off-by: Juha Riihimäki juha.riihim...@nokia.com --- diff --git a/target-arm/translate.c b/target-arm/translate.c index bc51bcb..bda105e 100644 --- a/target-arm/translate.c +++ b/target-arm/translate.c @@ -184,7 +184,12 @@ static void store_reg(DisasContext *s, int reg, TCGv var) #define gen_uxtb16(var) gen_helper_uxtb16(var, var) -#define gen_set_cpsr(var, mask) gen_helper_cpsr_write(var, tcg_const_i32(mask)) +#define gen_set_cpsr(var, mask) \ +{ \ +TCGv tmp_mask = tcg_const_i32(mask); \ +gen_helper_cpsr_write(var, tmp_mask); \ +tcg_temp_free_i32(tmp_mask); \ +} /* Set NZCV flags from the high 4 bits of var. */ #define gen_set_nzcv(var) gen_set_cpsr(var, CPSR_NZCV) translate.c.gensetcpsr.diff Description: translate.c.gensetcpsr.diff
[Qemu-devel] [PATCH 11/12] target-arm: optimize neon vld/vst ops
Reduce the amount of tcg ops generated from NEON vld/vst instructions by simplifying the code generation. Signed-off-by: Juha Riihimäki juha.riihim...@nokia.com --- diff --git a/target-arm/translate.c b/target-arm/translate.c index 1734fae..fa03df8 100644 --- a/target-arm/translate.c +++ b/target-arm/translate.c @@ -3692,6 +3692,7 @@ static int disas_neon_ls_insn(CPUState * env, DisasContext *s, uint32_t insn) TCGv tmp; TCGv tmp2; TCGv_i64 tmp64; +TCGv stride_v; if (!vfp_enabled(env)) return 1; @@ -3710,6 +3711,7 @@ static int disas_neon_ls_insn(CPUState * env, DisasContext *s, uint32_t insn) interleave = neon_ls_element_type[op].interleave; load_reg_var(s, addr, rn); stride = (1 size) * interleave; +stride_v = tcg_const_i32(stride); for (reg = 0; reg nregs; reg++) { if (interleave 2 || (interleave == 2 nregs == 2)) { load_reg_var(s, addr, rn); @@ -3728,7 +3730,7 @@ static int disas_neon_ls_insn(CPUState * env, DisasContext *s, uint32_t insn) neon_load_reg64(tmp64, rd); gen_st64(tmp64, addr, IS_USER(s)); } -tcg_gen_addi_i32(addr, addr, stride); +tcg_gen_add_i32(addr, addr, stride_v); } else { for (pass = 0; pass 2; pass++) { if (size == 2) { @@ -3739,58 +3741,57 @@ static int disas_neon_ls_insn(CPUState * env, DisasContext *s, uint32_t insn) tmp = neon_load_reg(rd, pass); gen_st32(tmp, addr, IS_USER(s)); } -tcg_gen_addi_i32(addr, addr, stride); +tcg_gen_add_i32(addr, addr, stride_v); } else if (size == 1) { if (load) { tmp = gen_ld16u(addr, IS_USER(s)); tcg_gen_addi_i32(addr, addr, stride); tmp2 = gen_ld16u(addr, IS_USER(s)); -tcg_gen_addi_i32(addr, addr, stride); -gen_bfi(tmp, tmp, tmp2, 16, 0x); +tcg_gen_add_i32(addr, addr, stride_v); +tcg_gen_shli_i32(tmp2, tmp2, 16); +tcg_gen_or_i32(tmp, tmp, tmp2); dead_tmp(tmp2); neon_store_reg(rd, pass, tmp); } else { tmp = neon_load_reg(rd, pass); -tmp2 = new_tmp(); -tcg_gen_shri_i32(tmp2, tmp, 16); -gen_st16(tmp, addr, IS_USER(s)); -tcg_gen_addi_i32(addr, addr, stride); -gen_st16(tmp2, addr, IS_USER(s)); -tcg_gen_addi_i32(addr, addr, stride); +tcg_gen_qemu_st16(tmp, addr, IS_USER(s)); +tcg_gen_add_i32(addr, addr, stride_v); +tcg_gen_shri_i32(tmp, tmp, 16); +tcg_gen_qemu_st16(tmp, addr, IS_USER(s)); +tcg_gen_add_i32(addr, addr, stride_v); +dead_tmp(tmp); } } else /* size == 0 */ { if (load) { -TCGV_UNUSED(tmp2); -for (n = 0; n 4; n++) { -tmp = gen_ld8u(addr, IS_USER(s)); -tcg_gen_addi_i32(addr, addr, stride); -if (n == 0) { -tmp2 = tmp; -} else { -gen_bfi(tmp2, tmp2, tmp, n * 8, 0xff); -dead_tmp(tmp); -} +tmp = gen_ld8u(addr, IS_USER(s)); +tcg_gen_add_i32(addr, addr, stride_v); +for (n = 1; n 4; n++) { +tmp2 = gen_ld8u(addr, IS_USER(s)); +tcg_gen_add_i32(addr, addr, stride_v); +tcg_gen_shli_i32(tmp2, tmp2, n * 8); +tcg_gen_or_i32(tmp, tmp, tmp2); +dead_tmp(tmp2); } -neon_store_reg(rd, pass, tmp2); +neon_store_reg(rd, pass, tmp); } else { -tmp2 = neon_load_reg(rd, pass); -for (n = 0; n 4; n++) { -tmp = new_tmp(); -if (n == 0) { -
[Qemu-devel] [PATCH 04/12] target-arm: allow modifying vfp fpexc en bit only
All other bits except for the EN in the VFP FPEXC register are defined as subarchitecture specific and real functionality for any of the other bits has not been implemented in QEMU. However, current code allows modifying all bits in the VFP FPEXC register leading to problems when guest code is writing 1's to the subarchitecture specific bits and checking whether the bits stay up to verify the existence of functionality which in fact does not exist in QEMU. Signed-off-by: Juha Riihimäki juha.riihim...@nokia.com --- diff --git a/target-arm/translate.c b/target-arm/translate.c index d027572..07ee638 100644 --- a/target-arm/translate.c +++ b/target-arm/translate.c @@ -2788,6 +2788,9 @@ static int disas_vfp_insn(CPUState * env, DisasContext *s, uint32_t insn) case ARM_VFP_FPEXC: if (IS_USER(s)) return 1; +/* TODO: VFP subarchitecture support. + * For now, keep the EN bit only */ +tcg_gen_andi_i32(tmp, tmp, 1 30); store_cpu_field(tmp, vfp.xregs[rn]); gen_lookup_tb(s); break; translate.c.vfpfpexc.diff Description: translate.c.vfpfpexc.diff
[Qemu-devel] [PATCH 12/12] target-arm: fix neon shift helper functions
Current code is broken at least on gcc 4.2, the result of a comparison -1 = sizeof(type) * 8 results true and causes wrong code path to be taken. The fix utilizes abs() function where applicable and otherwise adds a test to ensure both arguments are positive before making the aforementioned comparison. Signed-off-by: Juha Riihimäki juha.riihim...@nokia.com --- diff --git a/target-arm/neon_helper.c b/target-arm/neon_helper.c index f32ecd6..0c95035 100644 --- a/target-arm/neon_helper.c +++ b/target-arm/neon_helper.c @@ -392,7 +392,7 @@ NEON_VOP(abd_u32, neon_u32, 1) #define NEON_FN(dest, src1, src2) do { \ int8_t tmp; \ tmp = (int8_t)src2; \ -if (tmp = sizeof(src1) * 8 || tmp = -sizeof(src1) * 8) { \ +if (abs(tmp) = sizeof(src1) * 8) { \ dest = 0; \ } else if (tmp 0) { \ dest = src1 -tmp; \ @@ -420,7 +420,7 @@ uint64_t HELPER(neon_shl_u64)(uint64_t val, uint64_t shiftop) #define NEON_FN(dest, src1, src2) do { \ int8_t tmp; \ tmp = (int8_t)src2; \ -if (tmp = sizeof(src1) * 8) { \ +if (tmp = 0 tmp = sizeof(src1) * 8) { \ dest = 0; \ } else if (tmp = -sizeof(src1) * 8) { \ dest = src1 (sizeof(src1) * 8 - 1); \ @@ -453,7 +453,7 @@ uint64_t HELPER(neon_shl_s64)(uint64_t valop, uint64_t shiftop) #define NEON_FN(dest, src1, src2) do { \ int8_t tmp; \ tmp = (int8_t)src2; \ -if (tmp = sizeof(src1) * 8) { \ +if (tmp = 0 tmp = sizeof(src1) * 8) { \ dest = 0; \ } else if (tmp -sizeof(src1) * 8) { \ dest = src1 (sizeof(src1) * 8 - 1); \ @@ -494,7 +494,7 @@ uint64_t HELPER(neon_rshl_s64)(uint64_t valop, uint64_t shiftop) #define NEON_FN(dest, src1, src2) do { \ int8_t tmp; \ tmp = (int8_t)src2; \ -if (tmp = sizeof(src1) * 8 || tmp -sizeof(src1) * 8) { \ +if (abs(tmp) = sizeof(src1) * 8) { \ dest = 0; \ } else if (tmp == -sizeof(src1) * 8) { \ dest = src1 (tmp - 1); \ @@ -528,7 +528,7 @@ uint64_t HELPER(neon_rshl_u64)(uint64_t val, uint64_t shiftop) #define NEON_FN(dest, src1, src2) do { \ int8_t tmp; \ tmp = (int8_t)src2; \ -if (tmp = sizeof(src1) * 8) { \ +if (tmp = 0 tmp = sizeof(src1) * 8) { \ if (src1) { \ SET_QC(); \ dest = ~0; \ @@ -579,7 +579,7 @@ uint64_t HELPER(neon_qshl_u64)(CPUState *env, uint64_t val, uint64_t shiftop) #define NEON_FN(dest, src1, src2) do { \ int8_t tmp; \ tmp = (int8_t)src2; \ -if (tmp = sizeof(src1) * 8) { \ +if (tmp = 0 tmp = sizeof(src1) * 8) { \ if (src1) \ SET_QC(); \ dest = src1 31; \ neon_helper.c.diff Description: neon_helper.c.diff
[Qemu-devel] [PATCH 09/12] target-arm: optimize thumb push/pop ops
Thumb push/pop instructions can be slightly optimized by loading the register offset constant into a variable outside the register loop and using the preloaded variable inside the loop instead of reloading the offset value to a temporary variable on each loop iteration. This causes less TCG ops to be generated for a Thumb push/pop instruction. Signed-off-by: Juha Riihimäki juha.riihim...@nokia.com --- diff --git a/target-arm/translate.c b/target-arm/translate.c index abb3105..99a9ffd 100644 --- a/target-arm/translate.c +++ b/target-arm/translate.c @@ -8589,6 +8589,7 @@ static void disas_thumb_insn(CPUState *env, DisasContext *s) if ((insn (1 11)) == 0) { tcg_gen_addi_i32(addr, addr, -offset); } +tmp2 = tcg_const_i32(4); for (i = 0; i 8; i++) { if (insn (1 i)) { if (insn (1 11)) { @@ -8601,7 +8602,7 @@ static void disas_thumb_insn(CPUState *env, DisasContext *s) gen_st32(tmp, addr, IS_USER(s)); } /* advance to the next address. */ -tcg_gen_addi_i32(addr, addr, 4); +tcg_gen_add_i32(addr, addr, tmp2); } } TCGV_UNUSED(tmp); @@ -8616,8 +8617,9 @@ static void disas_thumb_insn(CPUState *env, DisasContext *s) tmp = load_reg(s, 14); gen_st32(tmp, addr, IS_USER(s)); } -tcg_gen_addi_i32(addr, addr, 4); +tcg_gen_add_i32(addr, addr, tmp2); } +tcg_temp_free_i32(tmp2); if ((insn (1 11)) == 0) { tcg_gen_addi_i32(addr, addr, -offset); } translate.c.tpushpop.diff Description: translate.c.tpushpop.diff
[Qemu-devel] [PATCH 08/12] target-arm: optimize thumb2 load/store multiple ops
Thumb2 load/store multiple instructions can be slightly optimized by loading the register offset constant into a variable outside the register loop and using the preloaded variable inside the loop instead of reloading the offset value to a temporary variable on each loop iteration. This causes less TCG ops to be generated for a Thumb2 load/ store multiple instruction. Signed-off-by: Juha Riihimäki juha.riihim...@nokia.com --- diff --git a/target-arm/translate.c b/target-arm/translate.c index c92ecc6..abb3105 100644 --- a/target-arm/translate.c +++ b/target-arm/translate.c @@ -7370,6 +7370,7 @@ static int disas_thumb2_insn(CPUState *env, DisasContext *s, uint16_t insn_hw1) tcg_gen_addi_i32(addr, addr, -offset); } +tmp2 = tcg_const_i32(4); for (i = 0; i 16; i++) { if ((insn (1 i)) == 0) continue; @@ -7386,8 +7387,9 @@ static int disas_thumb2_insn(CPUState *env, DisasContext *s, uint16_t insn_hw1) tmp = load_reg(s, i); gen_st32(tmp, addr, IS_USER(s)); } -tcg_gen_addi_i32(addr, addr, 4); +tcg_gen_add_i32(addr, addr, tmp2); } +tcg_temp_free_i32(tmp2); if (insn (1 21)) { /* Base register writeback. */ if (insn (1 24)) { translate.c.t2ldmstm.diff Description: translate.c.t2ldmstm.diff
[Qemu-devel] [PATCH 05/12] target-arm: optimize vfp load/store multiple ops
VFP load/store multiple instructions can be slightly optimized by loading the register offset constant into a variable outside the register loop and using the preloaded variable inside the loop instead of reloading the offset value to a temporary variable on each loop iteration. This causes less TCG ops to be generated for a VFP load/ store multiple instruction. Signed-off-by: Juha Riihimäki juha.riihim...@nokia.com --- diff --git a/target-arm/translate.c b/target-arm/translate.c index 07ee638..e5a2881 100644 --- a/target-arm/translate.c +++ b/target-arm/translate.c @@ -3222,6 +3222,7 @@ static int disas_vfp_insn(CPUState * env, DisasContext *s, uint32_t insn) offset = 8; else offset = 4; +tmp = tcg_const_i32(offset); for (i = 0; i n; i++) { if (insn ARM_CP_RW_BIT) { /* load */ @@ -3232,8 +3233,9 @@ static int disas_vfp_insn(CPUState * env, DisasContext *s, uint32_t insn) gen_mov_F0_vreg(dp, rd + i); gen_vfp_st(s, dp, addr); } -tcg_gen_addi_i32(addr, addr, offset); +tcg_gen_add_i32(addr, addr, tmp); } +tcg_temp_free_i32(tmp); if (insn (1 21)) { /* writeback */ if (insn (1 24)) translate.c.vfpldmstm.diff Description: translate.c.vfpldmstm.diff
[Qemu-devel] [PATCH 12/12] [RESEND] target-arm: fix neon shift helper functions
Current code is broken at least on gcc 4.2, the result of a comparison -1 = sizeof(type) * 8 results true and causes wrong code path to be taken. The fix utilizes abs() function where applicable and otherwise adds a test to ensure both arguments are positive before making the aforementioned comparison. Signed-off-by: Juha Riihimäki juha.riihim...@nokia.com --- diff --git a/target-arm/neon_helper.c b/target-arm/neon_helper.c index f32ecd6..0c95035 100644 --- a/target-arm/neon_helper.c +++ b/target-arm/neon_helper.c @@ -392,7 +392,7 @@ NEON_VOP(abd_u32, neon_u32, 1) #define NEON_FN(dest, src1, src2) do { \ int8_t tmp; \ tmp = (int8_t)src2; \ -if (tmp = sizeof(src1) * 8 || tmp = -sizeof(src1) * 8) { \ +if (abs(tmp) = sizeof(src1) * 8) { \ dest = 0; \ } else if (tmp 0) { \ dest = src1 -tmp; \ @@ -420,7 +420,7 @@ uint64_t HELPER(neon_shl_u64)(uint64_t val, uint64_t shiftop) #define NEON_FN(dest, src1, src2) do { \ int8_t tmp; \ tmp = (int8_t)src2; \ -if (tmp = sizeof(src1) * 8) { \ +if (tmp = 0 tmp = sizeof(src1) * 8) { \ dest = 0; \ } else if (tmp = -sizeof(src1) * 8) { \ dest = src1 (sizeof(src1) * 8 - 1); \ @@ -453,7 +453,7 @@ uint64_t HELPER(neon_shl_s64)(uint64_t valop, uint64_t shiftop) #define NEON_FN(dest, src1, src2) do { \ int8_t tmp; \ tmp = (int8_t)src2; \ -if (tmp = sizeof(src1) * 8) { \ +if (tmp = 0 tmp = sizeof(src1) * 8) { \ dest = 0; \ } else if (tmp -sizeof(src1) * 8) { \ dest = src1 (sizeof(src1) * 8 - 1); \ @@ -494,7 +494,7 @@ uint64_t HELPER(neon_rshl_s64)(uint64_t valop, uint64_t shiftop) #define NEON_FN(dest, src1, src2) do { \ int8_t tmp; \ tmp = (int8_t)src2; \ -if (tmp = sizeof(src1) * 8 || tmp -sizeof(src1) * 8) { \ +if (abs(tmp) = sizeof(src1) * 8) { \ dest = 0; \ } else if (tmp == -sizeof(src1) * 8) { \ dest = src1 (tmp - 1); \ @@ -528,7 +528,7 @@ uint64_t HELPER(neon_rshl_u64)(uint64_t val, uint64_t shiftop) #define NEON_FN(dest, src1, src2) do { \ int8_t tmp; \ tmp = (int8_t)src2; \ -if (tmp = sizeof(src1) * 8) { \ +if (tmp = 0 tmp = sizeof(src1) * 8) { \ if (src1) { \ SET_QC(); \ dest = ~0; \ @@ -579,7 +579,7 @@ uint64_t HELPER(neon_qshl_u64)(CPUState *env, uint64_t val, uint64_t shiftop) #define NEON_FN(dest, src1, src2) do { \ int8_t tmp; \ tmp = (int8_t)src2; \ -if (tmp = sizeof(src1) * 8) { \ +if (tmp = 0 tmp = sizeof(src1) * 8) { \ if (src1) \ SET_QC(); \ dest = src1 31; \ neon_helper.c.diff Description: neon_helper.c.diff
Re: [Qemu-devel] [PATCH 00/18] target-arm cleanup
On Oct 19, 2009, at 16:23, ext Aurelien Jarno wrote: I think I have a couple of other fixes and patches on top of that as well, but I'd rather wait until you get this bunch committed and then format the patches against the new mainline so that they apply. Thanks I have seen your patch, I'll have a closer look later today. Ok, that was just the resource leak patch. I have some others on top of that, I'll post them as soon as I get them (re)formatted. Perhaps this would also be a good place and time to also ask whether you would be interested in integrating support for OMAP3 in the QEMU mainline? We have been developing and using it for several months now, it's based on the work done by Yajin ya...@vm-kernel.org to support the OMAP3 BeagleBoard hardware. We have added support for the Nokia N900 system emulation as well. In my personal opinion the OMAP3 emulation is in functionality on par with the existing OMAP2 emulation, if not even more complete. That would be very nice to have such an emulated board in mainstream QEMU. I would be happy to review your patches. Alright I'll look into generating a patch set for that as well but it might take a bit longer time to do it since it is a fairly large chunk of code with several modifications to the existing OMAP1/2 (and some others) code as well to accommodate for the OMAP3 features. The OMAP3 support also calls for some further changes in the ARM core emulation to make the guest Linux kernel happy. Regards, Juha
Re: [Qemu-devel] [PATCH 00/18] target-arm cleanup
Apart from the points you have raised about specific patches there were few more minor bugs in the series spotted by Juha Riihimäki juha.riihim...@nokia.com. A fix is available at http://repo.or.cz/w/qemu/navara.git?a=commit;h=2ce696baa6fc5d99522cf387b6a4913807fd43ed One of the fix was already in my branch (catched by Laurent Desnogues). I have added the other fixes in my branch. The last to hunks are good example why temp new/free should be done explicitly ;) I think I have a couple of other fixes and patches on top of that as well, but I'd rather wait until you get this bunch committed and then format the patches against the new mainline so that they apply. On the subject of the new_tmp/dead_tmp, can you elaborate how critical it is if there are resource leaks in the translator code, i.e. if some of the temporary variables don't get marked dead? I suppose the leakage would only affect the translation of the code block where it appears? I found more leaks by introducing a new_tmp64/dead_tmp64 and some other checkups to catch programming errors. Some of the generated tcg code is not very optimal, for example a single vld4.8 instruction can generate over 250 tcg ops. I did some optimizations and got it under 200 but do you think it could be an issue that a single instruction can expand to so many tcg ops? I mean besides the fact that it causes TBs for only one or two guest instructions to be generated. Perhaps this would also be a good place and time to also ask whether you would be interested in integrating support for OMAP3 in the QEMU mainline? We have been developing and using it for several months now, it's based on the work done by Yajin ya...@vm-kernel.org to support the OMAP3 BeagleBoard hardware. We have added support for the Nokia N900 system emulation as well. In my personal opinion the OMAP3 emulation is in functionality on par with the existing OMAP2 emulation, if not even more complete. Cheers, Juha
[Qemu-devel] [PATCH] target-arm: cleanup internal resource leaks
Current ARM translator code has several places where it leaves temporary TCG variables alive. This patch removes all such instances I have found so far. Sorry for the mangled inlined patch, the mailserver I use doesn't like patches so I've also included it as an attachment that hopefully status correctly formatted. Should apply cleanly against latest git. Signed-off-by: Juha Riihimäki juha.riihim...@nokia.com --- diff --git a/target-arm/translate.c b/target-arm/translate.c index e56082b..bc51bcb 100644 --- a/target-arm/translate.c +++ b/target-arm/translate.c @@ -287,6 +287,7 @@ static TCGv_i64 gen_mulu_i64_i32(TCGv a, TCGv b) tcg_gen_extu_i32_i64(tmp2, b); dead_tmp(b); tcg_gen_mul_i64(tmp1, tmp1, tmp2); +tcg_temp_free_i64(tmp2); return tmp1; } @@ -300,6 +301,7 @@ static TCGv_i64 gen_muls_i64_i32(TCGv a, TCGv b) tcg_gen_ext_i32_i64(tmp2, b); dead_tmp(b); tcg_gen_mul_i64(tmp1, tmp1, tmp2); +tcg_temp_free_i64(tmp2); return tmp1; } @@ -312,9 +314,11 @@ static void gen_mull(TCGv a, TCGv b) tcg_gen_extu_i32_i64(tmp1, a); tcg_gen_extu_i32_i64(tmp2, b); tcg_gen_mul_i64(tmp1, tmp1, tmp2); +tcg_temp_free_i64(tmp2); tcg_gen_trunc_i64_i32(a, tmp1); tcg_gen_shri_i64(tmp1, tmp1, 32); tcg_gen_trunc_i64_i32(b, tmp1); +tcg_temp_free_i64(tmp1); } /* Signed 32x32-64 multiply. */ @@ -326,9 +330,11 @@ static void gen_imull(TCGv a, TCGv b) tcg_gen_ext_i32_i64(tmp1, a); tcg_gen_ext_i32_i64(tmp2, b); tcg_gen_mul_i64(tmp1, tmp1, tmp2); +tcg_temp_free_i64(tmp2); tcg_gen_trunc_i64_i32(a, tmp1); tcg_gen_shri_i64(tmp1, tmp1, 32); tcg_gen_trunc_i64_i32(b, tmp1); +tcg_temp_free_i64(tmp1); } /* Swap low and high halfwords. */ @@ -542,11 +548,13 @@ static void gen_arm_parallel_addsub(int op1, int op2, TCGv a, TCGv b) tmp = tcg_temp_new_ptr(); tcg_gen_addi_ptr(tmp, cpu_env, offsetof(CPUState, GE)); PAS_OP(s) +tcg_temp_free_ptr(tmp); break; case 5: tmp = tcg_temp_new_ptr(); tcg_gen_addi_ptr(tmp, cpu_env, offsetof(CPUState, GE)); PAS_OP(u) +tcg_temp_free_ptr(tmp); break; #undef gen_pas_helper #define gen_pas_helper(name) glue(gen_helper_,name)(a, a, b) @@ -587,11 +595,13 @@ static void gen_thumb2_parallel_addsub(int op1, int op2, TCGv a, TCGv b) tmp = tcg_temp_new_ptr(); tcg_gen_addi_ptr(tmp, cpu_env, offsetof(CPUState, GE)); PAS_OP(s) +tcg_temp_free_ptr(tmp); break; case 4: tmp = tcg_temp_new_ptr(); tcg_gen_addi_ptr(tmp, cpu_env, offsetof(CPUState, GE)); PAS_OP(u) +tcg_temp_free_ptr(tmp); break; #undef gen_pas_helper #define gen_pas_helper(name) glue(gen_helper_,name)(a, a, b) @@ -995,10 +1005,12 @@ static inline void gen_vfp_tosiz(int dp) #define VFP_GEN_FIX(name) \ static inline void gen_vfp_##name(int dp, int shift) \ { \ +TCGv tmp_shift = tcg_const_i32(shift); \ if (dp) \ -gen_helper_vfp_##name##d(cpu_F0d, cpu_F0d, tcg_const_i32 (shift), cpu_env);\ +gen_helper_vfp_##name##d(cpu_F0d, cpu_F0d, tmp_shift, cpu_env);\ else \ -gen_helper_vfp_##name##s(cpu_F0s, cpu_F0s, tcg_const_i32 (shift), cpu_env);\ +gen_helper_vfp_##name##s(cpu_F0s, cpu_F0s, tmp_shift, cpu_env);\ +tcg_temp_free_i32(tmp_shift); \ } VFP_GEN_FIX(tosh) VFP_GEN_FIX(tosl) @@ -2399,7 +2411,7 @@ static int disas_dsp_insn(CPUState *env, DisasContext *s, uint32_t insn) instruction is not defined. */ static int disas_cp_insn(CPUState *env, DisasContext *s, uint32_t insn) { -TCGv tmp; +TCGv tmp, tmp2; uint32_t rd = (insn 12) 0xf; uint32_t cp = (insn 8) 0xf; if (IS_USER(s)) { @@ -2411,14 +2423,18 @@ static int disas_cp_insn(CPUState *env, DisasContext *s, uint32_t insn) return 1; gen_set_pc_im(s-pc); tmp = new_tmp(); -gen_helper_get_cp(tmp, cpu_env, tcg_const_i32(insn)); +tmp2 = tcg_const_i32(insn); +gen_helper_get_cp(tmp, cpu_env, tmp2); +tcg_temp_free(tmp2); store_reg(s, rd, tmp); } else { if (!env-cp[cp].cp_write) return 1; gen_set_pc_im(s-pc); tmp = load_reg(s, rd); -gen_helper_set_cp(cpu_env, tcg_const_i32(insn), tmp); +tmp2 = tcg_const_i32(insn); +gen_helper_set_cp(cpu_env, tmp2, tmp); +tcg_temp_free(tmp2); dead_tmp(tmp); } return 0; @@ -2449,7 +2465,7 @@ static int cp15_user_ok(uint32_t insn) static int disas_cp15_insn(CPUState *env, DisasContext *s, uint32_t insn) { uint32_t rd; -TCGv tmp; +TCGv tmp, tmp2; /* M profile cores use memory mapped registers instead of cp15. */ if (arm_feature(env, ARM_FEATURE_M)) @@ -2478,9 +2494,10 @@ static int disas_cp15_insn(CPUState *env, DisasContext *s, uint32_t