Re: [Qemu-devel] hw/nand hw/onenand and read-only

2011-10-19 Thread Juha.Riihimaki
On 18.10.11 16:38 , ext Markus Armbruster arm...@redhat.com wrote:

$ qemu-system-arm -drive if=none,file=tmp.qcow2,readonly,id=foo
-device nand,drive=foo -kernel /dev/null
qemu: hardware error: nand_device_init: Unsupported NAND block size.
[...]

Note that I didn't bother supplying a drive with sensible size.  If I
did, I guess I'd get nand coupled to a read-only drive.  Easy enough for
you to try :)

Indeed, thanks ;) While I'm at it, I'll also fix the NAND init function to
return -1 instead of aborting. I'll send patches for hw/nand and
hw/onenand shortly. I'll simply make them reject read-only drives however
both device models already support running without a drive as well by
using a memory buffer instead so it would also be possible to make them
use a read-only drive in a way that initial NAND/OneNAND contents would be
read from the drive but any changes would not be written back to the drive
and would be lost when QEMU is killed.


Cheers,
Juha




Re: [Qemu-devel] hw/nand hw/onenand and read-only

2011-10-18 Thread Juha.Riihimaki
On 18.10.11 11:23 , ext Markus Armbruster arm...@redhat.com wrote:

These guys have been converted to qdev relatively recently.

I wonder what happens when they use a drive defined with -drive
if=none,readonly.

If that's not a valid configuration, we better reject read-only drives,
like ide_init_drive() does.

I'm not an expert with QEMU command line options; how do you pass such a
drive to a NAND device? With if=mtd I get the following which I guess is
expected result:

$ qemu-system-arm -M beagle -drive if=mtd,file=nand.img,readonly
qemu: readonly flag not supported for drive with this interface


Regards,
Juha




Re: [Qemu-devel] [PATCH 0/3] usb-musb: make qdev-aware

2011-09-02 Thread Juha.Riihimaki
How to you test musb?

Unfortunately I don't have any test cases which actively use the musb,
so I settle for testing an n810 image (and a beagle image in my omap3
tree) and confirming that the init part of things still works ok.
(I'm not entirely happy with this but init is really all we're changing
with these patches so we should be ok...)

Riku/Juha -- do you have any musb test images/command lines?

With an existing n810 image I guess you can test USB networking by adding
-usb -net user,vlan=0 -net nic,model=usb,vlan=0 -usbdevice net -redir
tcp:2022::22 to qemu command line parameters. In the guest, launch X
terminal and command sudo gainroot followed by udhcpc. You should now
be able to ssh to the guest from the host with ssh -p 2022
root@localhost. Would this be sufficient for your needs?


Cheers,
Juha




Re: [Qemu-devel] OSX build issues

2011-03-17 Thread Juha.Riihimaki

On 17.03.11 13:09 , ext Tristan Gingold ging...@adacore.com wrote:

On Mar 17, 2011, at 11:28 AM, François Revol wrote:

From the content of the functions called it's either one of the added
fds which cause problem on select() (but why ?), or likely some signal
masks which interfere with some internal thread in the process.
 
 I sampled the process and took some screenshots without and with the
#if 0 hack:
 http://revolf.free.fr/osx/shots/shot_qemu_init_main_loop_if0.png
 Things work but oddly what is supposed to be an internal dispatcher
thread ends up executing qemu code. The main thread includes lots of
calls from arbitrary addresses indicating it receives some signals.
 
 http://revolf.free.fr/osx/shots/shot_qemu_init_main_loop_if1.png
 Things stale, and there are 2 more threads that wait, and the main
thread seems quite stuck in select().

Yes, I have to investigate that...

From what I've found out so far is that the sigwait_compat() function in
compatfd.c will block all signals with sigprocmask() call and then wait
for a single signal in sigwaitinfo() (forgot what it is). Sounds a bit odd
to me but perhaps there's a reason for that? I tried changing the
sigprocmask() call so that it only blocks the same signals that it will
later wait for and things start rolling again.


Regards,
Juha




[Qemu-devel] Re: OMAP3 bug: disassembler disagreed with translator

2011-02-18 Thread Juha.Riihimaki

On Feb 18, 2011, at 14:08, ext Антон Кочков wrote:

 Here also modified omap3_boot.c file. I'm dont know why it is
 successfully load bootrom at 0x40014000, but cant do this for 0x14000
 - it always fill by zeroes.
 I'm only need copy memory from already loaded rom image from
 0x40014000 to 0x00014000 and nothing more.

Our OMAP3 boot emulation is a very simplified model. For example, the boot ROM 
is never mapped to Q0 region, only Q1 region -- the boot ROM execution is 
started at 0x40014000 rather than 0x00014000. There is no memory mapped at Q0 
unless you attach something to the GPMC before reset.


Regards,
Juha

Re: Về: [Qemu-devel] How to fix NEON of cor text-a9 on qemu

2010-09-22 Thread Juha.Riihimaki

On Sep 22, 2010, at 19:56, ext Pham Van Thiet wrote:

 I try to find them on meego.gitorious.org/qemu-maemo but I didn't find them. 
 Can 
 you send the links to me?  

Just clone the git repository and build qemu. Since it is more a development 
branch than a staging branch against upstream all relevant changes are not 
ordered in clean, separate patches that you could apply over qemu upstream.


Regards,
Juha

Re: [Qemu-devel] How to fix NEON of cortext-a9 on qemu

2010-09-21 Thread Juha.Riihimaki

On Sep 21, 2010, at 19:22, ext Pham Van Thiet wrote:

 example for it please send it for me. Or if you have a patch file for NEON 
 instruction on qemu please send it for me?

You could have a try with the MeeGo QEMU branch 
(meego.gitorious.org/qemu-maemo). We have a bunch of ARM patches on top of the 
upstream version, some of which are NEON related.


Regards,
Juha


Re: [Qemu-devel] Can not find the function body of omap3_lcd_panel_draw_fn_32

2010-04-23 Thread Juha.Riihimaki

On Apr 23, 2010, at 15:17, ext steven_c...@htc.commailto:steven_c...@htc.com 
wrote:

I am tracing the code and I can not find the body of omap3_lcd_panel_draw_fn_32.
I did print out the address and search the map file, that address did not match 
any function.
Can some one give me a hint??

That is because it is not a function but an array of function pointers. The 
array is defined in omap3_lcd_panel_template.h but you will not find it in 
mainline QEMU which I guess you are not using anyway since you are looking for 
this symbol.


Regards,
Juha



Re: [Qemu-devel] QEMU state of ARM NEON support.

2010-03-25 Thread Juha.Riihimaki

On Mar 25, 2010, at 12:27, ext Dmitry Zhurikhin wrote:

 juha.riihim...@nokia.com wrote:
 - vqdmlsl yields different results for some specific input values, however I 
 think QEMU is producing correct results. For example, 
 1431655765-(2*-32768*-32768)=-715827883 but your reference hardware results 
 say it should be -715827882.
 From what I understand hardware gets such result because saturation occurs on 
 the (2*-32768*-32768) step.  The result is not 0x8000 as expected but 
 because of overflow is set as the largest 32-bit signed integer 0x7FFF.  
 It 
 seems that QEMU just doesn't perform saturation (in this particular case at 
 least).

Yes I see, I overlooked the fact that -32768*-32768 changes the sign and so 
doubling the result of that overflows the signed 32-bit integer even if the 
result still fits in 32 bits. Thanks, I'll fix that as well.


Regards,
Juha



Re: [Qemu-devel] QEMU support for ARM security features / TrustZone

2010-01-24 Thread Juha.Riihimaki

On Jan 22, 2010, at 14:01, ext leic...@informatik.uni-frankfurt.de wrote:

 I found that QEMU emulates ARM Cortex-8 which in hardware have Trustzone
 support. It is unclear to me if this is the case in the version that QEMU
 emulates.

ARM TrustZone features are currently not emulated.


Regards,
Juha




Re: [Qemu-devel] [PATCH 09/11] Cocoa: Shutdown when window is closed

2009-12-13 Thread Juha.Riihimaki

On Dec 13, 2009, at 04:55, ext Andreas Färber wrote:

 The application is not very useful once the guest window is closed.
 QEMU is not a document-based application; terminating it automatically
 saves the user another action and resembles SDL behavior.
 
 Signed-off-by: Andreas Färber andreas.faer...@web.de
 Cc: Mike Kronenberg mike.kronenb...@kronenberg.org
 Cc: Alexander Graf a...@csgraf.de
 ---
 cocoa.m |7 +++
 1 files changed, 7 insertions(+), 0 deletions(-)
 
 diff --git a/cocoa.m b/cocoa.m
 index 70c249b..09ed3cd 100644
 --- a/cocoa.m
 +++ b/cocoa.m
 @@ -712,6 +712,7 @@ static int cocoa_keycode_to_qemu(int keycode)
 - (void)toggleFullScreen:(id)sender;
 - (void)showQEMUDoc:(id)sender;
 - (void)showQEMUTec:(id)sender;
 +- (void)windowWillClose:(NSNotification *)notification;
 @end
 
 @implementation QemuCocoaAppController
 @@ -737,6 +738,7 @@ static int cocoa_keycode_to_qemu(int keycode)
 fprintf(stderr, (cocoa) can't create window\n);
 exit(1);
 }
 +[normalWindow setDelegate:self];
 [normalWindow setAcceptsMouseMovedEvents:YES];
 [normalWindow setTitle:[NSString stringWithFormat:@QEMU]];
 [normalWindow setContentView:cocoaView];
 @@ -835,6 +837,11 @@ static int cocoa_keycode_to_qemu(int keycode)
 [[NSWorkspace sharedWorkspace] openFile:[NSString 
 stringWithFormat:@%@/../doc/qemu/qemu-tech.html,
 [[NSBundle mainBundle] resourcePath]] withApplication:@Help Viewer];
 }
 +
 +- (void)windowWillClose:(NSNotification *)notification
 +{
 + [NSApp terminate:self];
 +}
 @end
 

Another way to achieve the same thing is to instead of the above changes just 
introduce a new method for the QemuCocoaAppController like:

- (BOOL)applicationShouldTerminateAfterLastWindowClosed:(NSApplication 
*)theApplication
{
return YES;
}


Regards,
Juha





Re: [Qemu-devel] [PATCH] target-arm: tcg temp variable usage cleanup

2009-11-02 Thread Juha.Riihimaki
On Nov 1, 2009, at 02:08, ext Laurent Desnogues wrote:

 On Thu, Oct 29, 2009 at 3:01 PM,  juha.riihim...@nokia.com wrote:
 From: Juha Riihimäki juha.riihim...@nokia.com

 TCG temporary variable handling in target-arm/translate.c is  
 currently
 somewhat inconsistent; some functions allocate new temporaries that  
 the
 calling function is expected to free and some other functions free
 temporaries that are passed in as parameters. This patch will  
 remove all
 such instances in the code and make the lifespan of the temporaries  
 more
 clearly visible as they are always allocated and freed within one  
 function.
 The only exception to this are the global temporaries allocated in  
 the
 beginning of the gen_intermediate_code_internal function.
[...]
 I tested your patch by running translate + TCG code gen
 for all of the opcodes in the range e000-.
 For the NEON instructions I had to add correct undefined
 detection to let my program process the range (OTOH I
 didn't bother fixing the wrong decoding and/or codegen,
 I was just doing sanity check on your patch).

 Next step is to also do that for Thumb2.  And then run
 some real programs.

Thanks for your work so far. I fixed the things you pointed out, but  
I'll hold submitting a new version of the patch until you have had  
time to do more testing. I tested with the n810 system emulation, it  
was working fine.


Regards,
Juha



Re: [Qemu-devel] [PATCH 00/19 v2] Add virtio-net/tap support for partial csums and GSO

2009-10-30 Thread Juha.Riihimaki
On Oct 22, 2009, at 19:43, ext Mark McLoughlin wrote:

 Hey,
Over a year ago we added some code to qemu-kvm.git which takes
 advantage of the recent tun/tap IFF_VNET_HDR feature in order to allow
 virtio-net to send and receive packets with partial checksums and
 segmentation offloaded:

  http://article.gmane.org/gmane.comp.emulators.kvm.devel/20440

  This allows us to pass larger packets and packets with
  partial checkums between the guest and the host, greatly
  increasing the achievable bandwidth.

Unfortunately, that implementation was quite hacky as it
 made some assumptions that would break if e.g. you added another
 network client to a vlan where the feature had enabled.

Now that we have the -netdev parameter, we can more safely
 pair the NIC and backend, allowing us to negatiate features like
 this.

What follows is a somewhat cleaned up version of the code
 from qemu-kvm.git. Further cleanups are probably possible, but I
 think this much is mergeable. Some points of discussion:

I think this patch set has broken the QEMU build on OS X. The reason  
is that there are some definitions in tap-linux.h which only gets  
included in net.c if __linux__ is defined but the other parts of net.c  
which utilize definitions from tap-linux.h get compiled in  
nevertheless. It compiles if all definitions (except for the extra  
includes) from tap-linux.h are included for OS X as well. I've  
attached below a sample of the compiler output.


Regards,
Juha

---
[...]
   CCnet.o
net.c: In function ‘tap_receive_iov’:
net.c:1391: error: variable ‘hdr’ has initializer but incomplete type
net.c:1391: warning: excess elements in struct initializer
net.c:1391: warning: (near initialization for ‘hdr’)
net.c:1391: error: storage size of ‘hdr’ isn’t known
net.c:1391: warning: unused variable ‘hdr’
net.c: In function ‘tap_receive_raw’:
net.c:1409: error: variable ‘hdr’ has initializer but incomplete type
net.c:1409: warning: excess elements in struct initializer
net.c:1409: warning: (near initialization for ‘hdr’)
net.c:1409: error: storage size of ‘hdr’ isn’t known
net.c:1409: warning: unused variable ‘hdr’
net.c: In function ‘tap_send’:
net.c:1484: error: invalid application of ‘sizeof’ to incomplete type  
‘struct virtio_net_hdr’
net.c:1485: error: invalid application of ‘sizeof’ to incomplete type  
‘struct virtio_net_hdr’
net.c: In function ‘tap_set_sndbuf’:
net.c:1511: error: ‘TUNSETSNDBUF’ undeclared (first use in this  
function)
net.c:1511: error: (Each undeclared identifier is reported only once
net.c:1511: error: for each function it appears in.)
net.c: In function ‘tap_probe_vnet_hdr’:
net.c:1552: error: ‘TUNGETIFF’ undeclared (first use in this function)
net.c:1557: error: ‘IFF_VNET_HDR’ undeclared (first use in this  
function)
net.c: In function ‘tap_set_offload’:
net.c:1567: error: ‘TUN_F_CSUM’ undeclared (first use in this function)
net.c:1569: error: ‘TUN_F_TSO4’ undeclared (first use in this function)
net.c:1571: error: ‘TUN_F_TSO6’ undeclared (first use in this function)
net.c:1573: error: ‘TUN_F_TSO_ECN’ undeclared (first use in this  
function)
net.c:1575: error: ‘TUN_F_UFO’ undeclared (first use in this function)
net.c:1578: error: ‘TUNSETOFFLOAD’ undeclared (first use in this  
function)
net.c: In function ‘net_tap_fd_init’:
net.c:1623: error: ‘TUN_F_CSUM’ undeclared (first use in this function)
net.c:1623: error: ‘TUN_F_UFO’ undeclared (first use in this function)
net.c:1624: error: ‘TUNSETOFFLOAD’ undeclared (first use in this  
function)
make[1]: *** [net.o] Error 1
make: *** [build-all] Error 2





Re: [Qemu-devel] [PATCH v2 09/10] target-arm: optimize neon vld/vst ops

2009-10-29 Thread Juha.Riihimaki
On Oct 26, 2009, at 23:05, ext Aurelien Jarno wrote:

 On Mon, Oct 26, 2009 at 10:11:07AM +0100, Laurent Desnogues wrote:
 On Mon, Oct 26, 2009 at 8:46 AM,  juha.riihim...@nokia.com wrote:

 On Oct 25, 2009, at 16:01, ext Laurent Desnogues wrote:

 I don't really like the idea of having tcg_qemu_ld/st not factored
 in some place, as it makes memory access tracing extensions
 more intrusive.

 This brings us back to the problem having function freeing tmps.
 In that case, you could perhaps create a gen_st16_dont_free
 function as a temporary workaround?

 I could, but it is getting ugly =/ How about if I create another  
 patch
 that moves the temporary variable (de)allocation outside the  
 gen_ldx/
 gen_stx functions and then refactor this patch on top of that?

 I'd personally like this, but I guess you'd better wait for some  
 inputs
 from other QEMU dev's before doing it.

 Looks like the best option to me.

Alrighty then, I did the patch against the latest git and it's rather  
large... but seems to have broken nothing at least in my testing. The  
patch will remove all implicit tcg temp variable allocation and  
deallocation in target-arm/translate.c and make it the responsibility  
of the calling function. At the same time I also removed the new_tmp  
and dead_tmp functions completely because I see no point in only  
tracking some of the 32bit temp variables instead of everything.  
Personally I think the patch makes reading and understanding (and why  
not also writing) the file much easier. I do also have a version that  
has a compile time option of substituting the tcg temp variable alloc/ 
dealloc function calls with inline functions that track the usage but  
this is not included with the patch.

I'll send the patch shortly, should be nice bed-time reading I  
guess ;) Please comment when you have free time to read it through...


Cheers,
Juha




Re: [Qemu-devel] [PATCH v2 05/10] target-arm: optimize arm load/store multiple ops

2009-10-27 Thread Juha.Riihimaki
On Oct 27, 2009, at 11:00, ext Aurelien Jarno wrote:

 juha.riihim...@nokia.com a écrit :
 On Oct 27, 2009, at 10:39, ext Aurelien Jarno wrote:

 On Sat, Oct 24, 2009 at 03:19:04PM +0300, juha.riihim...@nokia.com
 wrote:
 From: Juha Riihimäki juha.riihim...@nokia.com

 RM load/store multiple instructions can be slightly optimized by
 loading the register offset constant into a variable outside the
 register loop and using the preloaded variable inside the loop
 instead
 of reloading the offset value to a temporary variable on each loop
 iteration. This causes less TCG ops to be generated for a ARM load/
 store multiple instruction if there are more than one register
 accessed, otherwise the number of generated TCG ops is the same.

 Signed-off-by: Juha Riihimäki juha.riihim...@nokia.com
 Acked-by: Laurent Desnogues laurent.desnog...@gmail.com
 This patch breaks, the boot of an arm kernel, as tmp2 is used
 elsewhere
 within this code path.

 True, I just noticed that as well. This is because the resource leak
 patch
 was refactored to utilize tmp2 inside the loop as well. I just sent a
 new
 revision of this patch that uses tmp3 for th constant value.

 OTOH, while it reduce the number of TCG ops, that should not impact
 the
 generated host asm code, as most (all ?) targets are able to add a
 small constant value to a register in one instruction.

 This is true, but I still think it provides a small speed gain as
 there are
 less TCG ops to be processed when generating host code...?

 It means less TCG ops, but one more temp variable, therefore if  
 there is
 a gain, I don't think it is something even measurable.

 OTOH it makes the code a bit more complex to read. I am not really
 opposed to this patch (and the other patches of the same kind), but I
 will need some more arguments to convince me.

Shouldn't the amount of temp variables stay the same? tcg_gen_addi_i32  
will internally allocate a temporary variable to hold the integer  
parameter. The only difference is that the temporary stays alive  
during the loop instead of being allocated/deallocated during each  
iteration. But I agree, the performance gain is probably quite small.


Regards,
Juha



Re: [Qemu-devel] [PATCH v2 09/10] target-arm: optimize neon vld/vst ops

2009-10-26 Thread Juha.Riihimaki

On Oct 25, 2009, at 16:01, ext Laurent Desnogues wrote:

 I don't really like the idea of having tcg_qemu_ld/st not factored
 in some place, as it makes memory access tracing extensions
 more intrusive.

 This brings us back to the problem having function freeing tmps.
 In that case, you could perhaps create a gen_st16_dont_free
 function as a temporary workaround?

I could, but it is getting ugly =/ How about if I create another patch  
that moves the temporary variable (de)allocation outside the gen_ldx/ 
gen_stx functions and then refactor this patch on top of that?


Regards,
Juha




Re: [Qemu-devel] [PATCH v2][RESEND] target-arm: cleanup internal resource leaks

2009-10-26 Thread Juha.Riihimaki

On Oct 26, 2009, at 12:46, ext Laurent Desnogues wrote:

 @@ -5511,8 +5539,9 @@ static int disas_neon_data_insn(CPUState *  
 env, DisasContext *s, uint32_t insn)
tcg_gen_movi_i32(tmp, 0);
}
tmp2 = neon_load_reg(rm, 0);
 -gen_helper_neon_tbl(tmp2, tmp2, tmp, tcg_const_i32 
 (rn),
 -tcg_const_i32(n));
 +tmp4 = tcg_const_i32(rn);
 +tmp5 = tcg_const_i32(n);
 +gen_helper_neon_tbl(tmp2, tmp2, tmp, tmp4, tmp5);
dead_tmp(tmp);
if (insn  (1  6)) {
tmp = neon_load_reg(rd, 1);
 @@ -5521,8 +5550,9 @@ static int disas_neon_data_insn(CPUState *  
 env, DisasContext *s, uint32_t insn)
tcg_gen_movi_i32(tmp, 0);
}
tmp3 = neon_load_reg(rm, 1);
 -gen_helper_neon_tbl(tmp3, tmp3, tmp, tcg_const_i32 
 (rn),
 -tcg_const_i32(n));
 +gen_helper_neon_tbl(tmp3, tmp3, tmp, tmp4, tmp5);
 +dead_tmp(tmp5);
 +dead_tmp(tmp4);

 I missed this mistake when I acked the patch:  you're using
 dead_tmp instead of tcg_temp_free...  One more reason
 to drop dead_tmp :-)

Indeed. I'll fix it, didn't notice it either because I'm using other  
new_xxx/dead_xxx helpers myself to catch resource leaks. I should work  
around that to get more similar code and eliminate this kind of issues.

The bug will cause false resource leak reports, the actual freeing of  
the temporary is done correctly. Since this patch seems to have  
already been applied, I'll just provide a patch that fixes this  
specific error in the mainline instead of reposting a fixed version of  
the original patch.


Regards,
Juha




Re: [Qemu-devel] [PATCH] Add support for multiple simultaneously used keyboard devices.

2009-10-25 Thread Juha.Riihimaki
On Oct 24, 2009, at 18:09, ext Filip Navara wrote:

 The support for multiple keyboard devices is essential for emulating  
 embedded boards where multiple input devices are present (eg. keypad  
 and rotary encoder) which are implemented using separate QEMU devices.

Nice to see someone else needs this as well. I just did a very similar  
implementation a short while ago.


Regards,
Juha




Re: [Qemu-devel] [PATCH 10/12] target-arm: replace tcg_gen_rori_i32 by tcg_gen_rotri_i32

2009-10-24 Thread Juha.Riihimaki

On Oct 23, 2009, at 18:25, ext Aurelien Jarno wrote:

 On Wed, Oct 21, 2009 at 12:18:08PM +0200, juha.riihim...@nokia.com  
 wrote:
 Use native rotation if possible instead of a simulated one.

 I have another patch in my local tree that handle more cases:

Great, I'll drop it from my patch series then.


Cheers,
Juha




[Qemu-devel] [PATCH v2] target-arm: cleanup internal resource leaks

2009-10-22 Thread Juha.Riihimaki
Revised patch for getting rid of tcg temporary variable leaks in
target-arm/translate.c. This version also includes the leak patch for
gen_set_cpsr macro, now converted as a static inline function, which I
sent earlier as a separate patch on top of this patch. This patch (the
attached version) should apply cleanly against the latest mainline git.

Signed-off-by: Juha Riihimäki juha.riihim...@nokia.com
---
diff --git a/target-arm/translate.c b/target-arm/translate.c
index e56082b..813f661 100644
--- a/target-arm/translate.c
+++ b/target-arm/translate.c
@@ -184,7 +184,12 @@ static void store_reg(DisasContext *s, int reg,
TCGv var)
  #define gen_uxtb16(var) gen_helper_uxtb16(var, var)


-#define gen_set_cpsr(var, mask) gen_helper_cpsr_write(var,
tcg_const_i32(mask))
+static inline void gen_set_cpsr(TCGv var, uint32_t mask)
+{
+TCGv tmp_mask = tcg_const_i32(mask);
+gen_helper_cpsr_write(var, tmp_mask);
+tcg_temp_free_i32(tmp_mask);
+}
  /* Set NZCV flags from the high 4 bits of var.  */
  #define gen_set_nzcv(var) gen_set_cpsr(var, CPSR_NZCV)

@@ -287,6 +292,7 @@ static TCGv_i64 gen_mulu_i64_i32(TCGv a, TCGv b)
  tcg_gen_extu_i32_i64(tmp2, b);
  dead_tmp(b);
  tcg_gen_mul_i64(tmp1, tmp1, tmp2);
+tcg_temp_free_i64(tmp2);
  return tmp1;
  }

@@ -300,6 +306,7 @@ static TCGv_i64 gen_muls_i64_i32(TCGv a, TCGv b)
  tcg_gen_ext_i32_i64(tmp2, b);
  dead_tmp(b);
  tcg_gen_mul_i64(tmp1, tmp1, tmp2);
+tcg_temp_free_i64(tmp2);
  return tmp1;
  }

@@ -312,9 +319,11 @@ static void gen_mull(TCGv a, TCGv b)
  tcg_gen_extu_i32_i64(tmp1, a);
  tcg_gen_extu_i32_i64(tmp2, b);
  tcg_gen_mul_i64(tmp1, tmp1, tmp2);
+tcg_temp_free_i64(tmp2);
  tcg_gen_trunc_i64_i32(a, tmp1);
  tcg_gen_shri_i64(tmp1, tmp1, 32);
  tcg_gen_trunc_i64_i32(b, tmp1);
+tcg_temp_free_i64(tmp1);
  }

  /* Signed 32x32-64 multiply.  */
@@ -326,9 +335,11 @@ static void gen_imull(TCGv a, TCGv b)
  tcg_gen_ext_i32_i64(tmp1, a);
  tcg_gen_ext_i32_i64(tmp2, b);
  tcg_gen_mul_i64(tmp1, tmp1, tmp2);
+tcg_temp_free_i64(tmp2);
  tcg_gen_trunc_i64_i32(a, tmp1);
  tcg_gen_shri_i64(tmp1, tmp1, 32);
  tcg_gen_trunc_i64_i32(b, tmp1);
+tcg_temp_free_i64(tmp1);
  }

  /* Swap low and high halfwords.  */
@@ -542,11 +553,13 @@ static void gen_arm_parallel_addsub(int op1, int
op2, TCGv a, TCGv b)
  tmp = tcg_temp_new_ptr();
  tcg_gen_addi_ptr(tmp, cpu_env, offsetof(CPUState, GE));
  PAS_OP(s)
+tcg_temp_free_ptr(tmp);
  break;
  case 5:
  tmp = tcg_temp_new_ptr();
  tcg_gen_addi_ptr(tmp, cpu_env, offsetof(CPUState, GE));
  PAS_OP(u)
+tcg_temp_free_ptr(tmp);
  break;
  #undef gen_pas_helper
  #define gen_pas_helper(name) glue(gen_helper_,name)(a, a, b)
@@ -587,11 +600,13 @@ static void gen_thumb2_parallel_addsub(int op1,
int op2, TCGv a, TCGv b)
  tmp = tcg_temp_new_ptr();
  tcg_gen_addi_ptr(tmp, cpu_env, offsetof(CPUState, GE));
  PAS_OP(s)
+tcg_temp_free_ptr(tmp);
  break;
  case 4:
  tmp = tcg_temp_new_ptr();
  tcg_gen_addi_ptr(tmp, cpu_env, offsetof(CPUState, GE));
  PAS_OP(u)
+tcg_temp_free_ptr(tmp);
  break;
  #undef gen_pas_helper
  #define gen_pas_helper(name) glue(gen_helper_,name)(a, a, b)
@@ -995,10 +1010,12 @@ static inline void gen_vfp_tosiz(int dp)
  #define VFP_GEN_FIX(name) \
  static inline void gen_vfp_##name(int dp, int shift) \
  { \
+TCGv tmp_shift = tcg_const_i32(shift); \
  if (dp) \
-gen_helper_vfp_##name##d(cpu_F0d, cpu_F0d, tcg_const_i32
(shift), cpu_env);\
+gen_helper_vfp_##name##d(cpu_F0d, cpu_F0d, tmp_shift,
cpu_env);\
  else \
-gen_helper_vfp_##name##s(cpu_F0s, cpu_F0s, tcg_const_i32
(shift), cpu_env);\
+gen_helper_vfp_##name##s(cpu_F0s, cpu_F0s, tmp_shift,
cpu_env);\
+tcg_temp_free_i32(tmp_shift); \
  }
  VFP_GEN_FIX(tosh)
  VFP_GEN_FIX(tosl)
@@ -2399,7 +2416,7 @@ static int disas_dsp_insn(CPUState *env,
DisasContext *s, uint32_t insn)
 instruction is not defined.  */
  static int disas_cp_insn(CPUState *env, DisasContext *s, uint32_t
insn)
  {
-TCGv tmp;
+TCGv tmp, tmp2;
  uint32_t rd = (insn  12)  0xf;
  uint32_t cp = (insn  8)  0xf;
  if (IS_USER(s)) {
@@ -2411,14 +2428,18 @@ static int disas_cp_insn(CPUState *env,
DisasContext *s, uint32_t insn)
  return 1;
  gen_set_pc_im(s-pc);
  tmp = new_tmp();
-gen_helper_get_cp(tmp, cpu_env, tcg_const_i32(insn));
+tmp2 = tcg_const_i32(insn);
+gen_helper_get_cp(tmp, cpu_env, tmp2);
+tcg_temp_free(tmp2);
  store_reg(s, rd, tmp);
  } else {
  if (!env-cp[cp].cp_write)
  return 1;
  gen_set_pc_im(s-pc);
  tmp = load_reg(s, rd);
-gen_helper_set_cp(cpu_env, tcg_const_i32(insn), tmp);
+tmp2 = tcg_const_i32(insn);
+

Re: [Qemu-devel] [PATCH 07/12] target-arm: fix neon vsri, vshl and vsli ops

2009-10-22 Thread Juha.Riihimaki

On Oct 21, 2009, at 13:46, ext Laurent Desnogues wrote:

 @@ -4624,31 +4624,31 @@ static int disas_neon_data_insn(CPUState *
 env, DisasContext *s, uint32_t insn)
  switch (size) {
  case 0:
  if (op == 4)
 -imm = 0xff  -shift;
 +imm2 = 0xff  -shift;
  else
 -imm = (uint8_t)(0xff  shift);
 -imm |= imm  8;
 -imm |= imm  16;
 +imm2 = (uint8_t)(0xff  shift);
 +imm2 |= imm2  8;
 +imm2 |= imm2  16;
  break;
  case 1:
  if (op == 4)
 -imm = 0x  -shift;
 +imm2 = 0x  -shift;
  else
 -imm = (uint16_t)(0x   
 shift);
 -imm |= imm  16;
 +imm2 = (uint16_t)(0x   
 shift);
 +imm2 |= imm2  16;
  break;
  case 2:
  if (op == 4)
 -imm = 0xu  -shift;
 +imm2 = 0xu  -shift;
  else
 -imm = 0xu  shift;
 +imm2 = 0xu  shift;
  break;
  default:
  abort();
  }
  tmp2 = neon_load_reg(rd, pass);
 -tcg_gen_andi_i32(tmp, tmp, imm);
 -tcg_gen_andi_i32(tmp2, tmp2, ~imm);
 +tcg_gen_andi_i32(tmp, tmp, imm2);
 +tcg_gen_andi_i32(tmp2, tmp2, ~imm2);
  tcg_gen_or_i32(tmp, tmp, tmp2);
  dead_tmp(tmp2);
  }

 I mostly agree with this, except there's a mistake already
 present in the original code:  for a size of 2 the shift amount
 can be 32 (look at VSRI in the ARM Architecture Reference
 Manual).  Note in this case, shift will be -32.

True. However, I'm not sure if this causes incorrect behavior or not.  
I'm not a NEON expert but how I understand the VSRI instruction is  
that it will shift the element value before it is inserted in the  
destination vector, therefore with element size 2 and shift 32 the  
result would be always zero and I guess that would still apply if the  
shift was -32: does it matter in which direction you shift if you  
anyway shift all the bits out?


Regards,
Juha





Re: [Qemu-devel] [PATCH] target-arm: cleanup internal resource leaks

2009-10-22 Thread Juha.Riihimaki

On Oct 22, 2009, at 08:40, Riihimaki Juha (Nokia-D/Helsinki) wrote:

 @@ -4676,12 +4694,16 @@ static int disas_neon_data_insn(CPUState *
 env, DisasContext *s, uint32_t insn)
gen_neon_narrow_satu(size - 1, tmp,
 cpu_V0);
}
if (pass == 0) {
 +dead_tmp(tmp2);

 This looks wrong if size == 3 since we have TCGV_UNUSED(tmp2).

 You're right. However, looking at the surrounding code a bit closer I
 began to wonder if it works correctly at all since tmp2 is used as a
 shift value if size  2 but it is also used to store the result of the
 first pass. Therefore the result of the first pass is used as a shift
 value during second pass... perhaps not correct? Wouldn't it make more
 sense to store the result of the first pass in the lower half of the
 destination neon register directly during the first pass? I see no
 point in keeping it in a temporary variable and only store both halves
 of the destination neon register during the second pass. Especially
 since there is no memory access involved, everything is done in
 registers. What do you think?

To make it more clear what I'm after, look at the patch below that  
changes the code into what I think is correct functionality. The patch  
applies on top of the v2 resource leak patch which I just sent a short  
while ago.


Cheers,
Juha

---
diff --git a/target-arm/translate.c b/target-arm/translate.c
index 813f661..7598246 100644
--- a/target-arm/translate.c
+++ b/target-arm/translate.c
@@ -4696,18 +4696,12 @@ static int disas_neon_data_insn(CPUState *  
env, DisasContext *s, uint32_t insn)
  else
  gen_neon_narrow_satu(size - 1, tmp,  
cpu_V0);
  }
-if (pass == 0) {
-if (size != 3) {
-dead_tmp(tmp2);
-}
-tmp2 = tmp;
-} else {
-neon_store_reg(rd, 0, tmp2);
-neon_store_reg(rd, 1, tmp);
-}
+neon_store_reg(rd, pass, tmp);
  } /* for pass */
  if (size == 3) {
  tcg_temp_free_i64(tmp64);
+} else {
+dead_tmp(tmp2);
  }
  } else if (op == 10) {
  /* VSHLL */





Re: [Qemu-devel] [PATCH 07/12] target-arm: fix neon vsri, vshl and vsli ops

2009-10-22 Thread Juha.Riihimaki

On Oct 22, 2009, at 10:18, ext Laurent Desnogues wrote:

 On Thu, Oct 22, 2009 at 8:49 AM,  juha.riihim...@nokia.com wrote:

 On Oct 21, 2009, at 13:46, ext Laurent Desnogues wrote:

 @@ -4624,31 +4624,31 @@ static int disas_neon_data_insn(CPUState *
 env, DisasContext *s, uint32_t insn)
  switch (size) {
  case 0:
  if (op == 4)
 -imm = 0xff  -shift;
 +imm2 = 0xff  -shift;
  else
 -imm = (uint8_t)(0xff   
 shift);
 -imm |= imm  8;
 -imm |= imm  16;
 +imm2 = (uint8_t)(0xff   
 shift);
 +imm2 |= imm2  8;
 +imm2 |= imm2  16;
  break;
  case 1:
  if (op == 4)
 -imm = 0x  -shift;
 +imm2 = 0x  -shift;
  else
 -imm = (uint16_t)(0x 
 shift);
 -imm |= imm  16;
 +imm2 = (uint16_t)(0x 
 shift);
 +imm2 |= imm2  16;
  break;
  case 2:
  if (op == 4)
 -imm = 0xu  -shift;
 +imm2 = 0xu  -shift;
  else
 -imm = 0xu  shift;
 +imm2 = 0xu  shift;
  break;
  default:
  abort();
  }
  tmp2 = neon_load_reg(rd, pass);
 -tcg_gen_andi_i32(tmp, tmp, imm);
 -tcg_gen_andi_i32(tmp2, tmp2, ~imm);
 +tcg_gen_andi_i32(tmp, tmp, imm2);
 +tcg_gen_andi_i32(tmp2, tmp2, ~imm2);
  tcg_gen_or_i32(tmp, tmp, tmp2);
  dead_tmp(tmp2);
  }

 I mostly agree with this, except there's a mistake already
 present in the original code:  for a size of 2 the shift amount
 can be 32 (look at VSRI in the ARM Architecture Reference
 Manual).  Note in this case, shift will be -32.

 True. However, I'm not sure if this causes incorrect behavior or not.
 I'm not a NEON expert but how I understand the VSRI instruction is
 that it will shift the element value before it is inserted in the
 destination vector, therefore with element size 2 and shift 32 the
 result would be always zero and I guess that would still apply if the
 shift was -32: does it matter in which direction you shift if you
 anyway shift all the bits out?

 The problem is not in the NEON semantics but rather the C
 one:  the behaviour of a shift by an amount greater than or
 equal to the bit-width of the type is undefined;  in this case
 imm2 is 32-bit and the shift is 32-bit.  What you want is
 imm2 = 0 if shift is -32, as you correctly guessed.

Ah, I see. Doesn't the same issue apply for the 8bit and 16bit element  
sizes as well? I suppose it will be sufficient to check for a negative  
shift value and in that case force the mask value to zero.


Regards,
Juha




Re: [Qemu-devel] [PATCH v2] target-arm: cleanup internal resource leaks

2009-10-22 Thread Juha.Riihimaki

On Oct 22, 2009, at 11:04, ext Gerd Hoffmann wrote:


   Hi,

 -#define gen_set_cpsr(var, mask) gen_helper_cpsr_write(var,
 tcg_const_i32(mask))

 Your mailer mangles the patches.

Yes, I know, I can't help it, I'm sorry. That's why I include the  
patch as an attachment as well -- the attachment should stay in  
working condition I think. I put the patches inline as well to make it  
possible to look at the patches without the need to separately open  
the attachment; whether you need to do that or not depends on the mail  
client, I suppose a client could also show text attachment inline.


Regards,
Juha




Re: [Qemu-devel] [PATCH 12/12] [RESEND] target-arm: fix neon shift helper functions

2009-10-22 Thread Juha.Riihimaki

On Oct 22, 2009, at 10:57, ext Laurent Desnogues wrote:

 On Wed, Oct 21, 2009 at 1:01 PM,  juha.riihim...@nokia.com wrote:
 Current code is broken at least on gcc 4.2, the result of a  
 comparison
 -1 = sizeof(type) * 8 results true and causes wrong code path to  
 be
 taken. The fix utilizes abs() function where applicable and otherwise
 adds a test to ensure both arguments are positive before making the
 aforementioned comparison.

 Why don't you instead cast sizeof to int?

No reason, I'll make the v2 patch series use cast instead. It seems to  
work just as well.


Juha




Re: [Qemu-devel] [PATCH 03/12] target-arm: add support for neon vld1.64 instruction

2009-10-22 Thread Juha.Riihimaki

On Oct 22, 2009, at 11:39, ext Laurent Desnogues wrote:

 On Wed, Oct 21, 2009 at 12:17 PM,  juha.riihim...@nokia.com wrote:
 Add support for neon vld1.64 instruction.

 From: Riku Voipio riku.voi...@iki.fi
 Signed-off-by: Juha Riihimäki juha.riihim...@nokia.com
 ---
 diff --git a/target-arm/translate.c b/target-arm/translate.c
 index 3ea9d51..d027572 100644
 --- a/target-arm/translate.c
 +++ b/target-arm/translate.c
 @@ -795,6 +795,12 @@ static inline TCGv gen_ld32(TCGv addr, int  
 index)
  tcg_gen_qemu_ld32u(tmp, addr, index);
  return tmp;
  }
 +static inline TCGv_i64 gen_ld64(TCGv addr, int index)
 +{
 +TCGv_i64 tmp = tcg_temp_new_i64();
 +tcg_gen_qemu_ld64(tmp, addr, index);
 +return tmp;
 +}
  static inline void gen_st8(TCGv val, TCGv addr, int index)
  {
  tcg_gen_qemu_st8(val, addr, index);
 @@ -810,6 +816,11 @@ static inline void gen_st32(TCGv val, TCGv addr,
 int index)
  tcg_gen_qemu_st32(val, addr, index);
  dead_tmp(val);
  }
 +static inline void gen_st64(TCGv_i64 val, TCGv addr, int index)
 +{
 +tcg_gen_qemu_st64(val, addr, index);
 +tcg_temp_free_i64(val);
 +}

  static inline void gen_set_pc_im(uint32_t val)
  {
 @@ -3690,6 +3701,7 @@ static int disas_neon_ls_insn(CPUState * env,
 DisasContext *s, uint32_t insn)
  TCGv addr;
  TCGv tmp;
  TCGv tmp2;
 +TCGv_i64 tmp64;

  if (!vfp_enabled(env))
return 1;
 @@ -3702,7 +3714,7 @@ static int disas_neon_ls_insn(CPUState * env,
 DisasContext *s, uint32_t insn)
  /* Load store all elements.  */
  op = (insn  8)  0xf;
  size = (insn  6)  3;
 -if (op  10 || size == 3)
 +if (op  10)

 This is wrong:  a size of 3 is limited to vld1.64 and vst1.64 which
 you don't enforce here.

 Apart from that, the rest looks OK.

Thanks, and you're right of course. I'll add a check that will return  
1 if size equals 3 and interleave or spacing is not 1.


Cheers,
Juha



Re: [Qemu-devel] [PATCH v2] target-arm: cleanup internal resource leaks

2009-10-22 Thread Juha.Riihimaki
On Oct 22, 2009, at 12:10, ext Gerd Hoffmann wrote:

 On 10/22/09 10:35, juha.riihim...@nokia.com wrote:
   Hi,

 -#define gen_set_cpsr(var, mask) gen_helper_cpsr_write(var,
 tcg_const_i32(mask))

 Your mailer mangles the patches.

 Yes, I know, I can't help it, I'm sorry.

 Makes it very difficult to extract the patches in a automated way ...

Well, I tested that the formatting stays correct if I send the mail  
body HTML format instead of plain text. I suppose you would not like  
to read/parse HTML mails in this list, however, or...?


Regards,
Juha




Re: [Qemu-devel] [PATCH v2] target-arm: cleanup internal resource leaks

2009-10-22 Thread Juha.Riihimaki
On Oct 22, 2009, at 13:08, ext Gerd Hoffmann wrote:

 On 10/22/09 11:55, juha.riihim...@nokia.com wrote:

 Your mailer mangles the patches.

 Yes, I know, I can't help it, I'm sorry.

 Makes it very difficult to extract the patches in a automated  
 way ...

 Well, I tested that the formatting stays correct if I send the mail
 body HTML format instead of plain text. I suppose you would not like
 to read/parse HTML mails in this list, however, or...?

 Something git am can handle would be best.  git send-email would  
 be
 perfect if you can get that work.  'git send-email --smtp-server
 $internalmailserver.nokia.com' could do the trick.  That will also  
 send
 all mails in one thread so it is easy to figure that these mails  
 belong
 together.

Okay, thanks for the suggestion, I'll try that and resend the v2  
patch, let's see what happens.


Cheers,
Juha




[Qemu-devel] [PATCH 00/12] target-arm: miscellaneous fixes

2009-10-21 Thread Juha.Riihimaki
Hi,

This patch series includes a number of smaller fixes and improvements  
to the ARM translator. The series should be applied in sequence as the  
modifications are all but one related to the same file, target-arm/ 
translate.c. The whole series should apply cleanly against latest git  
AFTER applying the resource leak patch which I sent on this list  
couple of days ago.


Cheers,
Juha




[Qemu-devel] [PATCH 03/12] target-arm: add support for neon vld1.64 instruction

2009-10-21 Thread Juha.Riihimaki
Add support for neon vld1.64 instruction.

From: Riku Voipio riku.voi...@iki.fi
Signed-off-by: Juha Riihimäki juha.riihim...@nokia.com
---
diff --git a/target-arm/translate.c b/target-arm/translate.c
index 3ea9d51..d027572 100644
--- a/target-arm/translate.c
+++ b/target-arm/translate.c
@@ -795,6 +795,12 @@ static inline TCGv gen_ld32(TCGv addr, int index)
  tcg_gen_qemu_ld32u(tmp, addr, index);
  return tmp;
  }
+static inline TCGv_i64 gen_ld64(TCGv addr, int index)
+{
+TCGv_i64 tmp = tcg_temp_new_i64();
+tcg_gen_qemu_ld64(tmp, addr, index);
+return tmp;
+}
  static inline void gen_st8(TCGv val, TCGv addr, int index)
  {
  tcg_gen_qemu_st8(val, addr, index);
@@ -810,6 +816,11 @@ static inline void gen_st32(TCGv val, TCGv addr,  
int index)
  tcg_gen_qemu_st32(val, addr, index);
  dead_tmp(val);
  }
+static inline void gen_st64(TCGv_i64 val, TCGv addr, int index)
+{
+tcg_gen_qemu_st64(val, addr, index);
+tcg_temp_free_i64(val);
+}

  static inline void gen_set_pc_im(uint32_t val)
  {
@@ -3690,6 +3701,7 @@ static int disas_neon_ls_insn(CPUState * env,  
DisasContext *s, uint32_t insn)
  TCGv addr;
  TCGv tmp;
  TCGv tmp2;
+TCGv_i64 tmp64;

  if (!vfp_enabled(env))
return 1;
@@ -3702,7 +3714,7 @@ static int disas_neon_ls_insn(CPUState * env,  
DisasContext *s, uint32_t insn)
  /* Load store all elements.  */
  op = (insn  8)  0xf;
  size = (insn  6)  3;
-if (op  10 || size == 3)
+if (op  10)
  return 1;
  nregs = neon_ls_element_type[op].nregs;
  interleave = neon_ls_element_type[op].interleave;
@@ -3716,61 +3728,74 @@ static int disas_neon_ls_insn(CPUState * env,  
DisasContext *s, uint32_t insn)
  load_reg_var(s, addr, rn);
  tcg_gen_addi_i32(addr, addr, 1  size);
  }
-for (pass = 0; pass  2; pass++) {
-if (size == 2) {
-if (load) {
-tmp = gen_ld32(addr, IS_USER(s));
-neon_store_reg(rd, pass, tmp);
-} else {
-tmp = neon_load_reg(rd, pass);
-gen_st32(tmp, addr, IS_USER(s));
-}
-tcg_gen_addi_i32(addr, addr, stride);
-} else if (size == 1) {
-if (load) {
-tmp = gen_ld16u(addr, IS_USER(s));
-tcg_gen_addi_i32(addr, addr, stride);
-tmp2 = gen_ld16u(addr, IS_USER(s));
-tcg_gen_addi_i32(addr, addr, stride);
-gen_bfi(tmp, tmp, tmp2, 16, 0x);
-dead_tmp(tmp2);
-neon_store_reg(rd, pass, tmp);
-} else {
-tmp = neon_load_reg(rd, pass);
-tmp2 = new_tmp();
-tcg_gen_shri_i32(tmp2, tmp, 16);
-gen_st16(tmp, addr, IS_USER(s));
-tcg_gen_addi_i32(addr, addr, stride);
-gen_st16(tmp2, addr, IS_USER(s));
+if (size == 3) {
+if (load) {
+tmp64 = gen_ld64(addr, IS_USER(s));
+neon_store_reg64(tmp64, rd);
+tcg_temp_free_i64(tmp64);
+} else {
+tmp64 = tcg_temp_new_i64();
+neon_load_reg64(tmp64, rd);
+gen_st64(tmp64, addr, IS_USER(s));
+}
+tcg_gen_addi_i32(addr, addr, stride);
+} else {
+for (pass = 0; pass  2; pass++) {
+if (size == 2) {
+if (load) {
+tmp = gen_ld32(addr, IS_USER(s));
+neon_store_reg(rd, pass, tmp);
+} else {
+tmp = neon_load_reg(rd, pass);
+gen_st32(tmp, addr, IS_USER(s));
+}
  tcg_gen_addi_i32(addr, addr, stride);
-}
-} else /* size == 0 */ {
-if (load) {
-TCGV_UNUSED(tmp2);
-for (n = 0; n  4; n++) {
-tmp = gen_ld8u(addr, IS_USER(s));
+} else if (size == 1) {
+if (load) {
+tmp = gen_ld16u(addr, IS_USER(s));
+tcg_gen_addi_i32(addr, addr, stride);
+tmp2 = gen_ld16u(addr, IS_USER(s));
+tcg_gen_addi_i32(addr, addr, stride);
+gen_bfi(tmp, tmp, tmp2, 16, 0x);
+dead_tmp(tmp2);
+neon_store_reg(rd, pass, tmp);
+} else {
+

[Qemu-devel] [PATCH 02/12] target-arm: optimize thumb 32bit multiply

2009-10-21 Thread Juha.Riihimaki
Current implementation of thumb mul instruction is implemented as a  
32x32-64 multiply which then uses only 32 least significant bits of  
the result. Replace that with a simple 32x32-32 multiply.

Signed-off-by: Juha Riihimäki juha.riihim...@nokia.com
---
diff --git a/target-arm/translate.c b/target-arm/translate.c
index bda105e..3ea9d51 100644
--- a/target-arm/translate.c
+++ b/target-arm/translate.c
@@ -310,22 +310,6 @@ static TCGv_i64 gen_muls_i64_i32(TCGv a, TCGv b)
  return tmp1;
  }

-/* Unsigned 32x32-64 multiply.  */
-static void gen_mull(TCGv a, TCGv b)
-{
-TCGv_i64 tmp1 = tcg_temp_new_i64();
-TCGv_i64 tmp2 = tcg_temp_new_i64();
-
-tcg_gen_extu_i32_i64(tmp1, a);
-tcg_gen_extu_i32_i64(tmp2, b);
-tcg_gen_mul_i64(tmp1, tmp1, tmp2);
-tcg_temp_free_i64(tmp2);
-tcg_gen_trunc_i64_i32(a, tmp1);
-tcg_gen_shri_i64(tmp1, tmp1, 32);
-tcg_gen_trunc_i64_i32(b, tmp1);
-tcg_temp_free_i64(tmp1);
-}
-
  /* Signed 32x32-64 multiply.  */
  static void gen_imull(TCGv a, TCGv b)
  {
@@ -8358,7 +8342,7 @@ static void disas_thumb_insn(CPUState *env,  
DisasContext *s)
  gen_logic_CC(tmp);
  break;
  case 0xd: /* mul */
-gen_mull(tmp, tmp2);
+tcg_gen_mul_i32(tmp, tmp, tmp2);
  if (!s-condexec_mask)
  gen_logic_CC(tmp);
  break;


translate.c.genmull.diff
Description: translate.c.genmull.diff


[Qemu-devel] [PATCH 07/12] target-arm: fix neon vsri, vshl and vsli ops

2009-10-21 Thread Juha.Riihimaki
Shift immediate value is incorrectly overwritten by a temporary  
variable in the processing of NEON vsri, vshl and vsli instructions.

Signed-off-by: Juha Riihimäki juha.riihim...@nokia.com
---
diff --git a/target-arm/translate.c b/target-arm/translate.c
index 59bf7bc..c92ecc6 100644
--- a/target-arm/translate.c
+++ b/target-arm/translate.c
@@ -4094,7 +4094,7 @@ static int disas_neon_data_insn(CPUState * env,  
DisasContext *s, uint32_t insn)
  int pairwise;
  int u;
  int n;
-uint32_t imm;
+uint32_t imm, imm2;
  TCGv tmp;
  TCGv tmp2;
  TCGv tmp3;
@@ -4624,31 +4624,31 @@ static int disas_neon_data_insn(CPUState *  
env, DisasContext *s, uint32_t insn)
  switch (size) {
  case 0:
  if (op == 4)
-imm = 0xff  -shift;
+imm2 = 0xff  -shift;
  else
-imm = (uint8_t)(0xff  shift);
-imm |= imm  8;
-imm |= imm  16;
+imm2 = (uint8_t)(0xff  shift);
+imm2 |= imm2  8;
+imm2 |= imm2  16;
  break;
  case 1:
  if (op == 4)
-imm = 0x  -shift;
+imm2 = 0x  -shift;
  else
-imm = (uint16_t)(0x  shift);
-imm |= imm  16;
+imm2 = (uint16_t)(0x  shift);
+imm2 |= imm2  16;
  break;
  case 2:
  if (op == 4)
-imm = 0xu  -shift;
+imm2 = 0xu  -shift;
  else
-imm = 0xu  shift;
+imm2 = 0xu  shift;
  break;
  default:
  abort();
  }
  tmp2 = neon_load_reg(rd, pass);
-tcg_gen_andi_i32(tmp, tmp, imm);
-tcg_gen_andi_i32(tmp2, tmp2, ~imm);
+tcg_gen_andi_i32(tmp, tmp, imm2);
+tcg_gen_andi_i32(tmp2, tmp2, ~imm2);
  tcg_gen_or_i32(tmp, tmp, tmp2);
  dead_tmp(tmp2);
  }


translate.c.neonimm.diff
Description: translate.c.neonimm.diff


[Qemu-devel] [PATCH 10/12] target-arm: replace tcg_gen_rori_i32 by tcg_gen_rotri_i32

2009-10-21 Thread Juha.Riihimaki
Use native rotation if possible instead of a simulated one.

From: Filip Navara filip.nav...@gmail.com
Signed-off-by: Juha Riihimäki juha.riihim...@nokia.com
---
diff --git a/target-arm/translate.c b/target-arm/translate.c
index 99a9ffd..1734fae 100644
--- a/target-arm/translate.c
+++ b/target-arm/translate.c
@@ -418,21 +418,6 @@ static inline void tcg_gen_bic_i32(TCGv dest,  
TCGv t0, TCGv t1)
  /* FIXME:  Implement this natively.  */
  #define tcg_gen_abs_i32(t0, t1) gen_helper_abs(t0, t1)

-/* FIXME:  Implement this natively.  */
-static void tcg_gen_rori_i32(TCGv t0, TCGv t1, int i)
-{
-TCGv tmp;
-
-if (i == 0)
-return;
-
-tmp = new_tmp();
-tcg_gen_shri_i32(tmp, t1, i);
-tcg_gen_shli_i32(t1, t1, 32 - i);
-tcg_gen_or_i32(t0, t1, tmp);
-dead_tmp(tmp);
-}
-
  static void shifter_out_im(TCGv var, int shift)
  {
  TCGv tmp = new_tmp();
@@ -484,7 +469,7 @@ static inline void gen_arm_shift_im(TCGv var, int  
shiftop, int shift, int flags)
  if (shift != 0) {
  if (flags)
  shifter_out_im(var, shift - 1);
-tcg_gen_rori_i32(var, var, shift); break;
+tcg_gen_rotri_i32(var, var, shift); break;
  } else {
  TCGv tmp = load_cpu_field(CF);
  if (flags)
@@ -6634,7 +6619,7 @@ static void disas_arm_insn(CPUState * env,  
DisasContext *s)
  /* ??? In many cases it's not neccessary to  
do a
 rotate, a shift is sufficient.  */
  if (shift != 0)
-tcg_gen_rori_i32(tmp, tmp, shift * 8);
+tcg_gen_rotri_i32(tmp, tmp, shift * 8);
  op1 = (insn  20)  7;
  switch (op1) {
  case 0: gen_sxtb16(tmp);  break;
@@ -7451,7 +7436,7 @@ static int disas_thumb2_insn(CPUState *env,  
DisasContext *s, uint16_t insn_hw1)
  /* ??? In many cases it's not neccessary to do a
 rotate, a shift is sufficient.  */
  if (shift != 0)
-tcg_gen_rori_i32(tmp, tmp, shift * 8);
+tcg_gen_rotri_i32(tmp, tmp, shift * 8);
  op = (insn  20)  7;
  switch (op) {
  case 0: gen_sxth(tmp);   break;


translate.c.rori.diff
Description: translate.c.rori.diff


[Qemu-devel] [PATCH 01/12] target-arm: fix resource leak in gen_set_cpsr macro

2009-10-21 Thread Juha.Riihimaki
Current implementation of the gen_set_cpsr macro creates a new  
temporary tcg variable through the tcg_const_i32 call but never marks  
it dead.

Signed-off-by: Juha Riihimäki juha.riihim...@nokia.com
---
diff --git a/target-arm/translate.c b/target-arm/translate.c
index bc51bcb..bda105e 100644
--- a/target-arm/translate.c
+++ b/target-arm/translate.c
@@ -184,7 +184,12 @@ static void store_reg(DisasContext *s, int reg,  
TCGv var)
  #define gen_uxtb16(var) gen_helper_uxtb16(var, var)


-#define gen_set_cpsr(var, mask) gen_helper_cpsr_write(var,  
tcg_const_i32(mask))
+#define gen_set_cpsr(var, mask) \
+{ \
+TCGv tmp_mask = tcg_const_i32(mask); \
+gen_helper_cpsr_write(var, tmp_mask); \
+tcg_temp_free_i32(tmp_mask); \
+}
  /* Set NZCV flags from the high 4 bits of var.  */
  #define gen_set_nzcv(var) gen_set_cpsr(var, CPSR_NZCV)


translate.c.gensetcpsr.diff
Description: translate.c.gensetcpsr.diff


[Qemu-devel] [PATCH 11/12] target-arm: optimize neon vld/vst ops

2009-10-21 Thread Juha.Riihimaki
Reduce the amount of tcg ops generated from NEON vld/vst instructions  
by simplifying the code generation.

Signed-off-by: Juha Riihimäki juha.riihim...@nokia.com
---
diff --git a/target-arm/translate.c b/target-arm/translate.c
index 1734fae..fa03df8 100644
--- a/target-arm/translate.c
+++ b/target-arm/translate.c
@@ -3692,6 +3692,7 @@ static int disas_neon_ls_insn(CPUState * env,  
DisasContext *s, uint32_t insn)
  TCGv tmp;
  TCGv tmp2;
  TCGv_i64 tmp64;
+TCGv stride_v;

  if (!vfp_enabled(env))
return 1;
@@ -3710,6 +3711,7 @@ static int disas_neon_ls_insn(CPUState * env,  
DisasContext *s, uint32_t insn)
  interleave = neon_ls_element_type[op].interleave;
  load_reg_var(s, addr, rn);
  stride = (1  size) * interleave;
+stride_v = tcg_const_i32(stride);
  for (reg = 0; reg  nregs; reg++) {
  if (interleave  2 || (interleave == 2  nregs == 2)) {
  load_reg_var(s, addr, rn);
@@ -3728,7 +3730,7 @@ static int disas_neon_ls_insn(CPUState * env,  
DisasContext *s, uint32_t insn)
  neon_load_reg64(tmp64, rd);
  gen_st64(tmp64, addr, IS_USER(s));
  }
-tcg_gen_addi_i32(addr, addr, stride);
+tcg_gen_add_i32(addr, addr, stride_v);
  } else {
  for (pass = 0; pass  2; pass++) {
  if (size == 2) {
@@ -3739,58 +3741,57 @@ static int disas_neon_ls_insn(CPUState * env,  
DisasContext *s, uint32_t insn)
  tmp = neon_load_reg(rd, pass);
  gen_st32(tmp, addr, IS_USER(s));
  }
-tcg_gen_addi_i32(addr, addr, stride);
+tcg_gen_add_i32(addr, addr, stride_v);
  } else if (size == 1) {
  if (load) {
  tmp = gen_ld16u(addr, IS_USER(s));
  tcg_gen_addi_i32(addr, addr, stride);
  tmp2 = gen_ld16u(addr, IS_USER(s));
-tcg_gen_addi_i32(addr, addr, stride);
-gen_bfi(tmp, tmp, tmp2, 16, 0x);
+tcg_gen_add_i32(addr, addr, stride_v);
+tcg_gen_shli_i32(tmp2, tmp2, 16);
+tcg_gen_or_i32(tmp, tmp, tmp2);
  dead_tmp(tmp2);
  neon_store_reg(rd, pass, tmp);
  } else {
  tmp = neon_load_reg(rd, pass);
-tmp2 = new_tmp();
-tcg_gen_shri_i32(tmp2, tmp, 16);
-gen_st16(tmp, addr, IS_USER(s));
-tcg_gen_addi_i32(addr, addr, stride);
-gen_st16(tmp2, addr, IS_USER(s));
-tcg_gen_addi_i32(addr, addr, stride);
+tcg_gen_qemu_st16(tmp, addr, IS_USER(s));
+tcg_gen_add_i32(addr, addr, stride_v);
+tcg_gen_shri_i32(tmp, tmp, 16);
+tcg_gen_qemu_st16(tmp, addr, IS_USER(s));
+tcg_gen_add_i32(addr, addr, stride_v);
+dead_tmp(tmp);
  }
  } else /* size == 0 */ {
  if (load) {
-TCGV_UNUSED(tmp2);
-for (n = 0; n  4; n++) {
-tmp = gen_ld8u(addr, IS_USER(s));
-tcg_gen_addi_i32(addr, addr, stride);
-if (n == 0) {
-tmp2 = tmp;
-} else {
-gen_bfi(tmp2, tmp2, tmp, n * 8,  
0xff);
-dead_tmp(tmp);
-}
+tmp = gen_ld8u(addr, IS_USER(s));
+tcg_gen_add_i32(addr, addr, stride_v);
+for (n = 1; n  4; n++) {
+tmp2 = gen_ld8u(addr, IS_USER(s));
+tcg_gen_add_i32(addr, addr, stride_v);
+tcg_gen_shli_i32(tmp2, tmp2, n * 8);
+tcg_gen_or_i32(tmp, tmp, tmp2);
+dead_tmp(tmp2);
  }
-neon_store_reg(rd, pass, tmp2);
+neon_store_reg(rd, pass, tmp);
  } else {
-tmp2 = neon_load_reg(rd, pass);
-for (n = 0; n  4; n++) {
-tmp = new_tmp();
-if (n == 0) {
-

[Qemu-devel] [PATCH 04/12] target-arm: allow modifying vfp fpexc en bit only

2009-10-21 Thread Juha.Riihimaki
All other bits except for the EN in the VFP FPEXC register are defined  
as subarchitecture specific and real functionality for any of the  
other bits has not been implemented in QEMU. However, current code  
allows modifying all bits in the VFP FPEXC register leading to  
problems when guest code is writing 1's to the subarchitecture  
specific bits and checking whether the bits stay up to verify the  
existence of functionality which in fact does not exist in QEMU.

Signed-off-by: Juha Riihimäki juha.riihim...@nokia.com
---
diff --git a/target-arm/translate.c b/target-arm/translate.c
index d027572..07ee638 100644
--- a/target-arm/translate.c
+++ b/target-arm/translate.c
@@ -2788,6 +2788,9 @@ static int disas_vfp_insn(CPUState * env,  
DisasContext *s, uint32_t insn)
  case ARM_VFP_FPEXC:
  if (IS_USER(s))
  return 1;
+/* TODO: VFP subarchitecture support.
+ * For now, keep the EN bit only */
+tcg_gen_andi_i32(tmp, tmp, 1  30);
  store_cpu_field(tmp, vfp.xregs[rn]);
  gen_lookup_tb(s);
  break;


translate.c.vfpfpexc.diff
Description: translate.c.vfpfpexc.diff


[Qemu-devel] [PATCH 12/12] target-arm: fix neon shift helper functions

2009-10-21 Thread Juha.Riihimaki
Current code is broken at least on gcc 4.2, the result of a comparison  
-1 = sizeof(type) * 8 results true and causes wrong code path to be  
taken. The fix utilizes abs() function where applicable and otherwise  
adds a test to ensure both arguments are positive before making the  
aforementioned comparison.
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Signed-off-by: Juha Riihimäki juha.riihim...@nokia.com
---
diff --git a/target-arm/neon_helper.c b/target-arm/neon_helper.c
index f32ecd6..0c95035 100644
--- a/target-arm/neon_helper.c
+++ b/target-arm/neon_helper.c
@@ -392,7 +392,7 @@ NEON_VOP(abd_u32, neon_u32, 1)
  #define NEON_FN(dest, src1, src2) do { \
  int8_t tmp; \
  tmp = (int8_t)src2; \
-if (tmp = sizeof(src1) * 8 || tmp = -sizeof(src1) * 8) { \
+if (abs(tmp) = sizeof(src1) * 8) { \
  dest = 0; \
  } else if (tmp  0) { \
  dest = src1  -tmp; \
@@ -420,7 +420,7 @@ uint64_t HELPER(neon_shl_u64)(uint64_t val,  
uint64_t shiftop)
  #define NEON_FN(dest, src1, src2) do { \
  int8_t tmp; \
  tmp = (int8_t)src2; \
-if (tmp = sizeof(src1) * 8) { \
+if (tmp = 0  tmp = sizeof(src1) * 8) { \
  dest = 0; \
  } else if (tmp = -sizeof(src1) * 8) { \
  dest = src1  (sizeof(src1) * 8 - 1); \
@@ -453,7 +453,7 @@ uint64_t HELPER(neon_shl_s64)(uint64_t valop,  
uint64_t shiftop)
  #define NEON_FN(dest, src1, src2) do { \
  int8_t tmp; \
  tmp = (int8_t)src2; \
-if (tmp = sizeof(src1) * 8) { \
+if (tmp = 0  tmp = sizeof(src1) * 8) { \
  dest = 0; \
  } else if (tmp  -sizeof(src1) * 8) { \
  dest = src1  (sizeof(src1) * 8 - 1); \
@@ -494,7 +494,7 @@ uint64_t HELPER(neon_rshl_s64)(uint64_t valop,  
uint64_t shiftop)
  #define NEON_FN(dest, src1, src2) do { \
  int8_t tmp; \
  tmp = (int8_t)src2; \
-if (tmp = sizeof(src1) * 8 || tmp  -sizeof(src1) * 8) { \
+if (abs(tmp) = sizeof(src1) * 8) { \
  dest = 0; \
  } else if (tmp == -sizeof(src1) * 8) { \
  dest = src1  (tmp - 1); \
@@ -528,7 +528,7 @@ uint64_t HELPER(neon_rshl_u64)(uint64_t val,  
uint64_t shiftop)
  #define NEON_FN(dest, src1, src2) do { \
  int8_t tmp; \
  tmp = (int8_t)src2; \
-if (tmp = sizeof(src1) * 8) { \
+if (tmp = 0  tmp = sizeof(src1) * 8) { \
  if (src1) { \
  SET_QC(); \
  dest = ~0; \
@@ -579,7 +579,7 @@ uint64_t HELPER(neon_qshl_u64)(CPUState *env,  
uint64_t val, uint64_t shiftop)
  #define NEON_FN(dest, src1, src2) do { \
  int8_t tmp; \
  tmp = (int8_t)src2; \
-if (tmp = sizeof(src1) * 8) { \
+if (tmp = 0  tmp = sizeof(src1) * 8) { \
  if (src1) \
  SET_QC(); \
  dest = src1  31; \


neon_helper.c.diff
Description: neon_helper.c.diff


[Qemu-devel] [PATCH 09/12] target-arm: optimize thumb push/pop ops

2009-10-21 Thread Juha.Riihimaki
Thumb push/pop instructions can be slightly optimized by loading the  
register offset constant into a variable outside the register loop and  
using the preloaded variable inside the loop instead of reloading the  
offset value to a temporary variable on each loop iteration. This  
causes less TCG ops to be generated for a Thumb push/pop instruction.

Signed-off-by: Juha Riihimäki juha.riihim...@nokia.com
---
diff --git a/target-arm/translate.c b/target-arm/translate.c
index abb3105..99a9ffd 100644
--- a/target-arm/translate.c
+++ b/target-arm/translate.c
@@ -8589,6 +8589,7 @@ static void disas_thumb_insn(CPUState *env,  
DisasContext *s)
  if ((insn  (1  11)) == 0) {
  tcg_gen_addi_i32(addr, addr, -offset);
  }
+tmp2 = tcg_const_i32(4);
  for (i = 0; i  8; i++) {
  if (insn  (1  i)) {
  if (insn  (1  11)) {
@@ -8601,7 +8602,7 @@ static void disas_thumb_insn(CPUState *env,  
DisasContext *s)
  gen_st32(tmp, addr, IS_USER(s));
  }
  /* advance to the next address.  */
-tcg_gen_addi_i32(addr, addr, 4);
+tcg_gen_add_i32(addr, addr, tmp2);
  }
  }
  TCGV_UNUSED(tmp);
@@ -8616,8 +8617,9 @@ static void disas_thumb_insn(CPUState *env,  
DisasContext *s)
  tmp = load_reg(s, 14);
  gen_st32(tmp, addr, IS_USER(s));
  }
-tcg_gen_addi_i32(addr, addr, 4);
+tcg_gen_add_i32(addr, addr, tmp2);
  }
+tcg_temp_free_i32(tmp2);
  if ((insn  (1  11)) == 0) {
  tcg_gen_addi_i32(addr, addr, -offset);
  }


translate.c.tpushpop.diff
Description: translate.c.tpushpop.diff


[Qemu-devel] [PATCH 08/12] target-arm: optimize thumb2 load/store multiple ops

2009-10-21 Thread Juha.Riihimaki
Thumb2 load/store multiple instructions can be slightly optimized by  
loading the register offset constant into a variable outside the  
register loop and using the preloaded variable inside the loop instead  
of reloading the offset value to a temporary variable on each loop  
iteration. This causes less TCG ops to be generated for a Thumb2 load/ 
store multiple instruction.

Signed-off-by: Juha Riihimäki juha.riihim...@nokia.com
---
diff --git a/target-arm/translate.c b/target-arm/translate.c
index c92ecc6..abb3105 100644
--- a/target-arm/translate.c
+++ b/target-arm/translate.c
@@ -7370,6 +7370,7 @@ static int disas_thumb2_insn(CPUState *env,  
DisasContext *s, uint16_t insn_hw1)
  tcg_gen_addi_i32(addr, addr, -offset);
  }

+tmp2 = tcg_const_i32(4);
  for (i = 0; i  16; i++) {
  if ((insn  (1  i)) == 0)
  continue;
@@ -7386,8 +7387,9 @@ static int disas_thumb2_insn(CPUState *env,  
DisasContext *s, uint16_t insn_hw1)
  tmp = load_reg(s, i);
  gen_st32(tmp, addr, IS_USER(s));
  }
-tcg_gen_addi_i32(addr, addr, 4);
+tcg_gen_add_i32(addr, addr, tmp2);
  }
+tcg_temp_free_i32(tmp2);
  if (insn  (1  21)) {
  /* Base register writeback.  */
  if (insn  (1  24)) {


translate.c.t2ldmstm.diff
Description: translate.c.t2ldmstm.diff


[Qemu-devel] [PATCH 05/12] target-arm: optimize vfp load/store multiple ops

2009-10-21 Thread Juha.Riihimaki
VFP load/store multiple instructions can be slightly optimized by  
loading the register offset constant into a variable outside the  
register loop and using the preloaded variable inside the loop instead  
of reloading the offset value to a temporary variable on each loop  
iteration. This causes less TCG ops to be generated for a VFP load/ 
store multiple instruction.

Signed-off-by: Juha Riihimäki juha.riihim...@nokia.com
---
diff --git a/target-arm/translate.c b/target-arm/translate.c
index 07ee638..e5a2881 100644
--- a/target-arm/translate.c
+++ b/target-arm/translate.c
@@ -3222,6 +3222,7 @@ static int disas_vfp_insn(CPUState * env,  
DisasContext *s, uint32_t insn)
  offset = 8;
  else
  offset = 4;
+tmp = tcg_const_i32(offset);
  for (i = 0; i  n; i++) {
  if (insn  ARM_CP_RW_BIT) {
  /* load */
@@ -3232,8 +3233,9 @@ static int disas_vfp_insn(CPUState * env,  
DisasContext *s, uint32_t insn)
  gen_mov_F0_vreg(dp, rd + i);
  gen_vfp_st(s, dp, addr);
  }
-tcg_gen_addi_i32(addr, addr, offset);
+tcg_gen_add_i32(addr, addr, tmp);
  }
+tcg_temp_free_i32(tmp);
  if (insn  (1  21)) {
  /* writeback */
  if (insn  (1  24))


translate.c.vfpldmstm.diff
Description: translate.c.vfpldmstm.diff


[Qemu-devel] [PATCH 12/12] [RESEND] target-arm: fix neon shift helper functions

2009-10-21 Thread Juha.Riihimaki
Current code is broken at least on gcc 4.2, the result of a comparison
-1 = sizeof(type) * 8 results true and causes wrong code path to be
taken. The fix utilizes abs() function where applicable and otherwise
adds a test to ensure both arguments are positive before making the
aforementioned comparison.

Signed-off-by: Juha Riihimäki juha.riihim...@nokia.com
---
diff --git a/target-arm/neon_helper.c b/target-arm/neon_helper.c
index f32ecd6..0c95035 100644
--- a/target-arm/neon_helper.c
+++ b/target-arm/neon_helper.c
@@ -392,7 +392,7 @@ NEON_VOP(abd_u32, neon_u32, 1)
  #define NEON_FN(dest, src1, src2) do { \
  int8_t tmp; \
  tmp = (int8_t)src2; \
-if (tmp = sizeof(src1) * 8 || tmp = -sizeof(src1) * 8) { \
+if (abs(tmp) = sizeof(src1) * 8) { \
  dest = 0; \
  } else if (tmp  0) { \
  dest = src1  -tmp; \
@@ -420,7 +420,7 @@ uint64_t HELPER(neon_shl_u64)(uint64_t val,  
uint64_t shiftop)
  #define NEON_FN(dest, src1, src2) do { \
  int8_t tmp; \
  tmp = (int8_t)src2; \
-if (tmp = sizeof(src1) * 8) { \
+if (tmp = 0  tmp = sizeof(src1) * 8) { \
  dest = 0; \
  } else if (tmp = -sizeof(src1) * 8) { \
  dest = src1  (sizeof(src1) * 8 - 1); \
@@ -453,7 +453,7 @@ uint64_t HELPER(neon_shl_s64)(uint64_t valop,  
uint64_t shiftop)
  #define NEON_FN(dest, src1, src2) do { \
  int8_t tmp; \
  tmp = (int8_t)src2; \
-if (tmp = sizeof(src1) * 8) { \
+if (tmp = 0  tmp = sizeof(src1) * 8) { \
  dest = 0; \
  } else if (tmp  -sizeof(src1) * 8) { \
  dest = src1  (sizeof(src1) * 8 - 1); \
@@ -494,7 +494,7 @@ uint64_t HELPER(neon_rshl_s64)(uint64_t valop,  
uint64_t shiftop)
  #define NEON_FN(dest, src1, src2) do { \
  int8_t tmp; \
  tmp = (int8_t)src2; \
-if (tmp = sizeof(src1) * 8 || tmp  -sizeof(src1) * 8) { \
+if (abs(tmp) = sizeof(src1) * 8) { \
  dest = 0; \
  } else if (tmp == -sizeof(src1) * 8) { \
  dest = src1  (tmp - 1); \
@@ -528,7 +528,7 @@ uint64_t HELPER(neon_rshl_u64)(uint64_t val,  
uint64_t shiftop)
  #define NEON_FN(dest, src1, src2) do { \
  int8_t tmp; \
  tmp = (int8_t)src2; \
-if (tmp = sizeof(src1) * 8) { \
+if (tmp = 0  tmp = sizeof(src1) * 8) { \
  if (src1) { \
  SET_QC(); \
  dest = ~0; \
@@ -579,7 +579,7 @@ uint64_t HELPER(neon_qshl_u64)(CPUState *env,  
uint64_t val, uint64_t shiftop)
  #define NEON_FN(dest, src1, src2) do { \
  int8_t tmp; \
  tmp = (int8_t)src2; \
-if (tmp = sizeof(src1) * 8) { \
+if (tmp = 0  tmp = sizeof(src1) * 8) { \
  if (src1) \
  SET_QC(); \
  dest = src1  31; \


neon_helper.c.diff
Description: neon_helper.c.diff


Re: [Qemu-devel] [PATCH 00/18] target-arm cleanup

2009-10-20 Thread Juha.Riihimaki

On Oct 19, 2009, at 16:23, ext Aurelien Jarno wrote:

 I think I have a couple of other fixes and patches on top of that as
 well, but I'd rather wait until you get this bunch committed and then
 format the patches against the new mainline so that they apply.

 Thanks I have seen your patch, I'll have a closer look later today.

Ok, that was just the resource leak patch. I have some others on top  
of that, I'll post them as soon as I get them (re)formatted.

 Perhaps this would also be a good place and time to also ask whether
 you would be interested in integrating support for OMAP3 in the QEMU
 mainline? We have been developing and using it for several months  
 now,
 it's based on the work done by Yajin ya...@vm-kernel.org to support
 the OMAP3 BeagleBoard hardware. We have added support for the Nokia
 N900 system emulation as well. In my personal opinion the OMAP3
 emulation is in functionality on par with the existing OMAP2
 emulation, if not even more complete.

 That would be very nice to have such an emulated board in mainstream
 QEMU. I would be happy to review your patches.

Alright I'll look into generating a patch set for that as well but it  
might take a bit longer time to do it since it is a fairly large chunk  
of code with several modifications to the existing OMAP1/2 (and some  
others) code as well to accommodate for the OMAP3 features. The OMAP3  
support also calls for some further changes in the ARM core emulation  
to make the guest Linux kernel happy.


Regards,
Juha




Re: [Qemu-devel] [PATCH 00/18] target-arm cleanup

2009-10-19 Thread Juha.Riihimaki
 Apart from the points you have raised about specific patches there
 were few more minor bugs in the series spotted by Juha Riihimäki
 juha.riihim...@nokia.com. A fix is available at
 http://repo.or.cz/w/qemu/navara.git?a=commit;h=2ce696baa6fc5d99522cf387b6a4913807fd43ed

 One of the fix was already in my branch (catched by Laurent  
 Desnogues).
 I have added the other fixes in my branch. The last to hunks are good
 example why temp new/free should be done explicitly ;)

I think I have a couple of other fixes and patches on top of that as  
well, but I'd rather wait until you get this bunch committed and then  
format the patches against the new mainline so that they apply.

On the subject of the new_tmp/dead_tmp, can you elaborate how critical  
it is if there are resource leaks in the translator code, i.e. if some  
of the temporary variables don't get marked dead? I suppose the  
leakage would only affect the translation of the code block where it  
appears? I found more leaks by introducing a new_tmp64/dead_tmp64 and  
some other checkups to catch programming errors.

Some of the generated tcg code is not very optimal, for example a  
single vld4.8 instruction can generate over 250 tcg ops. I did some  
optimizations and got it under 200 but do you think it could be an  
issue that a single instruction can expand to so many tcg ops? I mean  
besides the fact that it causes TBs for only one or two guest  
instructions to be generated.

Perhaps this would also be a good place and time to also ask whether  
you would be interested in integrating support for OMAP3 in the QEMU  
mainline? We have been developing and using it for several months now,  
it's based on the work done by Yajin ya...@vm-kernel.org to support  
the OMAP3 BeagleBoard hardware. We have added support for the Nokia  
N900 system emulation as well. In my personal opinion the OMAP3  
emulation is in functionality on par with the existing OMAP2  
emulation, if not even more complete.


Cheers,
Juha



[Qemu-devel] [PATCH] target-arm: cleanup internal resource leaks

2009-10-19 Thread Juha.Riihimaki
Current ARM translator code has several places where it leaves
temporary TCG variables alive. This patch removes all such instances I
have found so far. Sorry for the mangled inlined patch, the mailserver
I use doesn't like patches so I've also included it as an attachment
that hopefully status correctly formatted. Should apply cleanly
against latest git.

Signed-off-by: Juha Riihimäki juha.riihim...@nokia.com
---
diff --git a/target-arm/translate.c b/target-arm/translate.c
index e56082b..bc51bcb 100644
--- a/target-arm/translate.c
+++ b/target-arm/translate.c
@@ -287,6 +287,7 @@ static TCGv_i64 gen_mulu_i64_i32(TCGv a, TCGv b)
  tcg_gen_extu_i32_i64(tmp2, b);
  dead_tmp(b);
  tcg_gen_mul_i64(tmp1, tmp1, tmp2);
+tcg_temp_free_i64(tmp2);
  return tmp1;
  }

@@ -300,6 +301,7 @@ static TCGv_i64 gen_muls_i64_i32(TCGv a, TCGv b)
  tcg_gen_ext_i32_i64(tmp2, b);
  dead_tmp(b);
  tcg_gen_mul_i64(tmp1, tmp1, tmp2);
+tcg_temp_free_i64(tmp2);
  return tmp1;
  }

@@ -312,9 +314,11 @@ static void gen_mull(TCGv a, TCGv b)
  tcg_gen_extu_i32_i64(tmp1, a);
  tcg_gen_extu_i32_i64(tmp2, b);
  tcg_gen_mul_i64(tmp1, tmp1, tmp2);
+tcg_temp_free_i64(tmp2);
  tcg_gen_trunc_i64_i32(a, tmp1);
  tcg_gen_shri_i64(tmp1, tmp1, 32);
  tcg_gen_trunc_i64_i32(b, tmp1);
+tcg_temp_free_i64(tmp1);
  }

  /* Signed 32x32-64 multiply.  */
@@ -326,9 +330,11 @@ static void gen_imull(TCGv a, TCGv b)
  tcg_gen_ext_i32_i64(tmp1, a);
  tcg_gen_ext_i32_i64(tmp2, b);
  tcg_gen_mul_i64(tmp1, tmp1, tmp2);
+tcg_temp_free_i64(tmp2);
  tcg_gen_trunc_i64_i32(a, tmp1);
  tcg_gen_shri_i64(tmp1, tmp1, 32);
  tcg_gen_trunc_i64_i32(b, tmp1);
+tcg_temp_free_i64(tmp1);
  }

  /* Swap low and high halfwords.  */
@@ -542,11 +548,13 @@ static void gen_arm_parallel_addsub(int op1, int
op2, TCGv a, TCGv b)
  tmp = tcg_temp_new_ptr();
  tcg_gen_addi_ptr(tmp, cpu_env, offsetof(CPUState, GE));
  PAS_OP(s)
+tcg_temp_free_ptr(tmp);
  break;
  case 5:
  tmp = tcg_temp_new_ptr();
  tcg_gen_addi_ptr(tmp, cpu_env, offsetof(CPUState, GE));
  PAS_OP(u)
+tcg_temp_free_ptr(tmp);
  break;
  #undef gen_pas_helper
  #define gen_pas_helper(name) glue(gen_helper_,name)(a, a, b)
@@ -587,11 +595,13 @@ static void gen_thumb2_parallel_addsub(int op1,
int op2, TCGv a, TCGv b)
  tmp = tcg_temp_new_ptr();
  tcg_gen_addi_ptr(tmp, cpu_env, offsetof(CPUState, GE));
  PAS_OP(s)
+tcg_temp_free_ptr(tmp);
  break;
  case 4:
  tmp = tcg_temp_new_ptr();
  tcg_gen_addi_ptr(tmp, cpu_env, offsetof(CPUState, GE));
  PAS_OP(u)
+tcg_temp_free_ptr(tmp);
  break;
  #undef gen_pas_helper
  #define gen_pas_helper(name) glue(gen_helper_,name)(a, a, b)
@@ -995,10 +1005,12 @@ static inline void gen_vfp_tosiz(int dp)
  #define VFP_GEN_FIX(name) \
  static inline void gen_vfp_##name(int dp, int shift) \
  { \
+TCGv tmp_shift = tcg_const_i32(shift); \
  if (dp) \
-gen_helper_vfp_##name##d(cpu_F0d, cpu_F0d, tcg_const_i32
(shift), cpu_env);\
+gen_helper_vfp_##name##d(cpu_F0d, cpu_F0d, tmp_shift,
cpu_env);\
  else \
-gen_helper_vfp_##name##s(cpu_F0s, cpu_F0s, tcg_const_i32
(shift), cpu_env);\
+gen_helper_vfp_##name##s(cpu_F0s, cpu_F0s, tmp_shift,
cpu_env);\
+tcg_temp_free_i32(tmp_shift); \
  }
  VFP_GEN_FIX(tosh)
  VFP_GEN_FIX(tosl)
@@ -2399,7 +2411,7 @@ static int disas_dsp_insn(CPUState *env,
DisasContext *s, uint32_t insn)
 instruction is not defined.  */
  static int disas_cp_insn(CPUState *env, DisasContext *s, uint32_t
insn)
  {
-TCGv tmp;
+TCGv tmp, tmp2;
  uint32_t rd = (insn  12)  0xf;
  uint32_t cp = (insn  8)  0xf;
  if (IS_USER(s)) {
@@ -2411,14 +2423,18 @@ static int disas_cp_insn(CPUState *env,
DisasContext *s, uint32_t insn)
  return 1;
  gen_set_pc_im(s-pc);
  tmp = new_tmp();
-gen_helper_get_cp(tmp, cpu_env, tcg_const_i32(insn));
+tmp2 = tcg_const_i32(insn);
+gen_helper_get_cp(tmp, cpu_env, tmp2);
+tcg_temp_free(tmp2);
  store_reg(s, rd, tmp);
  } else {
  if (!env-cp[cp].cp_write)
  return 1;
  gen_set_pc_im(s-pc);
  tmp = load_reg(s, rd);
-gen_helper_set_cp(cpu_env, tcg_const_i32(insn), tmp);
+tmp2 = tcg_const_i32(insn);
+gen_helper_set_cp(cpu_env, tmp2, tmp);
+tcg_temp_free(tmp2);
  dead_tmp(tmp);
  }
  return 0;
@@ -2449,7 +2465,7 @@ static int cp15_user_ok(uint32_t insn)
  static int disas_cp15_insn(CPUState *env, DisasContext *s, uint32_t
insn)
  {
  uint32_t rd;
-TCGv tmp;
+TCGv tmp, tmp2;

  /* M profile cores use memory mapped registers instead of cp15.
*/
  if (arm_feature(env, ARM_FEATURE_M))
@@ -2478,9 +2494,10 @@ static int disas_cp15_insn(CPUState *env,
DisasContext *s, uint32_t