Re: [Qemu-devel] [PATCH 00/18] target-arm cleanup

2009-11-10 Thread Aurelien Jarno
On Tue, Nov 10, 2009 at 11:38:49PM +, Paul Brook wrote:
> > > Some of the generated tcg code is not very optimal, for example a
> > > single vld4.8 instruction can generate over 250 tcg ops. I did some
> > > optimizations and got it under 200 but do you think it could be an
> > > issue that a single instruction can expand to so many tcg ops? I mean
> > > besides the fact that it causes TBs for only one or two guest
> > > instructions to be generated.
> > 
> > Fabrice wrote this (tcg/README):
> > 
> >   Don't hesitate to use helpers for complicated or seldom used target
> >   intructions. There is little performance advantage in using TCG to
> >   implement target instructions taking more than about twenty TCG
> >   instructions.
> > 
> > How applicable is it, I can't say.  It'd probably be a good thing
> > to benchmark the two versions, TCG vs helper.
> 
> The problem is that you can not do memory accesses from within a helper 
> function.
> 

It is something possible, it's done for example for unaligned memory
access on MIPS (see for example helper_ldr).

-- 
Aurelien Jarno  GPG: 1024D/F1BCDB73
aurel...@aurel32.net http://www.aurel32.net




Re: [Qemu-devel] [PATCH 00/18] target-arm cleanup

2009-11-10 Thread Paul Brook
> On the code itself, I don't really like the remaining, new_tmp(),
> dead_tmp(), and even more the fact that some functions can allocate
> (e.g load_reg) or free (e.g. store_reg) some TCG variables implicitely.
> This is a way to make errors by reallocating or forgetting to free some
> 
> variables, and that leads to strange code like:
> |if (rn == 15) {
> |tmp = new_tmp();
> |tcg_gen_movi_i32(tmp, 0);
> |} else {
> |tmp = load_reg(s, rn);
> |}

There is actually logic behind this
Consider the obvious implementation of the "neg" instruction:

val = load_reg(rn);
tcg_gen_neg_i32(val, val);
store_reg(rd, val);

With the current code this is safe. However if load_reg returns cpu_R[n] 
instead of a temporary then the above code will incorrectly clobber the source 
register.

My theory was that tracking down an accidental write to a source register is 
much harder than tracking down a mismatched temporary.

Paul




Re: [Qemu-devel] [PATCH 00/18] target-arm cleanup

2009-11-10 Thread Paul Brook
> > Some of the generated tcg code is not very optimal, for example a
> > single vld4.8 instruction can generate over 250 tcg ops. I did some
> > optimizations and got it under 200 but do you think it could be an
> > issue that a single instruction can expand to so many tcg ops? I mean
> > besides the fact that it causes TBs for only one or two guest
> > instructions to be generated.
> 
> Fabrice wrote this (tcg/README):
> 
>   Don't hesitate to use helpers for complicated or seldom used target
>   intructions. There is little performance advantage in using TCG to
>   implement target instructions taking more than about twenty TCG
>   instructions.
> 
> How applicable is it, I can't say.  It'd probably be a good thing
> to benchmark the two versions, TCG vs helper.

The problem is that you can not do memory accesses from within a helper 
function.

Paul




Re: [Qemu-devel] [PATCH 00/18] target-arm cleanup

2009-10-19 Thread Juha.Riihimaki

On Oct 19, 2009, at 16:23, ext Aurelien Jarno wrote:

>> I think I have a couple of other fixes and patches on top of that as
>> well, but I'd rather wait until you get this bunch committed and then
>> format the patches against the new mainline so that they apply.
>
> Thanks I have seen your patch, I'll have a closer look later today.

Ok, that was "just" the resource leak patch. I have some others on top  
of that, I'll post them as soon as I get them (re)formatted.

>> Perhaps this would also be a good place and time to also ask whether
>> you would be interested in integrating support for OMAP3 in the QEMU
>> mainline? We have been developing and using it for several months  
>> now,
>> it's based on the work done by Yajin  to support
>> the OMAP3 BeagleBoard hardware. We have added support for the Nokia
>> N900 system emulation as well. In my personal opinion the OMAP3
>> emulation is in functionality on par with the existing OMAP2
>> emulation, if not even more complete.
>
> That would be very nice to have such an emulated board in mainstream
> QEMU. I would be happy to review your patches.

Alright I'll look into generating a patch set for that as well but it  
might take a bit longer time to do it since it is a fairly large chunk  
of code with several modifications to the existing OMAP1/2 (and some  
others) code as well to accommodate for the OMAP3 features. The OMAP3  
support also calls for some further changes in the ARM core emulation  
to make the guest Linux kernel happy.


Regards,
Juha




Re: [Qemu-devel] [PATCH 00/18] target-arm cleanup

2009-10-19 Thread Aurelien Jarno
On Mon, Oct 19, 2009 at 08:23:11AM +0200, juha.riihim...@nokia.com wrote:
> >> Apart from the points you have raised about specific patches there
> >> were few more minor bugs in the series spotted by Juha Riihimäki
> >> . A fix is available at
> >> http://repo.or.cz/w/qemu/navara.git?a=commit;h=2ce696baa6fc5d99522cf387b6a4913807fd43ed
> >
> > One of the fix was already in my branch (catched by Laurent  
> > Desnogues).
> > I have added the other fixes in my branch. The last to hunks are good
> > example why temp new/free should be done explicitly ;)
> 
> I think I have a couple of other fixes and patches on top of that as  
> well, but I'd rather wait until you get this bunch committed and then  
> format the patches against the new mainline so that they apply.

Thanks I have seen your patch, I'll have a closer look later today.

> On the subject of the new_tmp/dead_tmp, can you elaborate how critical  
> it is if there are resource leaks in the translator code, i.e. if some  
> of the temporary variables don't get marked dead? I suppose the  
> leakage would only affect the translation of the code block where it  
> appears? I found more leaks by introducing a new_tmp64/dead_tmp64 and  
> some other checkups to catch programming errors.

As long as they are not to many leaks by TB, it should not be a problem.
If there is a leak on a single instruction, and this instruction is used
a lot of times, the maximum number of temps (512) can be reached, which
causes qemu to stop.

> Some of the generated tcg code is not very optimal, for example a  
> single vld4.8 instruction can generate over 250 tcg ops. I did some  
> optimizations and got it under 200 but do you think it could be an  
> issue that a single instruction can expand to so many tcg ops? I mean  
> besides the fact that it causes TBs for only one or two guest  
> instructions to be generated.

According to Fabrice, an helper starts to be faster starting to 20 ops.
I think however it depends a lot on the host architecture, and therefore
it's difficult to give a limit. 200 looks high though.

> Perhaps this would also be a good place and time to also ask whether  
> you would be interested in integrating support for OMAP3 in the QEMU  
> mainline? We have been developing and using it for several months now,  
> it's based on the work done by Yajin  to support  
> the OMAP3 BeagleBoard hardware. We have added support for the Nokia  
> N900 system emulation as well. In my personal opinion the OMAP3  
> emulation is in functionality on par with the existing OMAP2  
> emulation, if not even more complete.

That would be very nice to have such an emulated board in mainstream
QEMU. I would be happy to review your patches.

Cheers,
Aurelien

-- 
Aurelien Jarno  GPG: 1024D/F1BCDB73
aurel...@aurel32.net http://www.aurel32.net




Re: [Qemu-devel] [PATCH 00/18] target-arm cleanup

2009-10-18 Thread Laurent Desnogues
On Mon, Oct 19, 2009 at 8:23 AM,   wrote:
>
> I think I have a couple of other fixes and patches on top of that as
> well, but I'd rather wait until you get this bunch committed and then
> format the patches against the new mainline so that they apply.

It's already been committed (the commit notification process is
again dead).

> On the subject of the new_tmp/dead_tmp, can you elaborate how critical
> it is if there are resource leaks in the translator code, i.e. if some
> of the temporary variables don't get marked dead? I suppose the
> leakage would only affect the translation of the code block where it
> appears? I found more leaks by introducing a new_tmp64/dead_tmp64 and
> some other checkups to catch programming errors.

Yes it would only affect one TB.  The price would be a higher time
spent in the TCG -> host code generation pass, since some parts
are linear in the number of live temps.

> Some of the generated tcg code is not very optimal, for example a
> single vld4.8 instruction can generate over 250 tcg ops. I did some
> optimizations and got it under 200 but do you think it could be an
> issue that a single instruction can expand to so many tcg ops? I mean
> besides the fact that it causes TBs for only one or two guest
> instructions to be generated.

Fabrice wrote this (tcg/README):

  Don't hesitate to use helpers for complicated or seldom used target
  intructions. There is little performance advantage in using TCG to
  implement target instructions taking more than about twenty TCG
  instructions.

How applicable is it, I can't say.  It'd probably be a good thing
to benchmark the two versions, TCG vs helper.

> Perhaps this would also be a good place and time to also ask whether
> you would be interested in integrating support for OMAP3 in the QEMU
> mainline? We have been developing and using it for several months now,
> it's based on the work done by Yajin  to support
> the OMAP3 BeagleBoard hardware. We have added support for the Nokia
> N900 system emulation as well. In my personal opinion the OMAP3
> emulation is in functionality on par with the existing OMAP2
> emulation, if not even more complete.

I would certainly like to see that in mainline!


Laurent




Re: [Qemu-devel] [PATCH 00/18] target-arm cleanup

2009-10-18 Thread Juha.Riihimaki
>> Apart from the points you have raised about specific patches there
>> were few more minor bugs in the series spotted by Juha Riihimäki
>> . A fix is available at
>> http://repo.or.cz/w/qemu/navara.git?a=commit;h=2ce696baa6fc5d99522cf387b6a4913807fd43ed
>
> One of the fix was already in my branch (catched by Laurent  
> Desnogues).
> I have added the other fixes in my branch. The last to hunks are good
> example why temp new/free should be done explicitly ;)

I think I have a couple of other fixes and patches on top of that as  
well, but I'd rather wait until you get this bunch committed and then  
format the patches against the new mainline so that they apply.

On the subject of the new_tmp/dead_tmp, can you elaborate how critical  
it is if there are resource leaks in the translator code, i.e. if some  
of the temporary variables don't get marked dead? I suppose the  
leakage would only affect the translation of the code block where it  
appears? I found more leaks by introducing a new_tmp64/dead_tmp64 and  
some other checkups to catch programming errors.

Some of the generated tcg code is not very optimal, for example a  
single vld4.8 instruction can generate over 250 tcg ops. I did some  
optimizations and got it under 200 but do you think it could be an  
issue that a single instruction can expand to so many tcg ops? I mean  
besides the fact that it causes TBs for only one or two guest  
instructions to be generated.

Perhaps this would also be a good place and time to also ask whether  
you would be interested in integrating support for OMAP3 in the QEMU  
mainline? We have been developing and using it for several months now,  
it's based on the work done by Yajin  to support  
the OMAP3 BeagleBoard hardware. We have added support for the Nokia  
N900 system emulation as well. In my personal opinion the OMAP3  
emulation is in functionality on par with the existing OMAP2  
emulation, if not even more complete.


Cheers,
Juha