On Sun, Apr 10, 2011 at 10:17:26PM +0200, Alexander Graf wrote:
>
> On 10.04.2011, at 22:08, Aurelien Jarno wrote:
>
> > On Sun, Apr 10, 2011 at 09:25:33PM +0200, Alexander Graf wrote:
> >>
> >> On 10.04.2011, at 21:23, Aurelien Jarno wrote:
> >>
> >>> On Tue, Apr 05, 2011 at 09:55:09AM +0200, Alexander Graf wrote:
> >>>>
> >>>> On 05.04.2011, at 06:54, Aurelien Jarno wrote:
> >>>>
> >>>>> On Mon, Apr 04, 2011 at 04:32:24PM +0200, Alexander Graf wrote:
> >>>>>> With the s390x target we use the deposit instruction to store 32bit values
> >>>>>> into 64bit registers without clobbering the upper 32 bits.
> >>>>>>
> >>>>>> This specific operation can be optimized slightly by using the ext operation
> >>>>>> instead of an explicit and in the deposit instruction. This patch adds that
> >>>>>> special case to the generic deposit implementation.
> >>>>>>
> >>>>>> Signed-off-by: Alexander Graf <ag...@suse.de>
> >>>>>> ---
> >>>>>> tcg/tcg-op.h | 6 +++++-
> >>>>>> 1 files changed, 5 insertions(+), 1 deletions(-)
> >>>>>
> >>>>> Have you really measuring a difference here? This should already be
> >>>>> handled, at least on x86, by this code:
> >>>>>
> >>>>>     if (TCG_TARGET_REG_BITS == 64) {
> >>>>>         if (val == 0xffffffffu) {
> >>>>>             tcg_out_ext32u(s, r0, r0);
> >>>>>             return;
> >>>>>         }
> >>>>>         if (val == (uint32_t)val) {
> >>>>>             /* AND with no high bits set can use a 32-bit operation. */
> >>>>>             rexw = 0;
> >>>>>         }
> >>>>>     }
> >>>>
> >>>> I've certainly looked at the -d op logs and seen that instead of
> >>>> creating a const tcg variable plus an AND there was now an extu opcode
> >>>> issued, yes. No idea why the case up there didn't trigger.
> >>>>
> >>>
> >>> The question there is looking at -d out_asm. They should be the same at
> >>> the end as the code I pasted above is from tcg/i386/tcg-target.c.
> >>
> >> Yes. I was trying to optimize for maximum op length. TCG defines a maximum
> >> number of tcg ops to be issued by each target instruction. Since s390 is
> >> very CISCy, there are instructions that translate into lots of microops,
> >> but are still faster than a C call (register save/restore mostly).
> >>
> >> Without this patch, there are some places where we hit that number :).
> >
> > Is it on 32-bit on or 64-bit? If we reach this number, it's probably
> > better to either implement this instruction with an helper, or maybe
> > increase the number of maximum ops. What is this instruction?
>
> This was on x86_64. I hit limits with LMH and LM, but reduced them to fit
> into the picture with this optimization :). If you like, I can give you a
> statically linked binary that could exceed the limits.
>

Yeah, from what I see the loop is unrolled there. I'm not sure that is the
best thing to do. Also, if the limit is exceeded on a 64-bit host, it is
certainly exceeded on 32-bit hosts.

-- 
Aurelien Jarno                          GPG: 1024D/F1BCDB73
aurel...@aurel32.net                 http://www.aurel32.net
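
For reference, here is a rough sketch in plain C of the special case under discussion. It is not the code from the patch or from tcg/tcg-op.h. The function names deposit64_generic and deposit64_low32 are made up for illustration, and the sketch works on ordinary integers rather than emitting TCG ops. It only shows that depositing 32 bits at offset 0 gives the same result whether the value is masked with an explicit AND or zero-extended, which is why the backend check on val == 0xffffffffu should produce the same host code in the end.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: generic deposit. Inserts the low `len` bits of
 * `val` into `dst` at bit offset `ofs`, using an explicit mask (the
 * "const variable plus AND" form mentioned in the thread). */
static uint64_t deposit64_generic(uint64_t dst, uint64_t val,
                                  unsigned ofs, unsigned len)
{
    assert(len >= 1 && len <= 64 && ofs + len <= 64);
    /* Avoid shifting by 64, which is undefined behaviour in C. */
    uint64_t mask = (len == 64) ? ~0ULL : ((1ULL << len) - 1);
    return (dst & ~(mask << ofs)) | ((val & mask) << ofs);
}

/* Hypothetical helper: special case ofs == 0, len == 32. The cast to
 * uint32_t plays the role of a zero-extension (ext32u) and replaces
 * the AND with the 0xffffffff constant. */
static uint64_t deposit64_low32(uint64_t dst, uint64_t val)
{
    uint64_t low = (uint32_t)val;
    return (dst & 0xffffffff00000000ULL) | low;
}
```

With dst = 0x1122334455667788 and val = 0xdeadbeefcafebabe, both helpers return 0x11223344cafebabe: the upper 32 bits of dst are kept and the lower 32 bits come from val. The special case saves one constant and one op at the TCG level, which is what matters for the per-instruction op limit, not for the final host code.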