On Wed, Dec 03, 2014 at 08:29:37PM +0530, Madhavan Srinivasan wrote:
> On Tuesday 02 December 2014 03:05 AM, Gabriel Paubert wrote:
> > On Thu, Nov 27, 2014 at 05:48:40PM +0530, Madhavan Srinivasan wrote:
> >> diff --git a/arch/powerpc/include/asm/exception-64s.h 
> >> b/arch/powerpc/include/asm/exception-64s.h
> >> index 77f52b2..c42919a 100644
> >> --- a/arch/powerpc/include/asm/exception-64s.h
> >> +++ b/arch/powerpc/include/asm/exception-64s.h
> >> @@ -306,7 +306,26 @@ do_kvm_##n:                                           
> >>                 \
> >>    std     r10,0(r1);              /* make stack chain pointer     */ \
> >>    std     r0,GPR0(r1);            /* save r0 in stackframe        */ \
> >>    std     r10,GPR1(r1);           /* save r1 in stackframe        */ \
> >> -  beq     4f;                     /* if from kernel mode          */ \
> >> +BEGIN_FTR_SECTION;                                                        
> >>    \
> >> +  lis     r9,4096;                /* Create a mask with HV and PR */ \
> >> +  rldicr  r9,r9,32,31;            /* bits, AND with the MSR       */ \
> >> +  mr      r10,r9;                 /* to check for Hyp state       */ \
> >> +  ori     r9,r9,16384;                                               \
> >> +  and     r9,r12,r9;                                                 \
> >> +  cmpd    cr3,r10,r9;                                                     
> >>    \
> >> +  beq     cr3,66f;                /* Jump if we come from Hyp mode*/ \
> >> +  mtcrf   0x04,r10;               /* Clear CR5 if coming from usr */ \
> > 
> > I think you can do better than this, powerpc has a fantastic set
> > of rotate and mask instructions. If I understand correctly your
> > code you can replace it with the following:
> > 
> >     rldicl  r10,r12,4,63       /* Extract HV bit to LSB of r10*/
> >     rlwinm  r9,r12,19,0x02     /* Extract PR bit to 2nd to last bit of r9 */
> >     or      r9,r9,10
> >     cmplwi  cr3,r9,1           /* Check for HV=1 and PR=0 */                
> >     beq     cr3,66f
> >     mtcrf   0x04,r10           /* Bits going to cr5 bits are 0 in r10 */
> > 
> 
> >From previous comment, at this point, CR0 already has MSR[PR], and only
> HV need to be checked. So I guess there still room for optimization.
> But good to learn this seq.

Actually the sequence I suggested can even be shortened again since the or
is not necessary and you can use an rlwimi instead.

        rldicl r9,r12,5,62      /*  r9 = 62 zeroes | HV | ?  */
        rlwimi r9,r12,18,0x01   /*  r9 = 62 zeroes | HV | PR */
        cmplwi cr3,r9,2         /* Check for HV=1 and PR=0 */

For 32 bit operands, rlwimi is a generic bit field to bit field move, but
GCC is (was, maybe GCC5 is better?) quite bad at recognizing opportunities
to use it.

I'm not sure that it is better to have two separate branches, one for
testing PR and the other for testing HV. Through how many branches can
the hardware speculate? 

Unless I'm mistaken, this is code which is likely not to be in the
cache so I would optimize it first towards minimal code size.

        Regards,
        Gabriel
_______________________________________________
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Reply via email to