On 22/10/2019 17.01, Christophe Leroy wrote:
>
>
> On 10/18/2019 12:52 PM, Rasmus Villemoes wrote:
>> In preparation for allowing to build QE support for architectures
>> other than PPC, replace the ppc-specific io accessors. Done via
>>
>
> This patch is not transparent in terms of performance, functions get
> changed significantly.
>
> Before the patch:
>
> 00000330 <ucc_fast_enable>:
>  330:  81 43 00 04   lwz     r10,4(r3)
>  334:  7c 00 04 ac   hwsync
>  338:  81 2a 00 00   lwz     r9,0(r10)
>  33c:  0c 09 00 00   twi     0,r9,0
>  340:  4c 00 01 2c   isync
>  344:  70 88 00 02   andi.   r8,r4,2
>  348:  41 82 00 10   beq     358 <ucc_fast_enable+0x28>
>  34c:  39 00 00 01   li      r8,1
>  350:  91 03 00 10   stw     r8,16(r3)
>  354:  61 29 00 10   ori     r9,r9,16
>  358:  70 88 00 01   andi.   r8,r4,1
>  35c:  41 82 00 10   beq     36c <ucc_fast_enable+0x3c>
>  360:  39 00 00 01   li      r8,1
>  364:  91 03 00 14   stw     r8,20(r3)
>  368:  61 29 00 20   ori     r9,r9,32
>  36c:  7c 00 04 ac   hwsync
>  370:  91 2a 00 00   stw     r9,0(r10)
>  374:  4e 80 00 20   blr
>
> After the patch:
>
> 0000030c <ucc_fast_enable>:
>  30c:  94 21 ff e0   stwu    r1,-32(r1)
>  310:  7c 08 02 a6   mflr    r0
>  314:  bf a1 00 14   stmw    r29,20(r1)
>  318:  7c 9f 23 78   mr      r31,r4
>  31c:  90 01 00 24   stw     r0,36(r1)
>  320:  7c 7e 1b 78   mr      r30,r3
>  324:  83 a3 00 04   lwz     r29,4(r3)
>  328:  7f a3 eb 78   mr      r3,r29
>  32c:  48 00 00 01   bl      32c <ucc_fast_enable+0x20>
>                      32c: R_PPC_REL24  ioread32be
>  330:  73 e9 00 02   andi.   r9,r31,2
>  334:  41 82 00 10   beq     344 <ucc_fast_enable+0x38>
>  338:  39 20 00 01   li      r9,1
>  33c:  91 3e 00 10   stw     r9,16(r30)
>  340:  60 63 00 10   ori     r3,r3,16
>  344:  73 e9 00 01   andi.   r9,r31,1
>  348:  41 82 00 10   beq     358 <ucc_fast_enable+0x4c>
>  34c:  39 20 00 01   li      r9,1
>  350:  91 3e 00 14   stw     r9,20(r30)
>  354:  60 63 00 20   ori     r3,r3,32
>  358:  80 01 00 24   lwz     r0,36(r1)
>  35c:  7f a4 eb 78   mr      r4,r29
>  360:  bb a1 00 14   lmw     r29,20(r1)
>  364:  7c 08 03 a6   mtlr    r0
>  368:  38 21 00 20   addi    r1,r1,32
>  36c:  48 00 00 00   b       36c <ucc_fast_enable+0x60>
>                      36c: R_PPC_REL24  iowrite32be
True. Do you know why powerpc uses out-of-line versions of these
accessors when !PPC_INDIRECT_PIO, i.e. at least on all of PPC32? It's
quite a bit beyond the scope of this series, but I'd expect that moving
most if not all of arch/powerpc/kernel/iomap.c into asm/io.h (guarded by
!defined(CONFIG_PPC_INDIRECT_PIO), of course) as static inlines would
benefit all ppc32 users of iowrite32 and friends.

Is there some other primitive available that (a) is defined on all
architectures (or at least on both ppc and arm) and (b) expands to good
code in both/all cases? Note that a few uses of the iowrite32be
accessors have already appeared in the qe code with the introduction of
the qe_clrsetbits() helpers in bb8b2062af.

Rasmus
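For illustration only, a minimal userspace sketch of the static-inline idea: because the accessors below are static inline, the compiler can fold them into each caller instead of emitting the bl ioread32be / b iowrite32be calls and the stack frame seen in the "after" listing. The sketch_* names are hypothetical, this is not the kernel's implementation, and the ordering barriers the real powerpc accessors emit (hwsync, twi/isync) are deliberately left out.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for a big-endian 32-bit MMIO read. Assembles the
 * value byte by byte so it gives the same result on any host endianness.
 * Real kernel accessors also need ordering barriers, omitted here.
 */
static inline uint32_t sketch_ioread32be(const volatile void *addr)
{
	const volatile uint8_t *p = addr;

	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Hypothetical stand-in for a big-endian 32-bit MMIO write (no barriers). */
static inline void sketch_iowrite32be(uint32_t val, volatile void *addr)
{
	volatile uint8_t *p = addr;

	p[0] = (uint8_t)(val >> 24);
	p[1] = (uint8_t)(val >> 16);
	p[2] = (uint8_t)(val >> 8);
	p[3] = (uint8_t)val;
}

/*
 * Read-modify-write helper in the spirit of the qe_clrsetbits() helpers;
 * the name and signature are hypothetical, not the kernel's definition.
 */
static inline void sketch_clrsetbits_be32(volatile void *addr, uint32_t clr,
					  uint32_t set)
{
	sketch_iowrite32be((sketch_ioread32be(addr) & ~clr) | set, addr);
}
```

With everything inlined, a caller such as ucc_fast_enable() would reduce to one load, a few ori instructions and one store, plus whatever barriers the real accessors require.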