On 27/11/2019 12:03, Alexander Monakov wrote:
> Hi!
> 
> I've attempted to study the implementation of memcpy for 32-bit Arm cores in
> Glibc (which is also found in arm-optimized-routines and first appeared in 
> Linaro's cortex-strings project), and I came across a peculiar snippet:
> 
> #ifdef USE_VFP
>       /* Magic dust alert!  Force VFP on Cortex-A9.  Experiments show
>          that the FP pipeline is much better at streaming loads and
>          stores.  This is outside the critical loop.  */
>       vmov.f32        s0, s0
> #endif
> 
> This seems to imply that this NOP-like instruction affects CPU state and makes
> the vldr/vstr instructions that follow use different datapaths than they might
> otherwise?  Can anyone shed more light on this, please?
> 
> 
> I was able to trace the history of this code back to revision 100 in the
> cortex-strings repository, where it appeared as part of a large rewrite by
> Will Newton:
>
> https://bazaar.launchpad.net/~linaro-toolchain-dev/cortex-strings/trunk/revision/100/src/linaro-a9/memcpy.S
> 
> The entire memcpy.S file in the Arm optimized-routines repo can be found here:
>
> https://github.com/ARM-software/optimized-routines/blob/master/string/arm/memcpy.S

Unfortunately, this snippet did not prompt any questions during patch review [1].

My guess, after consulting the "Cortex-A9 NEON Media Processing Engine"
manual [2], is that the instruction is meant to force the use of the VFP
data-processing unit. The Cortex-A9 processor (when implemented with the MPE)
contains distinct data-processing units for integer, Advanced SIMD, and VFP
operations (page 3-19).

However, both vldr and vstr are described in the manual as follows:

Name   Advanced SIMD   VFP    Description
VLDR   X               F, D   Load Single Register
VSTR   X               F, D   Store Register

This means that the vldr/vstr instructions in the copy loop that follows this
snippet should exercise the Advanced SIMD unit.

I couldn't find any information on the A9 technical manual site [3] about how
the data-processing unit is selected, nor about how preceding instructions
could influence that selection.

[1] https://patches.linaro.org/patch/16133/
[2] http://infocenter.arm.com/help/topic/com.arm.doc.ddi0409g/DDI0409G_cortex_a9_neon_mpe_r3p0_trm.pdf
[3] http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.subset.cortexa.a9/index.html
_______________________________________________
linaro-toolchain mailing list
linaro-toolchain@lists.linaro.org
https://lists.linaro.org/mailman/listinfo/linaro-toolchain