Hello Vadim,

That sounds like fine work for improving efficiency on ARM.  Please feel
free to send me a patch, perhaps including a patch for the README that
describes the ./configure options.

I am not very good with ./configure (autoconf/configure.in to
autogen.sh/configure.ac), but I am happy to follow your lead.

Thanks,

David

On Wed, 2012-11-14 at 19:28 +0400, Markovtsev Vadim wrote:
> Hi all,
>
> I managed to improve codec2 performance by a further 10% on ARM NEON by
> replacing some math functions with those from the math-neon project (my
> libc version is 2.13). The overall ARM speedup is now 25% in my case.
>
> Here are the oprofile reports on Exynos 4.
>
> Vanilla:
>
> samples  %        linenr info                 image name               symbol name
> 16436    45.8453  kiss_fft.c:246              libcodec2.so.0.0.0       kf_work
> 3089      8.6162  s_floor.c:44                libm-2.13.so             floorl
> 2166      6.0417  nlp.c:209                   libcodec2.so.0.0.0       nlp
> 1760      4.9092  e_atan2.c:80                libm-2.13.so             __ieee754_atan2
> 1741      4.8562  fft.c:84                    libcodec2.so.0.0.0       fft
> 1306      3.6429  s_sin.c:353                 libm-2.13.so             cosl
> 956       2.6666  (no location information)   no-vmlinux               /no-vmlinux
> 877       2.4462  sine.c:288                  libcodec2.so.0.0.0       hs_pitch_refinement
> 691       1.9274  lsp.c:143                   libcodec2.so.0.0.0       lpc_to_lsp
> 682       1.9023  lpc.c:75                    libcodec2.so.0.0.0       autocorrelate
> 657       1.8326  phase.c:61                  libcodec2.so.0.0.0       aks_to_H
> 626       1.7461  quantise.c:479              libcodec2.so.0.0.0       aks_to_M2
> 624       1.7405  s_sin.c:90                  libm-2.13.so             sinl
> 449       1.2524  sine.c:395                  libcodec2.so.0.0.0       est_voicing_mbe
> 326       0.9093  e_log.c:69                  libm-2.13.so             __ieee754_log
> 322       0.8982  sine.c:564                  libcodec2.so.0.0.0       synthesise
> 276       0.7699  sine.c:351                  libcodec2.so.0.0.0       estimate_amplitudes
> 263       0.7336  random.c:293                libc-2.13.so             random
> 228       0.6360  sine.c:207                  libcodec2.so.0.0.0       dft_speech
>
> math-neon:
>
> samples  %        linenr info                 image name               symbol name
> 3369     49.2976  kiss_fft.c:246              libcodec2.so.0.0.0       kf_work
> 438       6.4091  nlp.c:209                   libcodec2.so.0.0.0       nlp
> 413       6.0433  sine.c:288                  libcodec2.so.0.0.0       hs_pitch_refinement
> 347       5.0776  fft.c:84                    libcodec2.so.0.0.0       fft
> 339       4.9605  math_floorf.c:39            libmath_neon.so.0.0.0    floorf_neon_hfp
> 227       3.3216  (no location information)   no-vmlinux               /no-vmlinux
> 146       2.1364  lpc.c:78                    libcodec2.so.0.0.0       autocorrelate
> 140       2.0486  s_sin.c:353                 libm-2.13.so             cosl
> 133       1.9462  math_floorf.c:54            libmath_neon.so.0.0.0    floorf_neon_sfp
> 132       1.9315  lsp.c:143                   libcodec2.so.0.0.0       lpc_to_lsp
> 131       1.9169  quantise.c:479              libcodec2.so.0.0.0       aks_to_M2
> 121       1.7706  math_sinf.c:73              libmath_neon.so.0.0.0    sinf_neon_hfp
> 98        1.4340  e_log.c:69                  libm-2.13.so             __ieee754_log
> 81        1.1853  math_atan2f.c:96            libmath_neon.so.0.0.0    atan2f_neon_hfp
> 78        1.1414  phase.c:61                  libcodec2.so.0.0.0       aks_to_H
> 62        0.9072  sine.c:564                  libcodec2.so.0.0.0       synthesise
> 58        0.8487  phase.c:200                 libcodec2.so.0.0.0       phase_synth_zero_order
> 43        0.6292  sine.c:206                  libcodec2.so.0.0.0       dft_speech
> 41        0.5999  random.c:293                libc-2.13.so             random
>
> math-neon+libavcodec FFT:
>
> samples  %        linenr info                 image name               symbol name
> 665      36.1610  (no location information)   libavcodec.so.53.7.0     /usr/lib/libavcodec.so.53.7.0
> 225      12.2349  (no location information)   no-vmlinux               /no-vmlinux
> 131       7.1234  sine.c:288                  libcodec2.so.0.0.0       hs_pitch_refinement
> 127       6.9059  nlp.c:209                   libcodec2.so.0.0.0       nlp
> 103       5.6009  fft.c:183                   libcodec2.so.0.0.0       fft
> 85        4.6221  math_floorf.c:39            libmath_neon.so.0.0.0    floorf_neon_hfp
> 42        2.2838  lsp.c:143                   libcodec2.so.0.0.0       lpc_to_lsp
> 42        2.2838  s_sin.c:353                 libm-2.13.so             cosl
> 41        2.2295  math_floorf.c:54            libmath_neon.so.0.0.0    floorf_neon_sfp
> 39        2.1207  quantise.c:479              libcodec2.so.0.0.0       aks_to_M2
> 39        2.1207  lpc.c:75                    libcodec2.so.0.0.0       autocorrelate
> 34        1.8488  math_sinf.c:73              libmath_neon.so.0.0.0    sinf_neon_hfp
> 22        1.1963  e_log.c:69                  libm-2.13.so             __ieee754_log
> 22        1.1963  math_atan2f.c:96            libmath_neon.so.0.0.0    atan2f_neon_hfp
> 18        0.9788  phase.c:200                 libcodec2.so.0.0.0       phase_synth_zero_order
> 17        0.9244  interp.c:0                  libc-2.13.so             memcpy
> 16        0.8700  sine.c:206                  libcodec2.so.0.0.0       dft_speech
> 16        0.8700  math_sinf.c:114             libmath_neon.so.0.0.0    sinf_neon_sfp
> 15        0.8157  sine.c:564                  libcodec2.so.0.0.0       synthesise
>
> The GitHub code has been updated.
>
> I wonder whether one could profile speex and apply the same math-neon
> trick…
>
> Regards,
>
> Vadim Markovtsev,
> Engineer, Algorithmic Lab,
> Moscow R&D center, Samsung Electronics
>
> ------------------------------------------------------------------------------
> Monitor your physical, virtual and cloud infrastructure from a single
> web console. Get in-depth insight into apps, servers, databases, vmware,
> SAP, cloud infrastructure, etc. Download 30-day Free Trial.
> Pricing starts from $795 for 25 servers or applications!
> http://p.sf.net/sfu/zoho_dev2dev_nov
> _______________________________________________ Freetel-codec2 mailing list 
> [email protected] 
> https://lists.sourceforge.net/lists/listinfo/freetel-codec2


