tag 863672 + patch fixed-upstream
thanks

On Mon, 29 May 2017 23:14:38 +0200 Julian Taylor <jtaylor.deb...@googlemail.com> wrote:

>
> libyuv, which is a performance-critical library for Firefox, is built
> with -Os, which is horrible for its performance.
> This particularly affects row_common.cc, which contains the generic
> parts of the color transformation code.
>
> See:
> https://buildd.debian.org/status/fetch.php?pkg=firefox&arch=amd64&ver=53.0.is.52.0.2-1&stamp=1492644908&raw=0
>
> /usr/bin/g++ -std=gnu++11 -o row_common.o -c ... -fPIC
> -DMOZILLA_CLIENT -include
> /<<PKGBUILDDIR>>/build-browser/mozilla-config.h -MD -MP -MF
> .deps/row_common.o.pp -Wdate-time -D_FORTIFY_SOURCE=2 -Wall
> -Wc++11-compat -Wempty-body -Wignored-qualifiers -Woverloaded-virtual
> -Wpointer-arith -Wsign-compare -Wtype-limits -Wunreachable-code
> -Wwrite-strings -Wno-invalid-offsetof -Wc++14-compat
> -Wno-error=maybe-uninitialized -Wno-error=deprecated-declarations
> -Wno-error=array-bounds -fno-lifetime-dse -fstack-protector-strong
> -Wformat -Werror=format-security -fno-schedule-insns2 -fno-lifetime-dse
> -fno-delete-null-pointer-checks -fno-exceptions -fno-strict-aliasing
> -fno-rtti -ffunction-sections -fdata-sections -fno-exceptions
> -fno-math-errno -pthread -pipe -g -freorder-blocks -Os
> -fomit-frame-pointer
> /<<PKGBUILDDIR>>/media/libyuv/source/row_common.cc
>
>
> The problematic part is the YuvPixel function, which is called in loops
> and in turn calls tiny clamp functions.
> With -Os these calls are not inlined, which causes massive overhead.
> This is the top of the CPU profile on sites that, for example, display
> videos:
> 17.25% libxul.so [.] YuvPixel
> 6.58% libxul.so [.] Clamp
> 6.46% libxul.so [.] clamp255
>
> The problem is not as bad as it looks, as this generic code is only
> executed on machines that do not have SSSE3, AVX2 or NEON (see
> convert_argb.cc).
> But there are still plenty of useful CPUs that do not have these
> instruction sets and are crippled by the compiler flags used.
>
> Is it possible to compile this library with -O3 to allow the compiler to
> vectorize it with the best available generic instruction set (e.g. SSE2
> on x64)?

FTR, this is now fixed upstream; -O2 is used by default on desktop builds:

https://hg.mozilla.org/integration/autoland/rev/8fdb9e30b6a7
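
For anyone reading this later, here is a minimal sketch of the call pattern described in the quoted report: a per-row loop calling a per-pixel function, which in turn calls tiny clamp helpers. This is not libyuv's actual code. The helper names are borrowed from the profile above, `YuvPixelSketch` and `RowToArgbSketch` are hypothetical, and the fixed-point coefficients (approximate BT.601 limited range, scaled by 64) and the one-chroma-sample-per-pixel layout are assumptions made only for illustration.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helpers named after the symbols in the profile; not libyuv's code.
static inline int32_t clamp0(int32_t v) { return v < 0 ? 0 : v; }
static inline int32_t clamp255(int32_t v) { return v > 255 ? 255 : v; }
static inline uint8_t Clamp(int32_t v) {
  return static_cast<uint8_t>(clamp255(clamp0(v)));
}

// Convert one pixel. Coefficients are approximate BT.601 limited-range
// values scaled by 64 (an assumption for this sketch).
static inline void YuvPixelSketch(uint8_t y, uint8_t u, uint8_t v,
                                  uint8_t* b, uint8_t* g, uint8_t* r) {
  const int32_t y1 = (static_cast<int32_t>(y) - 16) * 74;
  const int32_t u1 = static_cast<int32_t>(u) - 128;
  const int32_t v1 = static_cast<int32_t>(v) - 128;
  *b = Clamp((y1 + 129 * u1) / 64);
  *g = Clamp((y1 - 25 * u1 - 52 * v1) / 64);
  *r = Clamp((y1 + 102 * v1) / 64);
}

// Per-row loop: three clamp chains per pixel. If none of the helpers are
// inlined, every pixel pays for several function calls.
// Assumes one U and one V sample per pixel; output bytes are B, G, R, A.
void RowToArgbSketch(const uint8_t* src_y, const uint8_t* src_u,
                     const uint8_t* src_v, uint8_t* dst_argb,
                     size_t width) {
  for (size_t x = 0; x < width; ++x) {
    uint8_t* px = dst_argb + 4 * x;
    YuvPixelSketch(src_y[x], src_u[x], src_v[x], px + 0, px + 1, px + 2);
    px[3] = 255;  // opaque alpha
  }
}
```

Whether the helpers end up inlined depends on the compiler and optimization level; comparing the assembly from `g++ -S` at -Os and at -O2 should show the difference for a given toolchain.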
