On 03/25/2014 09:50 PM, Jan Hubicka wrote:
Hello,
    I've been compiling Chromium with LTO and I noticed that WPA
stream_out forks and runs in parallel:
http://gcc.gnu.org/ml/gcc-patches/2013-11/msg02621.html.

I am unable to fit in 16GB of memory: ld uses about 8GB and lto1 about
6GB. When WPA starts to fork, memory consumption increases so much that
lto1 is killed. I would appreciate a --param option to disable this
WPA fork. The number of forks is taken from the build system (-flto=9),
which is fine for the ltrans phase, because ld releases the
aforementioned 8GB.

What do you think about that?
I can take a look - our measurements suggested that the WPA memory will
later be dominated by ltrans.  Perhaps Chromium does something that makes
WPA explode, which would be interesting to analyze.  I did not manage
to get through the Chromium LTO build process recently (ninja builds are
not my friends), can you send me the instructions?

Honza
Thanks,
Martin

Here are the instructions for building Chromium with LTO:
1) Install depot-tools and export the PATH variable according to the guide: http://www.chromium.org/developers/how-tos/install-depot-tools
2) Check out the source code: gclient sync; cd src
3) Apply the patch (enables the system gold linker and disables LTO for a sandbox that uses top-level asm)
4) Make sure that "which ld" points to ld.gold
5) Ensure that ld.bfd points to ld.bfd
6) Run: build/gyp_chromium -Dwerror=
7) Run: ninja -C out/Release chrome -jX

If there are any problems, follow: https://code.google.com/p/chromium/wiki/LinuxBuildInstructions
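
Steps 4 and 5 can be checked with a small script before starting the
build. This is only a sketch and not part of the original instructions:
the helper names linker_kind and check_linker are made up for
illustration, and it assumes that each linker identifies itself on the
first line of its --version output ("GNU gold ..." for gold, "GNU ld ..."
for the BFD linker).

```shell
#!/bin/sh
# Hypothetical helper: classify a linker by the first line of its
# --version output (assumed to start with "GNU gold" or "GNU ld").
linker_kind() {
    case "$1" in
        "GNU gold"*) echo gold ;;
        "GNU ld"*)   echo bfd ;;
        *)           echo unknown ;;
    esac
}

# Hypothetical helper: report what a linker command resolves to,
# or "missing" if the command is not on PATH.
check_linker() {
    if ! command -v "$1" >/dev/null 2>&1; then
        echo missing
        return
    fi
    linker_kind "$("$1" --version 2>/dev/null | head -n 1)"
}

# Step 4 expects "gold" here, step 5 expects "bfd".
echo "ld     -> $(check_linker ld)"
echo "ld.bfd -> $(check_linker ld.bfd)"
```

If the first line does not report gold, step 4 is not satisfied yet and
the LTO link would go through a different linker than intended.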

Martin

diff --git a/libvpx_srcs_x86_64.gypi b/libvpx_srcs_x86_64.gypi
index 91a7ef0..d941bd9 100644
--- a/libvpx_srcs_x86_64.gypi
+++ b/libvpx_srcs_x86_64.gypi
@@ -159,8 +159,6 @@
     '<(libvpx_source)/vp8/encoder/x86/encodeopt.asm',
     '<(libvpx_source)/vp8/encoder/x86/fwalsh_sse2.asm',
     '<(libvpx_source)/vp8/encoder/x86/quantize_mmx.asm',
-    '<(libvpx_source)/vp8/encoder/x86/quantize_sse4.asm',
-    '<(libvpx_source)/vp8/encoder/x86/quantize_ssse3.asm',
     '<(libvpx_source)/vp8/encoder/x86/ssim_opt.asm',
     '<(libvpx_source)/vp8/encoder/x86/subtract_mmx.asm',
     '<(libvpx_source)/vp8/encoder/x86/subtract_sse2.asm',
@@ -302,7 +300,6 @@
     '<(libvpx_source)/vp9/encoder/vp9_writer.h',
     '<(libvpx_source)/vp9/encoder/x86/vp9_error_sse2.asm',
     '<(libvpx_source)/vp9/encoder/x86/vp9_mcomp_x86.h',
-    '<(libvpx_source)/vp9/encoder/x86/vp9_quantize_ssse3.asm',
     '<(libvpx_source)/vp9/encoder/x86/vp9_sad4d_sse2.asm',
     '<(libvpx_source)/vp9/encoder/x86/vp9_sad_mmx.asm',
     '<(libvpx_source)/vp9/encoder/x86/vp9_sad_sse2.asm',
diff --git a/source/libvpx/vp8/vp8cx.mk b/source/libvpx/vp8/vp8cx.mk
index d7c6dd1..ef3ca37 100644
--- a/source/libvpx/vp8/vp8cx.mk
+++ b/source/libvpx/vp8/vp8cx.mk
@@ -87,7 +87,6 @@ VP8_CX_SRCS-$(HAVE_MMX) += encoder/x86/subtract_mmx.asm
 VP8_CX_SRCS-$(HAVE_MMX) += encoder/x86/vp8_enc_stubs_mmx.c
 VP8_CX_SRCS-$(HAVE_SSE2) += encoder/x86/dct_sse2.asm
 VP8_CX_SRCS-$(HAVE_SSE2) += encoder/x86/fwalsh_sse2.asm
-VP8_CX_SRCS-$(HAVE_SSE2) += encoder/x86/quantize_sse2.c
 
 ifeq ($(CONFIG_TEMPORAL_DENOISING),yes)
 VP8_CX_SRCS-$(HAVE_SSE2) += encoder/x86/denoising_sse2.c
@@ -96,8 +95,6 @@ endif
 VP8_CX_SRCS-$(HAVE_SSE2) += encoder/x86/subtract_sse2.asm
 VP8_CX_SRCS-$(HAVE_SSE2) += encoder/x86/temporal_filter_apply_sse2.asm
 VP8_CX_SRCS-$(HAVE_SSE2) += encoder/x86/vp8_enc_stubs_sse2.c
-VP8_CX_SRCS-$(HAVE_SSSE3) += encoder/x86/quantize_ssse3.asm
-VP8_CX_SRCS-$(HAVE_SSE4_1) += encoder/x86/quantize_sse4.asm
 VP8_CX_SRCS-$(ARCH_X86)$(ARCH_X86_64) += encoder/x86/quantize_mmx.asm
 VP8_CX_SRCS-$(ARCH_X86)$(ARCH_X86_64) += encoder/x86/encodeopt.asm
 VP8_CX_SRCS-$(ARCH_X86_64) += encoder/x86/ssim_opt.asm
