Re: Skipping assembler when producing slim LTO files

2014-09-24 Thread Richard Biener
On Wed, Sep 24, 2014 at 7:46 AM, Jan Hubicka hubi...@ucw.cz wrote:

 Hi,
 This patch is something I was playing around with assistance of Ian Taylor.
 It seems I need bit more help though :)

 It adds support for direct output of SLIM LTO files to the compiler binary.
 It works as proof of concept, but there are two key parts missing
  1) extension of libiberty's simple file to handle output symbols into COMMON.
 This is needed to output __gnu_lto_v1 and __gnu_lto_slim
 Search for TODO in the patch bellow.
  2) Support in driver to properly execute *1 binary.

 I also disabled outputting ident directive, but I think that one may not be 
 necessary
 because the files are identified by the gnu_lto_v1 symbols. We could add it 
 later.

 Currently the path bypassing asm stage can be tested as follows:

 jan@linux-ujxe:~/trunk/build/gcc> cat a.c
 main ()
 {
   printf ("Hello world\n");
 }
 jan@linux-ujxe:~/trunk/build/gcc> ./xgcc -B ./ -O3 a.c -flto -S -fbypass-asm=crtbegin.o  -o a.o
 jan@linux-ujxe:~/trunk/build/gcc> ./xgcc -B ./ -O2 a.o -flto
 jan@linux-ujxe:~/trunk/build/gcc> ./a.out
 Hello world

 The implementation is pretty straighforward except for -fbypass-asm requiring
 one existing OBJ file to fetch target's file attributes from.  This is
 definitly not optimal, but libiberty currently can't build output files from
 scratch. As Ian suggested, I plan to simply arrange the driver to pass 
 crtbegin
 around at least to start with. We may want to bypass this later and storing
 proper attributes into the binary.

 Ian, would you be so kind and implement ability to output those two symbols
 into lto-object-simple?  I think we can start with ELF only support.

 The large chunk just moves lto-object around with very small changes in it, 
 so the
 patch is fairly easy.

 I did just quick benchmark with unoptimized cc1 binary compiling the file 
 above.
 For 1000 invocations with bypass I get:

 real    0m14.186s
 user    0m10.957s
 sys     0m2.424s

 While the default path gets:

 real    0m21.913s
 user    0m13.856s
 sys     0m5.705s

 With OpenSUSE 13.1 default GCC 4.8.3 build:

 real   0m15.160s
 user   0m8.481s
 sys    0m5.159s

 (the difference here is most likely optimizer WRT unoptimized binary, perf 
 shows
 contains_struct_check quite top, so startup overhead still dominates)

 And with clang-3.4:

 real   0m30.097s
 user   0m22.012s
 sys    0m6.649s

 That is fairly nice speedup IMO.  With optimized build the difference should
 be more visible because CC1 startup issues will become less important.
 I definitely see ASM file overhead as mesaurable issue with real world 
 benchmarks
 (libreoffice build). Clearly we produce several GBs of object file going 
 through
 crappy and bloated text encoding just for sake of doing it.

Shouldn't -fbypass-asm be simply mangled by the driver?  That is,
the user simply specifies -fbypass-asm and via spec magic the driver
substitutes this with -fbypass-asm=crtbegin.o?  That way at least
the user interface should be stable (as we're supposedly removing
the requirement for that existing object file at some point).
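
For illustration only, the substitution proposed here can be modelled in
plain C.  The real driver would express this through its spec strings
rather than through code like this; expand_bypass_asm and its parameters
are hypothetical names, and crtbegin.o is simply the template object
mentioned in this thread.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of GCC: expand a bare "-fbypass-asm"
   into "-fbypass-asm=<template>" and copy every other argument as-is.
   Returns 1 if ARG was rewritten into OUT, 0 if it was copied
   unchanged, and -1 if OUT is too small.  */
static int
expand_bypass_asm (const char *arg, const char *template_obj,
                   char *out, size_t out_size)
{
  int n;

  assert (arg != NULL && template_obj != NULL && out != NULL);

  if (strcmp (arg, "-fbypass-asm") == 0)
    {
      n = snprintf (out, out_size, "-fbypass-asm=%s", template_obj);
      return (n >= 0 && (size_t) n < out_size) ? 1 : -1;
    }

  n = snprintf (out, out_size, "%s", arg);
  return (n >= 0 && (size_t) n < out_size) ? 0 : -1;
}
```

Because the user only ever types the bare flag, the template object can
later be dropped without changing the command line users rely on.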

Btw, with early debug info we also need to store dwarf somewhere.
Either we drop the support for fat LTO objects and thus can store
the dwarf alongside the GIMPLE IL and simply link with these
files at the end or we need to support a separate set of files to
store the DWARF.  If we need separate files then why not store
the GIMPLE IL data into separate objects in the first place and
output a reference to it into the main object file?  That way we
don't need any special attributes - the linker plugin simply
opens the main object file, extracts the reference to the IL file
and passes that along.

Btw, the patch is very hard to read as it moves (and modifies?) files
at the same time.  What are these magic file attributes we need?

Thanks,
Richard.


Re: Skipping assembler when producing slim LTO files

2014-09-24 Thread Ian Lance Taylor
Richard Biener richard.guent...@gmail.com writes:

 Btw, the patch is very hard to read as it moves (and modifies?) files
 at the same time.  What's this magic file attributes we need?

The file attributes issue is the ELF machine number, class, OSABI,
flags, and endianness.  When generating an ELF file it has to have this
information, and it has to match the objects generated by the assembler.
If it doesn't, the linker won't accept it and pass it to the plugin as
we require.  We could of course build a large table of those numbers and
keep it updated for each target.  But it's simpler to extract the
numbers from an existing object file that we know must be valid.
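
As a rough sketch of what "extracting the numbers" involves (this is not
the simple-object code), the fields listed above can be read straight
from the first bytes of any valid object for the target.  The struct and
function names are made up for this example; the byte offsets are the
standard ELF header layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* The attributes named above, as found in an ELF file header.  */
struct elf_attrs
{
  unsigned char ei_class;   /* 1 = 32-bit, 2 = 64-bit */
  unsigned char ei_data;    /* 1 = little-endian, 2 = big-endian */
  unsigned char ei_osabi;
  uint16_t e_machine;
  uint32_t e_flags;
};

/* Read an N-byte unsigned value (N <= 4) in the file's byte order.  */
static uint32_t
read_uint (const unsigned char *p, size_t n, int big_endian)
{
  uint32_t v = 0;
  for (size_t i = 0; i < n; i++)
    v = (v << 8) | p[big_endian ? i : n - 1 - i];
  return v;
}

/* Returns 0 on success, -1 if BUF does not hold a usable ELF header.  */
static int
elf_attrs_from_header (const unsigned char *buf, size_t len,
                       struct elf_attrs *out)
{
  assert (buf != NULL && out != NULL);

  if (len < 16 || memcmp (buf, "\177ELF", 4) != 0)
    return -1;

  out->ei_class = buf[4];
  out->ei_data = buf[5];
  out->ei_osabi = buf[7];
  if ((out->ei_class != 1 && out->ei_class != 2)
      || (out->ei_data != 1 && out->ei_data != 2))
    return -1;

  /* e_flags follows three address-sized fields, so its offset depends
     on the class: 36 for ELF32, 48 for ELF64.  */
  size_t flags_off = out->ei_class == 1 ? 36 : 48;
  if (len < flags_off + 4)
    return -1;

  int big = out->ei_data == 2;
  out->e_machine = (uint16_t) read_uint (buf + 18, 2, big);
  out->e_flags = read_uint (buf + flags_off, 4, big);
  return 0;
}
```

Copying these five values from something like crtbegin.o avoids keeping a
per-target table of them in the compiler.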

Ian


Re: Skipping assembler when producing slim LTO files

2014-09-24 Thread Richard Biener
On Wed, Sep 24, 2014 at 2:40 PM, Ian Lance Taylor i...@airs.com wrote:
 Richard Biener richard.guent...@gmail.com writes:

 Btw, the patch is very hard to read as it moves (and modifies?) files
 at the same time.  What's this magic file attributes we need?

 The file attributes issue is the ELF machine number, class, OSABI,
 flags, and endianness.  When generating an ELF file it has to have this
 information, and it has to match the objects generated by the assembler.
 If it doesn't, the linker won't accept it and pass it to the plugin as
 we require.  We could of course build a large table of those numbers and
 keep it updated for each target.  But it's simpler to extract the
 numbers from an existing object file that we know must be valid.

I see.  Thanks for the explanation.

Richard.

 Ian


Re: Skipping assembler when producing slim LTO files

2014-09-24 Thread Jan Hubicka
 
 Shouldn't -fbypass-asm be simply mangled by the driver?  That is,
 the user simply specifies -fbypass-asm and via spec magic the driver
 substitutes this with -fbypass-asm=crtbegin.o?  That way at least
 the user interface should be stable (as we're supposedly removing
 the requirement for that existing object file at some point).

The idea is to make -fbypass-asm internal and never expose it to the user.
That is, default to it with slim LTO unless the user asks for assembler
via -S.
 
 Btw, with early debug info we also need to store dwarf somewhere.
 Either we drop the support for fat LTO objects and thus can store

I think fat LTO files are useful for LIPO, which will hopefully hit
mainline one day, and for other tricks, so I think we want to keep them.
Hopefully pickling DWARF so that the two can coexist won't be that
hard.

 the dwarf alongside the GIMPLE IL and simply link with these
 files at the end or we need to support a separate set of files to
 store the DWARF.  If we need separate files then why not store
 the GIMPLE IL data into separate objects in the first place and
 output a reference to it into the main object file?  That way we
 don't need any special attributes - the linker plugin simply
 opens the main object file, extracts the reference to the IL file
 and passes that along.

I do not much like the idea of separate files, as make clean will not
be happy.  Having everything in one file seems to make sense.
The attributes are needed to make the file acceptable to the linker/archiver.
 
 Btw, the patch is very hard to read as it moves (and modifies?) files

Basically no modifications there (I believe I did try to set attributes there
and then reverted the change).  I will send an explicit diff for that file.

 at the same time.  What's this magic file attributes we need?

The type of ELF file you produce (32-bit/64-bit etc.).

Honza
 
 Thanks,
 Richard.
 
  Honza
 
  Index: Makefile.in
  ===
  --- Makefile.in (revision 215518)
  +++ Makefile.in (working copy)
  @@ -1300,6 +1300,7 @@
  lto-section-out.o \
  lto-opts.o \
  lto-compress.o \
  +   lto-object.o \
  mcf.o \
  mode-switching.o \
  modulo-sched.o \
  Index: common.opt
  ===
  --- common.opt  (revision 215518)
  +++ common.opt  (working copy)
  @@ -923,6 +923,9 @@
   Common Report Var(flag_btr_bb_exclusive) Optimization
   Restrict target load migration not to re-use registers in any basic block
 
  +fbypass-asm=
  +Common Joined Var(flag_bypass_asm)
  +
   fcall-saved-
   Common Joined RejectNegative Var(common_deferred_options) Defer
   -fcall-saved-<register>	Mark register as being preserved across functions
  Index: langhooks.c
  ===
  --- langhooks.c (revision 215518)
  +++ langhooks.c (working copy)
  @@ -40,6 +40,10 @@
   #include "cgraph.h"
   #include "timevar.h"
   #include "output.h"
  +#include "tree-ssa-alias.h"
  +#include "gimple-expr.h"
  +#include "gimple.h"
  +#include "lto-streamer.h"
 
   /* Do nothing; in many cases the default hook.  */
 
  @@ -653,6 +657,19 @@
   {
 section *section;
 
  +  if (flag_bypass_asm)
  +    {
  +      static int initialized = false;
  +      if (!initialized)
  +        {
  +          gcc_assert (asm_out_file == NULL);
  +          lto_set_current_out_file (lto_obj_file_open (asm_file_name, true));
  +          initialized = true;
  +        }
  +      lto_obj_begin_section (name);
  +      return;
  +    }
  +
 /* Save the old section so we can restore it in lto_end_asm_section.  */
 gcc_assert (!saved_section);
 saved_section = in_section;
  @@ -669,8 +686,13 @@
  implementation just calls assemble_string.  */
 
   void
  -lhd_append_data (const void *data, size_t len, void *)
  +lhd_append_data (const void *data, size_t len, void *v)
   {
  +  if (flag_bypass_asm)
  +    {
  +      lto_obj_append_data (data, len, v);
  +      return;
  +    }
     if (data)
       assemble_string ((const char *)data, len);
   }
  @@ -683,6 +705,11 @@
   void
   lhd_end_section (void)
   {
  +  if (flag_bypass_asm)
  +    {
  +      lto_obj_end_section ();
  +      return;
  +    }
     if (saved_section)
       {
         switch_to_section (saved_section);
  Index: lto/Make-lang.in
  ===
  --- lto/Make-lang.in(revision 215518)
  +++ lto/Make-lang.in(working copy)
  @@ -22,7 +22,7 @@
   # The name of the LTO compiler.
   LTO_EXE = lto1$(exeext)
   # The LTO-specific object files inclued in $(LTO_EXE).
  -LTO_OBJS = lto/lto-lang.o lto/lto.o lto/lto-object.o attribs.o lto/lto-partition.o lto/lto-symtab.o
  +LTO_OBJS = lto/lto-lang.o lto/lto.o attribs.o lto/lto-partition.o lto/lto-symtab.o
   lto_OBJS = $(LTO_OBJS)
 
   # Rules
  Index: lto/lto-object.c
  

Re: Skipping assembler when producing slim LTO files

2014-09-24 Thread Andi Kleen
Jan Hubicka hubi...@ucw.cz writes:

Nice patch.

 The implementation is pretty straighforward except for -fbypass-asm requiring
 one existing OBJ file to fetch target's file attributes from.  This is
 definitly not optimal, but libiberty currently can't build output files from
 scratch. As Ian suggested, I plan to simply arrange the driver to pass 
 crtbegin
 around at least to start with. We may want to bypass this later and storing
 proper attributes into the binary.

I wonder how hard it would be to fix simple-object to be able to create
from scratch. From a quick look it would be mostly adding the right
values into the header? That would need some defines per target.

-Andi

-- 
a...@linux.intel.com -- Speaking for myself only


Re: Skipping assembler when producing slim LTO files

2014-09-24 Thread Ian Lance Taylor
On Wed, Sep 24, 2014 at 7:47 AM, Andi Kleen a...@firstfloor.org wrote:

 I wonder how hard it would be to fix simple-object to be able to create
 from scratch. From a quick look it would be mostly adding the right
 values into the header? That would need some defines per target.

It could be done, of course.  It would mean maintaining a new set of
tables and updating them for each target.  The specific table to use
would depend on the command line options.  It turns into yet another
data structure to update.

Ian


Re: Skipping assembler when producing slim LTO files

2014-09-24 Thread Jan Hubicka
 On Wed, Sep 24, 2014 at 7:47 AM, Andi Kleen a...@firstfloor.org wrote:
 
  I wonder how hard it would be to fix simple-object to be able to create
  from scratch. From a quick look it would be mostly adding the right
  values into the header? That would need some defines per target.
 
 It could be done, of course.  It would mean maintaining a new set of
 tables and updating them for each target.  The specific table to use
 would depend on the command line options.  It turns into yet another
 data structure to update.

Yep, I think the crtstuff hack is pretty good for now (well, under the assumption
that I won't have too hard a time getting it working in the driver).  I think the
only real blocker is the lack of a simple-object API to create the two common
symbols we need to make the object files compliant. I really hope Ian
will help me on this, please ;)

Honza
 
 Ian


Re: Skipping assembler when producing slim LTO files

2014-09-24 Thread Jan Hubicka
  On Wed, Sep 24, 2014 at 7:47 AM, Andi Kleen a...@firstfloor.org wrote:
  
   I wonder how hard it would be to fix simple-object to be able to create
   from scratch. From a quick look it would be mostly adding the right
   values into the header? That would need some defines per target.
  
  It could be done, of course.  It would mean maintaining a new set of
  tables and updating them for each target.  The specific table to use
  would depend on the command line options.  It turns into yet another
  data structure to update.
 
 Yep, i think the crtstuff hack is pretty good for now (well under assumption
 I won't have too hard time to get it working in the driver).  I think the only
 real blocker is the lack of simple-object API to create the two common
 symbols we need to make the object fiels compliant. I really hope Ian
 will help me on this, please;)

Just for some data, I did compile-time comparisons on LibreOffice
http://hubicka.blogspot.ca/2014/09/linktime-optimization-in-gcc-part-3.html
and Firefox
http://hubicka.blogspot.ca/2014/04/linktime-optimization-in-gcc-2-firefox.html

My general plan is to try to make LTO compile time faster than non-LTO and
possibly clang's on my setup (i.e. with WHOPR parallelism).  It is already
faster than clang's LTO. Also SPEC build times are now faster than non-LTO
ones.

LibreOffice shows that GCC needs about twice as much system time. According
to profiles, a good part is the ugly way we pass stuff down to the assembler and
the other part is memory use during the compilation stage.
I fixed most of the bottlenecks seen in GCC 4.9: inefficiencies in hashing
for streaming, unnecessary initialization of the backend, the inliner and other
stuff.

Funnily enough, I benchmarked an LTO build with mainline and GCC 4.9 and the times
are almost exactly the same on both Firefox and LibreOffice. There are some
slowdowns too: the speculative devirtualization issues I plan to fix today,
extra streaming needed, and slowdowns in the C++ FE/preprocessor...  I will
benchmark the last two a bit more carefully ;) But this also means that non-LTO got
slower in 5.0, so I am probably closer to reaching the goal.

Honza
 
 Honza
  
  Ian


Re: Skipping assembler when producing slim LTO files

2014-09-24 Thread Steven Bosscher
On Wed, Sep 24, 2014 at 6:32 PM, Jan Hubicka hubi...@ucw.cz wrote:
 Libreoffice shows that GCC needs about twice as much of system time. According
 to profiles, good part is the ugly way we pass stuff down to assembler and
 other part is memory use during the copmilation stage.

Are you using -pipe? AFAIR this still isn't the default, even on
GNU/Linux, but it is typically a lot faster than without.

Ciao!
Steven


Re: Skipping assembler when producing slim LTO files

2014-09-24 Thread Jan Hubicka
 On Wed, Sep 24, 2014 at 6:32 PM, Jan Hubicka hubi...@ucw.cz wrote:
  Libreoffice shows that GCC needs about twice as much of system time. 
  According
  to profiles, good part is the ugly way we pass stuff down to assembler and
  other part is memory use during the copmilation stage.
 
 Are you using -pipe? AFAIR this still isn't the default, even on
 GNU/Linux, but it is typically a lot faster than without.

I use libreoffice's default flags. Will check what they do.
Given that -pipe has been around for many years and works well, what about
making it the default to justify the GCC 5 release?

honza
 
 Ciao!
 Steven


Re: Skipping assembler when producing slim LTO files

2014-09-24 Thread Ian Lance Taylor
On Wed, Sep 24, 2014 at 10:04 AM, Steven Bosscher stevenb@gmail.com wrote:
 On Wed, Sep 24, 2014 at 6:32 PM, Jan Hubicka hubi...@ucw.cz wrote:
 Libreoffice shows that GCC needs about twice as much of system time. 
 According
 to profiles, good part is the ugly way we pass stuff down to assembler and
 other part is memory use during the copmilation stage.

 Are you using -pipe? AFAIR this still isn't the default, even on
 GNU/Linux, but it is typically a lot faster than without.

Is that true even when TMPDIR is on a ram disk?  There's no obvious
reason that it should be true in a parallel build.  Using -pipe
effectively constrains communication between the compiler and the
assembler to work in PIPE_BUF blocks.  Using TMPDIR introduces no such
constraints, and in a big program a parallel build should obscure the
fact that the compiler and assembler are serialized for each
individual compilation unit.

Ian


Re: Skipping assembler when producing slim LTO files

2014-09-24 Thread Jan Hubicka
 On Wed, Sep 24, 2014 at 10:04 AM, Steven Bosscher stevenb@gmail.com 
 wrote:
  On Wed, Sep 24, 2014 at 6:32 PM, Jan Hubicka hubi...@ucw.cz wrote:
  Libreoffice shows that GCC needs about twice as much of system time. 
  According
  to profiles, good part is the ugly way we pass stuff down to assembler and
  other part is memory use during the copmilation stage.
 
  Are you using -pipe? AFAIR this still isn't the default, even on
  GNU/Linux, but it is typically a lot faster than without.
 
 Is that true even when TMPDIR is on a ram disk?  There's no obvious
 reason that it should be true in a parallel build.  Using -pipe
 effectively constrains communication between the compiler and the
 assembler to work in PIPE_BUF blocks.  Using TMPDIR introduces no such
 constraints, and in a big program a parallel build should obscure the
 fact that the compiler and assembler are serialized for each
 individual compilation unit.

Actually I mount /tmp as tmpfs, so this should not be an issue.
Obviously for slim LTO we get more benefit from outputting binary data directly
rather than spending time to printf and scanf it ;)

Honza


Re: Skipping assembler when producing slim LTO files

2014-09-24 Thread Steven Bosscher
On Wed, Sep 24, 2014 at 11:47 PM, Ian Lance Taylor wrote:
 On Wed, Sep 24, 2014 at 10:04 AM, Steven Bosscher wrote:
 Are you using -pipe? AFAIR this still isn't the default, even on
 GNU/Linux, but it is typically a lot faster than without.

 Is that true even when TMPDIR is on a ram disk?  There's no obvious
 reason that it should be true in a parallel build.  Using -pipe
 effectively constrains communication between the compiler and the
 assembler to work in PIPE_BUF blocks.  Using TMPDIR introduces no such
 constraints, and in a big program a parallel build should obscure the
 fact that the compiler and assembler are serialized for each
 individual compilation unit.

I've done my most recent timings on a machine that has /dev/md3
mounted on /tmp. That's gcc110 on the compile farm. With/without -pipe
made a significant difference.

If TMPDIR is a tmpfs or other kind of ram disk, I suppose the benefits
would be less (to the point of vanishing). Unfortunately I can't test
it...

Ciao!
Steven


Re: Skipping assembler when producing slim LTO files

2014-09-24 Thread Jan Hubicka
 On Wed, Sep 24, 2014 at 11:47 PM, Ian Lance Taylor wrote:
  On Wed, Sep 24, 2014 at 10:04 AM, Steven Bosscher wrote:
  Are you using -pipe? AFAIR this still isn't the default, even on
  GNU/Linux, but it is typically a lot faster than without.
 
  Is that true even when TMPDIR is on a ram disk?  There's no obvious
  reason that it should be true in a parallel build.  Using -pipe
  effectively constrains communication between the compiler and the
  assembler to work in PIPE_BUF blocks.  Using TMPDIR introduces no such
  constraints, and in a big program a parallel build should obscure the
  fact that the compiler and assembler are serialized for each
  individual compilation unit.
 
 I've done my most recent timings on a machine that has /dev/md3
 mounted on /tmp. That's gcc110 on the compile farm. With/without -pipe
 made a significant difference.
 
 If TMPDIR is a tmpfs or other kind of ram disk, I suppose the benefits
 would be less (to the point of vanishing). Unfortunately I can't test
 it...

OK, I tried it on my hello world benchmark with tmpfs and -pipe really seems
like a small loss. I wonder if we can work out better defaults that work for
most people.  I use tmpfs as I am worried about my notebook SSD still
being alive and well in 3 years, but it is still far from mainstream.

Honza


Re: Skipping assembler when producing slim LTO files

2014-09-24 Thread Janne Blomqvist
On Thu, Sep 25, 2014 at 12:47 AM, Ian Lance Taylor i...@google.com wrote:
 Is that true even when TMPDIR is on a ram disk?  There's no obvious
 reason that it should be true in a parallel build.  Using -pipe
 effectively constrains communication between the compiler and the
 assembler to work in PIPE_BUF blocks.  Using TMPDIR introduces no such
 constraints, and in a big program a parallel build should obscure the
 fact that the compiler and assembler are serialized for each
 individual compilation unit.

As an aside, I think what matters is the capacity of the pipe rather
than PIPE_BUF. PIPE_BUF is the largest chunk that can be written
atomically, but since we don't have a case of multiple processes
writing to the same pipe(???), it doesn't matter. On a typical
x86(-64) Linux system, PIPE_BUF is 4k while the capacity is by default
64k (can be increased with fcntl(fd, F_SETPIPE_SZ, ...), perhaps worth
trying to see if it makes any difference?).
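
A minimal sketch of that experiment, assuming a Linux host where
F_GETPIPE_SZ and F_SETPIPE_SZ are available (they are Linux-specific
fcntl commands).  The two wrapper names are invented for this example,
and on other systems they simply report failure.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* exposes F_GETPIPE_SZ / F_SETPIPE_SZ on Linux */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <unistd.h>

/* Return the capacity in bytes of the pipe behind FD, or -1 with errno
   set if the query is unavailable or fails.  */
static long
pipe_capacity (int fd)
{
  assert (fd >= 0);
#ifdef F_GETPIPE_SZ
  return fcntl (fd, F_GETPIPE_SZ);
#else
  errno = ENOSYS;
  return -1;
#endif
}

/* Ask the kernel for at least WANTED bytes of capacity.  Returns the
   capacity actually granted, or -1 with errno set on failure.  */
static long
grow_pipe (int fd, int wanted)
{
  assert (fd >= 0 && wanted > 0);
#ifdef F_SETPIPE_SZ
  return fcntl (fd, F_SETPIPE_SZ, wanted);
#else
  errno = ENOSYS;
  return -1;
#endif
}
```

The kernel may round the request up, and unprivileged processes are
capped by /proc/sys/fs/pipe-max-size, so the granted size should be
checked rather than assumed.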

Still, it seems to me that making -pipe the default would make sense,
if the tradeoff appears to be a small loss in the case where /tmp is a
tmpfs vs. a much larger gain when /tmp is a normal fs.


-- 
Janne Blomqvist


Skipping assembler when producing slim LTO files

2014-09-23 Thread Jan Hubicka

Hi,
This patch is something I was playing around with, with the assistance of Ian Taylor.
It seems I need a bit more help though :)

It adds support for direct output of slim LTO files to the compiler binary.
It works as a proof of concept, but there are two key parts missing:
 1) An extension of libiberty's simple-object to handle outputting symbols into
    COMMON.  This is needed to output __gnu_lto_v1 and __gnu_lto_slim.
    Search for TODO in the patch below.
 2) Support in the driver to properly execute the *1 binary.

I also disabled outputting the ident directive, but I think that one may not be
necessary because the files are identified by the __gnu_lto_v1 symbol. We could
add it later.

Currently the path bypassing asm stage can be tested as follows:

jan@linux-ujxe:~/trunk/build/gcc> cat a.c
main ()
{
  printf ("Hello world\n");
}
jan@linux-ujxe:~/trunk/build/gcc> ./xgcc -B ./ -O3 a.c -flto -S -fbypass-asm=crtbegin.o  -o a.o
jan@linux-ujxe:~/trunk/build/gcc> ./xgcc -B ./ -O2 a.o -flto
jan@linux-ujxe:~/trunk/build/gcc> ./a.out
Hello world

The implementation is pretty straightforward except for -fbypass-asm requiring
one existing object file to fetch the target's file attributes from.  This is
definitely not optimal, but libiberty currently can't build output files from
scratch. As Ian suggested, I plan to simply arrange for the driver to pass crtbegin
around, at least to start with. We may want to bypass this later by storing
the proper attributes in the binary.

Ian, would you be so kind as to implement the ability to output those two symbols
in lto-object-simple?  I think we can start with ELF-only support.

The large chunk just moves lto-object around with very small changes in it, so
the patch is fairly easy.

I did just a quick benchmark with an unoptimized cc1 binary compiling the file above.
For 1000 invocations with bypass I get:

real    0m14.186s
user    0m10.957s
sys     0m2.424s

While the default path gets:

real    0m21.913s
user    0m13.856s
sys     0m5.705s

With OpenSUSE 13.1 default GCC 4.8.3 build:

real   0m15.160s
user   0m8.481s
sys    0m5.159s

(the difference here is most likely the optimized vs. unoptimized binary; perf shows
contains_struct_check near the top, so startup overhead still dominates)

And with clang-3.4:

real   0m30.097s
user   0m22.012s
sys    0m6.649s

That is a fairly nice speedup IMO (roughly 1.5x in wall-clock time over the
default path).  With an optimized build the difference should
be more visible because cc1 startup issues will become less important.
I definitely see ASM file overhead as a measurable issue in real-world
benchmarks (libreoffice build). Clearly we produce several GBs of object files
going through a crappy and bloated text encoding just for the sake of doing it.
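
To give a feel for the text-encoding overhead, here is a small model of
what it costs to write a byte buffer as the body of a single .ascii
directive.  It is a deliberately simplified stand-in, not the routine GCC
uses to emit strings, and ascii_directive_size is a name made up for
this sketch.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Number of characters needed to write LEN bytes of DATA as one
   .ascii "..." line: printable characters are kept, quote and backslash
   get a leading backslash, everything else becomes a \ooo octal escape.  */
static size_t
ascii_directive_size (const unsigned char *data, size_t len)
{
  /* Fixed cost: tab, ".ascii", space, two quotes, newline.  */
  size_t size = 1 + 6 + 1 + 2 + 1;

  assert (data != NULL || len == 0);
  for (size_t i = 0; i < len; i++)
    {
      unsigned char c = data[i];
      if (c == '"' || c == '\\')
        size += 2;
      else if (isprint (c))
        size += 1;
      else
        size += 4;
    }
  return size;
}
```

For data that is mostly non-printable, as compressed streams tend to be,
this model approaches four characters of text per byte, all of which the
assembler then has to parse back into the original bytes.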

Honza

Index: Makefile.in
===
--- Makefile.in (revision 215518)
+++ Makefile.in (working copy)
@@ -1300,6 +1300,7 @@
    lto-section-out.o \
    lto-opts.o \
    lto-compress.o \
+   lto-object.o \
    mcf.o \
    mode-switching.o \
    modulo-sched.o \
Index: common.opt
===
--- common.opt  (revision 215518)
+++ common.opt  (working copy)
@@ -923,6 +923,9 @@
 Common Report Var(flag_btr_bb_exclusive) Optimization
 Restrict target load migration not to re-use registers in any basic block
 
+fbypass-asm=
+Common Joined Var(flag_bypass_asm)
+
 fcall-saved-
 Common Joined RejectNegative Var(common_deferred_options) Defer
 -fcall-saved-<register>	Mark register as being preserved across functions
Index: langhooks.c
===
--- langhooks.c (revision 215518)
+++ langhooks.c (working copy)
@@ -40,6 +40,10 @@
 #include "cgraph.h"
 #include "timevar.h"
 #include "output.h"
+#include "tree-ssa-alias.h"
+#include "gimple-expr.h"
+#include "gimple.h"
+#include "lto-streamer.h"
 
 /* Do nothing; in many cases the default hook.  */
 
@@ -653,6 +657,19 @@
 {
   section *section;
 
+  if (flag_bypass_asm)
+    {
+      static int initialized = false;
+      if (!initialized)
+        {
+          gcc_assert (asm_out_file == NULL);
+          lto_set_current_out_file (lto_obj_file_open (asm_file_name, true));
+          initialized = true;
+        }
+      lto_obj_begin_section (name);
+      return;
+    }
+
   /* Save the old section so we can restore it in lto_end_asm_section.  */
   gcc_assert (!saved_section);
   saved_section = in_section;
@@ -669,8 +686,13 @@
implementation just calls assemble_string.  */
 
 void
-lhd_append_data (const void *data, size_t len, void *)
+lhd_append_data (const void *data, size_t len, void *v)
 {
+  if (flag_bypass_asm)
+    {
+      lto_obj_append_data (data, len, v);
+      return;
+    }
   if (data)
     assemble_string ((const char *)data, len);
 }
@@ -683,6 +705,11 @@
 void
 lhd_end_section (void)
 {
+  if (flag_bypass_asm)
+    {
+      lto_obj_end_section ();
+      return;
+    }
   if (saved_section)