Re: Random cleanups [4/4]: Streamlining streamer

2011-03-31 Thread Richard Guenther
On Thu, Mar 31, 2011 at 4:07 AM, Michael Matz m...@suse.de wrote:
 Hi,

 I fear I wasn't as thorough in also splitting this one into several
 patches, but the different cleanups are at least mostly in different
 files.  They are:

 * lto-lang remembers all builtin decls in a local list, to be returned
  by the getdecls langhook.  But as we have our own write_globals langhook
  this isn't actually called (except by dbxout.c), so there's no point in
  remembering.

 * lto.c:lto_materialize_function has code to read in the function body
  sections, do something with them in non-wpa mode, and discard them then.
  There's no point in even reading them in in non-wpa mode (except for a
  dubious error message that rather is worth an assert).

 * gimple.c:gimple_type_leader_entry is a fixed length cache for speeding
  up our type merging machinery.  It can hold references to many meanwhile
  merged trees, interfering with the wish to free up much memory with a
  ggc_collect with early-merging LTO.  We can simply make it deletable.

 * ipa-inline.c: some tidying in not calling a macro with function call
  arguments, and calling a costly function only after early-outs.

 * lto-streamer-out.c: it writes out and compares strings character by
  character.  memcmp and output_data_stream work just as well.

 * lto-streamer: output_unreferenced_globals writes out all global varpool
  decls.  The reading side simply reads over all of them, and ignores
  them.  This was supposed to help symbol resolution, and it probably once
  did.  But for some time now we have properly emitted varpool and cgraph
  nodes, references between them, and a proper symtab.  There's no need
  for emitting these trees again.

 * lto-streamer: the following changes the bytecode:
  1: all indices into the cache are unsigned, hence we should say
     so, instead of casting back and forth
  2: trees are only appended to the cache, when writing out.  When reading
     in we read in all trees in the stream one after the other, also
     appending to the cache.  References to existing trees _always_ are to
     - well - existing trees, hence to those already emitted earlier in
     the stream, i.e. with a smaller offset, and more importantly with a
     known index even at reader side.

     So, the offset never is used, so remove that and all associated
     tracking and params.
  3: for the same reason we also don't need to stream the index that new
     trees get in the cache.  They will get exactly the ones they also had
     when writing out.  We could use it as consistency check, but we
     stream the expected tree-node for back-references for that already.

     Obviously we do need to stream the index in back references (aka
     pickled references).

     (the index could change if there's a different set of nodes preloaded
     into the cache between writing out and reading in.  But that would
     have much worse problems already, silently overwriting slots with
     trees from the stream; we should do away with the preloaded nodes,
     and instead rely on type merging to get canonical versions of the
     builtin trees)

 Not streaming offset and index for most trees obviously shortens the
 bytecode somewhat but I don't have statistics on how much.  Not much would
 be my guess.

 Regstrapped on x86_64-linux with the other three cleanups.  Okay for
 trunk?

I don't see a need (in this patch) to move lto_input_chain earlier, but
I suppose it doesn't matter and is needed by a followup.

Ok.

Thanks,
Richard.


Random cleanups [4/4]: Streamlining streamer

2011-03-30 Thread Michael Matz
Hi,

I fear I wasn't as thorough in also splitting this one into several 
patches, but the different cleanups are at least mostly in different 
files.  They are:

* lto-lang remembers all builtin decls in a local list, to be returned
  by the getdecls langhook.  But as we have our own write_globals langhook
  this isn't actually called (except by dbxout.c), so there's no point in 
  remembering.

* lto.c:lto_materialize_function has code to read in the function body 
  sections, do something with them in non-wpa mode, and discard them then.
  There's no point in even reading them in in non-wpa mode (except for a 
  dubious error message that rather is worth an assert).

* gimple.c:gimple_type_leader_entry is a fixed length cache for speeding 
  up our type merging machinery.  It can hold references to many meanwhile 
  merged trees, interfering with the wish to free up much memory with a
  ggc_collect with early-merging LTO.  We can simply make it deletable.

* ipa-inline.c: some tidying in not calling a macro with function call 
  arguments, and calling a costly function only after early-outs.

* lto-streamer-out.c: it writes out and compares strings character by
  character.  memcmp and output_data_stream work just as well.

* lto-streamer: output_unreferenced_globals writes out all global varpool 
  decls.  The reading side simply reads over all of them, and ignores 
  them.  This was supposed to help symbol resolution, and it probably once
  did.  But for some time now we have properly emitted varpool and cgraph
  nodes, references between them, and a proper symtab.  There's no need
  for emitting these trees again.

* lto-streamer: the following changes the bytecode:
  1: all indices into the cache are unsigned, hence we should say
     so, instead of casting back and forth
  2: trees are only appended to the cache, when writing out.  When reading
     in we read in all trees in the stream one after the other, also
     appending to the cache.  References to existing trees _always_ are to
     - well - existing trees, hence to those already emitted earlier in
     the stream, i.e. with a smaller offset, and more importantly with a
     known index even at reader side.

     So, the offset never is used, so remove that and all associated
     tracking and params.
  3: for the same reason we also don't need to stream the index that new
     trees get in the cache.  They will get exactly the ones they also had
     when writing out.  We could use it as consistency check, but we
     stream the expected tree-node for back-references for that already.

     Obviously we do need to stream the index in back references (aka
     pickled references).

     (the index could change if there's a different set of nodes preloaded
     into the cache between writing out and reading in.  But that would
     have much worse problems already, silently overwriting slots with
     trees from the stream; we should do away with the preloaded nodes,
     and instead rely on type merging to get canonical versions of the
     builtin trees)
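
To illustrate why points 2 and 3 above hold, here is a minimal sketch of an
append-only cache, not the actual streamer code.  The names (struct node,
struct cache, struct record, write_nodes, read_nodes) are all hypothetical
and chosen for illustration.  The writer and the reader append new nodes in
the same order, so both sides assign the same index without it ever being
streamed; only back references carry an index.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a tree node; only the payload matters here.  */
struct node { int payload; };

/* Append-only cache: slot i holds the i-th node ever appended.  */
struct cache { struct node **nodes; unsigned len, cap; };

/* One stream record: either a new node (VALUE is its payload) or a back
   reference (VALUE is a cache index).  New nodes carry no index.  */
struct record { int is_backref; unsigned value; };

/* Append N and return its index.  Writer and reader call this in the
   same order, so they agree on indices without streaming them.  */
static unsigned
cache_append (struct cache *c, struct node *n)
{
  if (c->len == c->cap)
    {
      c->cap = c->cap ? 2 * c->cap : 8;
      c->nodes = realloc (c->nodes, c->cap * sizeof *c->nodes);
      assert (c->nodes);
    }
  c->nodes[c->len] = n;
  return c->len++;
}

/* A back reference must name a slot filled earlier in the stream.  */
static struct node *
cache_get (const struct cache *c, unsigned ix)
{
  assert (ix < c->len);
  return c->nodes[ix];
}

/* Linear lookup, good enough for a sketch.  */
static int
cache_find (const struct cache *c, const struct node *n, unsigned *ix)
{
  for (unsigned i = 0; i < c->len; i++)
    if (c->nodes[i] == n)
      {
        *ix = i;
        return 1;
      }
  return 0;
}

/* Writer: first sight of a node emits its payload, later sights emit
   only the index it got when it was appended.  */
static void
write_nodes (struct cache *wc, struct node **in, unsigned n,
             struct record *out)
{
  for (unsigned i = 0; i < n; i++)
    {
      unsigned ix;
      if (cache_find (wc, in[i], &ix))
        out[i] = (struct record) { 1, ix };
      else
        {
          cache_append (wc, in[i]);
          out[i] = (struct record) { 0, (unsigned) in[i]->payload };
        }
    }
}

/* Reader: rebuilds nodes in stream order, so its cache indices match
   the writer's, and back references resolve by index alone.  */
static void
read_nodes (struct cache *rc, const struct record *in, unsigned n,
            struct node **out)
{
  for (unsigned i = 0; i < n; i++)
    if (in[i].is_backref)
      out[i] = cache_get (rc, in[i].value);
    else
      {
        struct node *nn = malloc (sizeof *nn);
        assert (nn);
        nn->payload = (int) in[i].value;
        cache_append (rc, nn);
        out[i] = nn;
      }
}
```

The sketch deliberately starts both caches empty.  With preloaded nodes the
two sides only agree if the preloaded set is identical, which is the caveat
in the parenthetical above.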

Not streaming offset and index for most trees obviously shortens the 
bytecode somewhat but I don't have statistics on how much.  Not much would 
be my guess.
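
For a rough feel of the saving, here is a small sketch under the assumption
that both fields would be written as ULEB128 variable-length integers (seven
payload bits per byte).  That encoding, and the helper names uleb128_size and
bytes_saved_per_tree, are assumptions for illustration and are not taken from
the patch.

```c
#include <stddef.h>

/* Number of bytes an unsigned value occupies in ULEB128 encoding:
   seven payload bits per byte, at least one byte.  */
static size_t
uleb128_size (unsigned long long value)
{
  size_t n = 1;
  while (value >= 0x80)
    {
      value >>= 7;
      n++;
    }
  return n;
}

/* Hypothetical estimate of the bytes saved for one newly streamed tree
   when neither its cache index nor its stream offset is written.  */
static size_t
bytes_saved_per_tree (unsigned index, unsigned long long offset)
{
  return uleb128_size (index) + uleb128_size (offset);
}
```

Under that assumption the saving is at least two bytes per new tree and grows
only slowly with stream size, which fits the guess that it is not much.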

Regstrapped on x86_64-linux with the other three cleanups.  Okay for 
trunk?


Ciao,
Michael.
-- 
	* lto-streamer.h (struct lto_streamer_cache_d): Remove offsets
	and next_slot members.
	(lto_streamer_cache_insert, lto_streamer_cache_insert_at,
	lto_streamer_cache_lookup, lto_streamer_cache_get): Adjust prototypes.
	(lto_streamer_cache_append): Declare.
	* lto-streamer.c (lto_streamer_cache_add_to_node_array): Use
	unsigned index, remove offset parameter, ensure that we append
	or update existing entries.
	(lto_streamer_cache_insert_1): Use unsigned index, remove offset_p
	parameter, update next_slot for append.
	(lto_streamer_cache_insert): Use unsigned index, remove offset_p
	parameter.
	(lto_streamer_cache_insert_at): Likewise.
	(lto_streamer_cache_append): New function.
	(lto_streamer_cache_lookup): Use unsigned index.
	(lto_streamer_cache_get): Likewise.
	(lto_record_common_node): Don't test tree_node_can_be_shared.
	(preload_common_node): Adjust call to lto_streamer_cache_insert.
	(lto_streamer_cache_delete): Don't free offsets member.
	* lto-streamer-out.c (eq_string_slot_node): Use memcmp.
	(lto_output_string_with_length): Use lto_output_data_stream.
	(lto_output_tree_header): Remove ix parameter, don't write it.
	(lto_output_builtin_tree): Likewise.
	(lto_write_tree): Adjust callers to above, don't track and write
	offset, write unsigned index.
	(output_unreferenced_globals): Don't emit all global vars.
	(write_global_references): Use unsigned indices.
	(lto_output_decl_state_refs): Likewise.
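
The "Use memcmp" entry for eq_string_slot_node above amounts to something
like the following sketch.  The struct layout and the name string_slot_eq are
hypothetical and not copied from lto-streamer-out.c; the point is only that a
length check followed by memcmp replaces a character-by-character loop.

```c
#include <string.h>

/* Hypothetical string table entry.  The length is stored explicitly
   because the streamed strings may contain embedded NUL bytes.  */
struct string_slot
{
  const char *s;
  size_t len;
};

/* Return nonzero if A and B hold the same bytes.  Comparing lengths
   first keeps memcmp from reading past the shorter string.  */
static int
string_slot_eq (const struct string_slot *a, const struct string_slot *b)
{
  return a->len == b->len && memcmp (a->s, b->s, a->len) == 0;
}
```

Unlike strcmp, memcmp does not stop at the first NUL byte, so two strings that
differ only after an embedded NUL still compare as different.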