On Tue, Oct 6, 2015 at 9:26 AM, Rong Xu <x...@google.com> wrote: > On Mon, Oct 5, 2015 at 5:33 PM, Xinliang David Li <davi...@google.com> wrote: >> unsigned ggc_memory = gcov_read_unsigned (); >> + unsigned marker = 0, len = 0, k; >> + char **string_array, *saved_cc1_strings; >> + >> for (unsigned j = 0; j < 7; j++) >> >> >> Do not use hard coded number. Use the enum defined in coverage.c. > > OK. > >> >> >> + string_array[j] = xstrdup (gcov_read_string ()); >> + >> + k = 0; >> + for (unsigned j = 1; j < 7; j++) >> >> Do not use hard coded number. > > OK. > >> >> >> + { >> + if (num_array[j] == 0) >> + continue; >> + marker += num_array[j]; >> >> It is better to read if the name of variable 'marker' is changed to >> 'j_end' or something similar >> >> For all the substrings of 'j' kind, there should be just one marker, >> right? It looks like here you introduce one marker per string, not one >> marker per string kind. > > I don't understand what you meant here. "marker" is fixed for each j > substring (one option kind) -- it the end index of the sub-string > array. k-loop is for each string. >
That was a wrong comment from me. Discard it. >> >> + len += 3; /* [[<FLAG> */ >> >> Same here for hard coded value. >> >> + for (; k < marker; k++) >> + len += strlen (string_array[k]) + 1; /* 1 for delimter of ']' >> */ >> >> Why do we need one ']' per string? > > This is because the options strings can contain space ' '. I cannot > use space as the delimiter, neither is \0 as it is the end of the > string of the encoded string. Ok -- this allows you to avoid string copy during parsing. > >> >> >> + } >> + saved_cc1_strings = (char *) xmalloc (len + 1); >> + saved_cc1_strings[0] = 0; >> + >> + marker = 0; >> + k = 0; >> + for (unsigned j = 1; j < 7; j++) >> >> Same here for 7. > > will fix in the new patch. > >> >> + { >> + static const char lipo_string_flags[6] = {'Q', 'B', 'S', >> 'D','I', 'C'}; >> + if (num_array[j] == 0) >> + continue; >> + marker += num_array[j]; >> >> Suggest changing marker to j_end > OK. > >> >> + sprintf (saved_cc1_strings, "%s[[%c", saved_cc1_strings, >> + lipo_string_flags[j - 1]); >> + for (; k < marker; k++) >> + { >> + sprintf (saved_cc1_strings, "%s%s]", saved_cc1_strings, >> + string_array[k]); >> >> +#define DELIMTER "[[" >> >> Why double '[' ? > I will change to single '['. > >> >> +#define DELIMTER2 "]" >> +#define QUOTE_PATH_FLAG 'Q' >> +#define BRACKET_PATH_FLAG 'B' >> +#define SYSTEM_PATH_FLAG 'S' >> +#define D_U_OPTION_FLAG 'D' >> +#define INCLUDE_OPTION_FLAG 'I' >> +#define COMMAND_ARG_FLAG 'C' >> + >> +enum lipo_cc1_string_kind { >> + k_quote_paths = 0, >> + k_bracket_paths, >> + k_system_paths, >> + k_cpp_defines, >> + k_cpp_includes, >> + k_lipo_cl_args, >> + num_lipo_cc1_string_kind >> +}; >> + >> +struct lipo_parsed_cc1_string { >> + const char* source_filename; >> + unsigned num[num_lipo_cc1_string_kind]; >> + char **strings[num_lipo_cc1_string_kind]; >> +}; >> + >> +struct lipo_parsed_cc1_string * >> +lipo_parse_saved_cc1_string (const char *src, char *str, >> + bool parse_cl_args_only); >> +void free_parsed_string (struct lipo_parsed_cc1_string *string); >> + >> >> Declare above in a header file. > > OK. > >> >> >> /* Returns true if the command-line arguments stored in the given >> module-infos >> are incompatible. */ >> bool >> -incompatible_cl_args (struct gcov_module_info* mod_info1, >> - struct gcov_module_info* mod_info2) >> +incompatible_cl_args (struct lipo_parsed_cc1_string* mod_info1, >> + struct lipo_parsed_cc1_string* mod_info2) >> >> Fix formating. > OK. >> >> { >> { >> @@ -1647,7 +1679,7 @@ build_var (tree fn_decl, tree type, int counter) >> /* Creates the gcov_fn_info RECORD_TYPE. */ >> >> NULL_TREE, get_gcov_unsigned_t ()); >> DECL_CHAIN (field) = fields; >> fields = field; >> >> - /* Num bracket paths */ >> + /* cc1_uncompressed_strlen field */ >> field = build_decl (BUILTINS_LOCATION, FIELD_DECL, >> NULL_TREE, get_gcov_unsigned_t ()); >> DECL_CHAIN (field) = fields; >> fields = field; >> >> >> Why do we need to store uncompressed string length? If there is need >> to do that, I suggest combine uncompressed length and compressed >> length into one 32bit integer. (16/16, or 17/15 split) > > In theory, I don't need the uncompressed length, But I would need to > guess the uncompressed length to allocate the buffer. If the > decompressing fails with insufficient space, I need to have another > guess and do it again. I think it's easier just save the uncompressed > size to the struct. I think combine the two sizes into one > gcov_unsigned_t is a good idea. We don't expect the string size are > very big. What is the usual compression ratio? If you guess it right most of the time, just get rid of the uncompressed length. David >> >> >> David >> >> On Mon, Oct 5, 2015 at 3:51 PM, Rong Xu <x...@google.com> wrote: >>> Hi, >>> >>> This patch is for google branch only. >>> >>> It encodes and compresses various cc1 option strings in >>> gcov_module_info to reduce the lipo instrumented object size. The >>> savings are from string compression and the reduced number of >>> relocations. >>> >>> More specifically, we replace the following fields in gcov_module_info >>> gcov_unsigned_t num_quote_paths; >>> gcov_unsigned_t num_bracket_paths; >>> gcov_unsigned_t num_system_paths; >>> gcov_unsigned_t num_cpp_defines; >>> gcov_unsigned_t num_cpp_includes; >>> gcov_unsigned_t num_cl_args; >>> char *string_array[1]; >>> with >>> gcov_unsigned_t cc1_strlen; >>> gcov_unsigned_t cc1_uncompressed_strlen; >>> char *saved_cc1_strings; >>> >>> The new saved_cc1_strings are zlib compressed string. >>> >>> Tested with google internal benchmarks. >>> >>> Thanks, >>> >>> -Rong