On Wed, 2023-08-09 at 18:14 -0400, Lewis Hyatt wrote:

Hi Lewis, thanks for the patch...

> Add a new linemap reason LC_GEN which enables encoding the location of data
> that was generated during compilation and does not appear in any source file.
> There could be many use cases, such as, for instance, referring to the content
> of builtin macros (not yet implemented, but an easy lift after this one.) The
> first intended application is to create a place to store the input to a
> _Pragma directive, so that proper locations can be assigned to those
> tokens. This will be done in a subsequent commit.
> 
> The TO_FILE member of struct line_map_ordinary has been changed to a union
> named SRC which can be either a file name, or a pointer to a line_map_data
> struct describing the data. There is no space overhead added to the line
> maps data structures.
> 
> Outside libcpp, this patch includes only the minimal changes implied by the
> adjustment from TO_FILE to SRC in struct line_map_ordinary. Subsequent
> patches will implement the new functionality.
> 
> libcpp/ChangeLog:
> 
>         * include/line-map.h (enum lc_reason): Add LC_GEN.
>         (struct line_map_data): New struct.
>         (struct line_map_ordinary): Change TO_FILE from a char* to a union,
>         and rename to SRC.
>         (class source_id): New class.
>         (ORDINARY_MAP_GENERATED_DATA_P): New function.
>         (ORDINARY_MAP_GENERATED_DATA): New function.
>         (ORDINARY_MAP_GENERATED_DATA_LEN): New function.
>         (ORDINARY_MAP_SOURCE_ID): New function.
>         (ORDINARY_MAPS_SAME_FILE_P): New function.
>         (ORDINARY_MAP_CONTAINING_FILE_NAME): Declare.
>         (LINEMAP_FILE): Adapt to struct line_map_ordinary change.
>         (linemap_get_file_highest_location): Likewise.
>         * line-map.cc (source_id::operator==): New function.
>         (ORDINARY_MAP_CONTAINING_FILE_NAME): New function.
>         (linemap_add): Support creating LC_GEN maps.
>         (linemap_line_start): Support LC_GEN maps.
>         (linemap_check_files_exited): Likewise.
>         (linemap_position_for_loc_and_offset): Likewise.
>         (linemap_get_expansion_filename): Likewise.
>         (linemap_dump): Likewise.
>         (linemap_dump_location): Likewise.
>         (linemap_get_file_highest_location): Likewise.
>         * directives.cc (_cpp_do_file_change): Likewise.
> 
> gcc/c-family/ChangeLog:
> 
>         * c-common.cc (try_to_locate_new_include_insertion_point): Recognize
>         and ignore LC_GEN maps.
> 
> gcc/cp/ChangeLog:
> 
>         * module.cc (module_state::write_ordinary_maps): Recognize and
>         ignore LC_GEN maps, and adapt to interface change in struct
>         line_map_ordinary.
>         (module_state::read_ordinary_maps): Likewise.
> 
> gcc/ChangeLog:
> 
>         * diagnostic-show-locus.cc (compatible_locations_p): Adapt to
>         interface change in struct line_map_ordinary.
>         * input.cc (special_fname_generated): New function.
>         (dump_location_info): Support LC_GEN maps.
>         (get_substring_ranges_for_loc): Adapt to interface change in struct
>         line_map_ordinary.
>         * input.h (special_fname_generated): Declare.
> 
> gcc/go/ChangeLog:
> 
>         * go-linemap.cc (Gcc_linemap::to_string): Recognize and ignore
>         LC_GEN maps.
> ---
>  gcc/c-family/c-common.cc     |  11 ++-
>  gcc/cp/module.cc             |   8 +-
>  gcc/diagnostic-show-locus.cc |   2 +-
>  gcc/go/go-linemap.cc         |   3 +-
>  gcc/input.cc                 |  27 +++++-
>  gcc/input.h                  |   1 +
>  libcpp/directives.cc         |   4 +-
>  libcpp/include/line-map.h    | 144 ++++++++++++++++++++++++----
>  libcpp/line-map.cc           | 181 +++++++++++++++++++++++++----------
>  9 files changed, 299 insertions(+), 82 deletions(-)

[...snip...]

> 
> diff --git a/gcc/diagnostic-show-locus.cc b/gcc/diagnostic-show-locus.cc
> index 0514815b51f..a2aa6b4e0b5 100644
> --- a/gcc/diagnostic-show-locus.cc
> +++ b/gcc/diagnostic-show-locus.cc
> @@ -998,7 +998,7 @@ compatible_locations_p (location_t loc_a, location_t 
> loc_b)
>          are in the same file.  */
>        const line_map_ordinary *ord_map_a = linemap_check_ordinary (map_a);
>        const line_map_ordinary *ord_map_b = linemap_check_ordinary (map_b);
> -      return ord_map_a->to_file == ord_map_b->to_file;
> +      return ORDINARY_MAPS_SAME_FILE_P (ord_map_a, ord_map_b);

My first thought here was: are buffers supported here, or does it have
to be a file?

It turns out that ORDINARY_MAPS_SAME_FILE_P works on both files and
buffers.

This suggests that it would be better named as
ORDINARY_MAPS_SAME_SOURCE_ID_P, but note the comment below, could this
be:

           return ord_map_a->same_source_id_p (ord_map_b);

?

[...snip...]

> diff --git a/gcc/input.cc b/gcc/input.cc
> index eaf301ec7c1..c1735215b29 100644
> --- a/gcc/input.cc
> +++ b/gcc/input.cc

[...snip...]

> @@ -1814,11 +1835,11 @@ get_substring_ranges_for_loc (cpp_reader *pfile,
>        /* Bulletproofing.  We ought to only have different ordinary maps
>          for start vs finish due to line-length jumps.  */
>        if (start_ord_map != final_ord_map
> -         && start_ord_map->to_file != final_ord_map->to_file)
> +         && !ORDINARY_MAPS_SAME_FILE_P (start_ord_map, final_ord_map))

For the common case of comparing a pair of ordinary maps that have
filenames, this hunk is replacing pointer comparison with filename_cmp,
which ultimately does something like strcmp.  Should
ORDINARY_MAPS_SAME_FILE_P have a fast-path for pointer equality?


[...snip...]

> diff --git a/libcpp/include/line-map.h b/libcpp/include/line-map.h
> index 44fea0ea08e..e59123b18c5 100644
> --- a/libcpp/include/line-map.h
> +++ b/libcpp/include/line-map.h

[...snip...]

> @@ -662,6 +716,12 @@ ORDINARY_MAP_IN_SYSTEM_HEADER_P (const line_map_ordinary 
> *ord_map)
>    return ord_map->sysp;
>  }
>  
> +/* TRUE if this line map contains generated data.  */
> +inline bool ORDINARY_MAP_GENERATED_DATA_P (const line_map_ordinary *ord_map)
> +{
> +  return ord_map->reason == LC_GEN;
> +}
> +
>  /* TRUE if this line map is for a module (not a source file).  */
>  
>  inline bool
> @@ -671,14 +731,46 @@ MAP_MODULE_P (const line_map *map)
>           && linemap_check_ordinary (map)->reason == LC_MODULE);
>  }
>  
> -/* Get the filename of ordinary map MAP.  */
> +/* Get the data contents of ordinary map MAP.  */
>  
>  inline const char *
>  ORDINARY_MAP_FILE_NAME (const line_map_ordinary *ord_map)
>  {
> -  return ord_map->to_file;
> +  linemap_assert (ord_map->reason != LC_GEN);
> +  return ord_map->src.file;
> +}
> +
> +inline const char *
> +ORDINARY_MAP_GENERATED_DATA (const line_map_ordinary *ord_map)
> +{
> +  linemap_assert (ord_map->reason == LC_GEN);
> +  return ord_map->src.data->data;
> +}
> +
> +inline unsigned int
> +ORDINARY_MAP_GENERATED_DATA_LEN (const line_map_ordinary *ord_map)
> +{
> +  linemap_assert (ord_map->reason == LC_GEN);
> +  return ord_map->src.data->len;
> +}
> +
> +inline source_id ORDINARY_MAP_SOURCE_ID (const line_map_ordinary *ord_map)
> +{
> +  if (ORDINARY_MAP_GENERATED_DATA_P (ord_map))
> +    return source_id {ord_map->src.data->data, ord_map->src.data->len};
> +  return source_id {ord_map->src.file};
> +}
> +
> +/* If we just want to know whether two maps point to the same
> +   file/buffer or not.  */
> +inline bool
> +ORDINARY_MAPS_SAME_FILE_P (const line_map_ordinary *map1,
> +                          const line_map_ordinary *map2)
> +{
> +  return ORDINARY_MAP_SOURCE_ID (map1) == ORDINARY_MAP_SOURCE_ID (map2);
>  }
> 
> 

There are lots of existing BLOCK_CAPS inline functions in line-map.h
due to them originally being macros, but could the new ones above be
member functions of line_map_ordinary?

e.g.

inline const char *
linemap_ordinary::get_generated_data () const
{
  linemap_assert (reason == LC_GEN);
  return src.data->data;
}

Then again, the patch is matching the existing style, so you could save
this for a followup if you like.

> @@ -1093,21 +1185,28 @@ extern location_t linemap_line_start
>  extern line_map *line_map_new_raw (line_maps *, bool, unsigned);
>  
>  /* Add a mapping of logical source line to physical source file and
> -   line number. This function creates an "ordinary map", which is a
> +   line number.  This function creates an "ordinary map", which is a
>     map that records locations of tokens that are not part of macro
>     replacement-lists present at a macro expansion point.
>  
> -   The text pointed to by TO_FILE must have a lifetime
> -   at least as long as the lifetime of SET.  An empty
> -   TO_FILE means standard input.  If reason is LC_LEAVE, and
> -   TO_FILE is NULL, then TO_FILE, TO_LINE and SYSP are given their
> -   natural values considering the file we are returning to.
> +   The text pointed to by FILENAME_OR_BUFFER must have a lifetime at least as
> +   long as the lifetime of SET.  If reason is LC_LEAVE, and 
> FILENAME_OR_BUFFER
> +   is NULL, then FILENAME_OR_BUFFER, TO_LINE and SYSP are given their natural
> +   values considering the file we are returning to.  If reason is LC_GEN, 
> then
> +   FILENAME_OR_BUFFER is the actual content, and DATA_LEN>0 is the length of 
> it.
> +   Otherwise FILENAME_OR_BUFFER is a file name and DATA_LEN is ignored.
> +
> +   If reason is LC_RENAME, and the map being renamed from is an LC_GEN map,
> +   then FILENAME_OR_BUFFER may be NULL and will be copied from the source
> +   map.
> +
> +   A call to this function can relocate the previous set of maps, so any 
> stored
> +   line_map pointers should not be used.  */
>  
> -   A call to this function can relocate the previous set of
> -   maps, so any stored line_map pointers should not be used.  */
>  extern const line_map *linemap_add
>    (class line_maps *, enum lc_reason, unsigned int sysp,
> -   const char *to_file, linenum_type to_line);
> +   const char *filename_or_buffer, linenum_type to_line,
> +   unsigned int data_len = 0);

I haven't looked at the rest of the patches yet, but could the params
  const char *filename_or_buffer
and
  unsigned int data_len = 0 

be replaced by:
  source_id src

and, if so, does it simplify things?  Or do the various LC_* cases
complicated things?

>  
>  /* Create a macro map.  A macro map encodes source locations of tokens
>     that are part of a macro replacement-list, at a macro expansion
> @@ -1257,7 +1356,7 @@ linemap_position_for_loc_and_offset (class line_maps 
> *set,
>  inline const char *
>  LINEMAP_FILE (const line_map_ordinary *ord_map)
>  {
> -  return ord_map->to_file;
> +  return ORDINARY_MAP_FILE_NAME (ord_map);
>  }

Presumably this adds the precondition that ORD_MAP isn't an LC_GEN map,
so please update the leading comment.

>  
>  /* Return the line number this map started encoding location from.  */

[...snip...]

> diff --git a/libcpp/line-map.cc b/libcpp/line-map.cc
> index e0f82e20571..e63916054e0 100644
> --- a/libcpp/line-map.cc
> +++ b/libcpp/line-map.cc
> @@ -48,6 +48,31 @@ static location_t linemap_macro_loc_to_exp_point 
> (line_maps *,
>  extern unsigned num_expanded_macros_counter;
>  extern unsigned num_macro_tokens_counter;
>  
> +bool
> +source_id::operator== (source_id src) const
> +{
> +  return m_len == src.m_len
> +    && (is_buffer () || !m_filename_or_buffer || !src.m_filename_or_buffer
> +       ? m_filename_or_buffer == src.m_filename_or_buffer
> +       : !filename_cmp (m_filename_or_buffer, src.m_filename_or_buffer));
> +}

This function could really use a leading comment, and I'd much prefer
it if you converted to if statements rather than one big expression.

Am I right in thinking that for filenames, we use libiberty's
filename_cmp (which compares the contents of the buffers), whereas for
buffers we use pointer equality, and we assume that every buffer ptr is
different from every filename's ptr?

As noted above, do we need a fast path for pointer equality before
calling filename_cmp? 


> +
> +/* For a normal ordinary map, this is the same as ORDINARY_MAP_FILE_NAME;
> +   but for an LC_GEN map, it returns the file name from which the data
> +   originated, instead of asserting.  */
> +const char *
> +ORDINARY_MAP_CONTAINING_FILE_NAME (line_maps *set,
> +                                  const line_map_ordinary *ord_map)
> +{
> +  while (ORDINARY_MAP_GENERATED_DATA_P (ord_map))
> +    {
> +      ord_map = linemap_included_from_linemap (set, ord_map);
> +      if (!ord_map)
> +       return "-";

How does the above early return happen?  Is it the "read from stdin"
case?

> +    }
> +  return ORDINARY_MAP_FILE_NAME (ord_map);
> +}
> +
>  /* Destructor for class line_maps.
>     Ensure non-GC-managed memory is released.  */
>  

[...snip...]

> @@ -505,21 +531,28 @@ LAST_SOURCE_LINE_LOCATION (const line_map_ordinary *map)
>  }
>  
>  /* Add a mapping of logical source line to physical source file and
> -   line number.
> +   line number.  This function creates an "ordinary map", which is a
> +   map that records locations of tokens that are not part of macro
> +   replacement-lists present at a macro expansion point.
> +
> +   The text pointed to by FILENAME_OR_BUFFER must have a lifetime at least as
> +   long as the lifetime of SET.  If reason is LC_LEAVE, and 
> FILENAME_OR_BUFFER
> +   is NULL, then FILENAME_OR_BUFFER, TO_LINE and SYSP are given their natural
> +   values considering the file we are returning to.  If reason is LC_GEN, 
> then
> +   FILENAME_OR_BUFFER is the actual content, and DATA_LEN>0 is the length of 
> it.
> +   Otherwise FILENAME_OR_BUFFER is a file name and DATA_LEN is ignored.
>  
> -   The text pointed to by TO_FILE must have a lifetime
> -   at least as long as the final call to lookup_line ().  An empty
> -   TO_FILE means standard input.  If reason is LC_LEAVE, and
> -   TO_FILE is NULL, then TO_FILE, TO_LINE and SYSP are given their
> -   natural values considering the file we are returning to.
> +   If reason is LC_RENAME, and the map being renamed from is an LC_GEN map,
> +   then FILENAME_OR_BUFFER may be NULL and will be copied from the source
> +   map.
>  
> -   FROM_LINE should be monotonic increasing across calls to this
> -   function.  A call to this function can relocate the previous set of
> -   maps, so any stored line_map pointers should not be used.  */
> +   A call to this function can relocate the previous set of maps, so any 
> stored
> +   line_map pointers should not be used.  */
>  
>  const struct line_map *
>  linemap_add (line_maps *set, enum lc_reason reason,
> -            unsigned int sysp, const char *to_file, linenum_type to_line)
> +            unsigned int sysp, const char *filename_or_buffer,
> +            linenum_type to_line, unsigned int data_len)

As noted above, would passing in a source_id make this simpler? 
Looking at the logic below, possibly not...

>  {
>    /* Generate a start_location above the current highest_location.
>       If possible, make the low range bits be zero.  */
> @@ -536,12 +569,24 @@ linemap_add (line_maps *set, enum lc_reason reason,
>  
>    /* When we enter the file for the first time reason cannot be
>       LC_RENAME.  */
> -  linemap_assert (!(set->depth == 0 && reason == LC_RENAME));
> +  line_map_data *data_to_reuse = nullptr;
> +  bool is_data_map = (reason == LC_GEN);
> +  if (reason == LC_RENAME || reason == LC_RENAME_VERBATIM)
> +    {
> +      linemap_assert (set->depth != 0);
> +      const auto prev = LINEMAPS_LAST_ORDINARY_MAP (set);
> +      linemap_assert (prev);
> +      if (prev->reason == LC_GEN)
> +       {
> +         data_to_reuse = prev->src.data;
> +         is_data_map = true;
> +       }
> +    }
>  
>    /* If we are leaving the main file, return a NULL map.  */
>    if (reason == LC_LEAVE
>        && MAIN_FILE_P (LINEMAPS_LAST_ORDINARY_MAP (set))
> -      && to_file == NULL)
> +      && filename_or_buffer == NULL)
>      {
>        set->depth--;
>        return NULL;
> @@ -557,8 +602,9 @@ linemap_add (line_maps *set, enum lc_reason reason,
>      = linemap_check_ordinary (new_linemap (set, start_location));
>    map->reason = reason;
>  
> -  if (to_file && *to_file == '\0' && reason != LC_RENAME_VERBATIM)
> -    to_file = "<stdin>";
> +  if (filename_or_buffer && *filename_or_buffer == '\0'
> +      && reason != LC_RENAME_VERBATIM && !is_data_map)
> +    filename_or_buffer = "<stdin>";
>  
>    if (reason == LC_RENAME_VERBATIM)
>      reason = LC_RENAME;
> @@ -577,21 +623,50 @@ linemap_add (line_maps *set, enum lc_reason reason,
>          that comes right before MAP in the same file.  */
>        from = linemap_included_from_linemap (set, map - 1);
>  
> -      /* A TO_FILE of NULL is special - we use the natural values.  */
> -      if (to_file == NULL)
> +      /* Not currently supporting a #include originating from an LC_GEN
> +        map, since there is no clear use case for this and it would 
> complicate
> +        the logic here.  */
> +      linemap_assert (!ORDINARY_MAP_GENERATED_DATA_P (from));
> +
> +      /* A null FILENAME_OR_BUFFER is special - we use the natural
> +        values.  */
> +      if (!filename_or_buffer)
>         {
> -         to_file = ORDINARY_MAP_FILE_NAME (from);
> +         filename_or_buffer = from->src.file;
>           to_line = SOURCE_LINE (from, from[1].start_location);
>           sysp = ORDINARY_MAP_IN_SYSTEM_HEADER_P (from);
>         }
>        else
>         linemap_assert (filename_cmp (ORDINARY_MAP_FILE_NAME (from),
> -                                     to_file) == 0);
> +                                     filename_or_buffer) == 0);
>      }
>  
>    map->sysp = sysp;
> -  map->to_file = to_file;
>    map->to_line = to_line;
> +
> +  if (is_data_map)
> +    {
> +      /* All data maps should have reason == LC_GEN, even if they were
> +        an LC_RENAME, to keep it simple to check which maps contain
> +        data.  */
> +      map->reason = LC_GEN;
> +
> +      if (data_to_reuse)
> +       map->src.data = data_to_reuse;
> +      else
> +       {
> +         auto src_data
> +           = (line_map_data *)set->reallocator (nullptr,
> +                                                sizeof (line_map_data));
> +         src_data->data = filename_or_buffer;
> +         src_data->len = data_len;
> +         gcc_assert (data_len);
> +         map->src.data = src_data;
> +       }
> +    }
> +  else
> +    map->src.file = filename_or_buffer;
> +
>    LINEMAPS_ORDINARY_CACHE (set) = LINEMAPS_ORDINARY_USED (set) - 1;
>    /* Do not store range_bits here.  That's readjusted in
>       linemap_line_start.  */
> @@ -606,7 +681,7 @@ linemap_add (line_maps *set, enum lc_reason reason,
>       pure_location_p.  */
>    linemap_assert (pure_location_p (set, start_location));
>  
> -  if (reason == LC_ENTER)
> +  if (reason == LC_ENTER || reason == LC_GEN)
>      {
>        if (set->depth == 0)
>         map->included_from = 0;
> @@ -617,7 +692,7 @@ linemap_add (line_maps *set, enum lc_reason reason,
>               & ~((1 << map[-1].m_column_and_range_bits) - 1))
>              + map[-1].start_location);
>        set->depth++;
> -      if (set->trace_includes)
> +      if (set->trace_includes && reason == LC_ENTER)
>         trace_include (set, map);
>      }
>    else if (reason == LC_RENAME)
> @@ -859,12 +934,16 @@ linemap_line_start (line_maps *set, linenum_type 
> to_line,
>               >= (((uint64_t) 1)
>                   << (CHAR_BIT * sizeof (linenum_type) - column_bits)))
>           || range_bits < map->m_range_bits)
> -       map = linemap_check_ordinary
> -               (const_cast <line_map *>
> -                 (linemap_add (set, LC_RENAME,
> -                               ORDINARY_MAP_IN_SYSTEM_HEADER_P (map),
> -                               ORDINARY_MAP_FILE_NAME (map),
> -                               to_line)));
> +       {
> +         const auto maybe_filename = ORDINARY_MAP_GENERATED_DATA_P (map)
> +           ? nullptr : map->src.file;
> +         map = linemap_check_ordinary
> +           (const_cast <line_map *>
> +            (linemap_add (set, LC_RENAME,
> +                          ORDINARY_MAP_IN_SYSTEM_HEADER_P (map),
> +                          maybe_filename,
> +                          to_line)));
> +       }
>        map->m_column_and_range_bits = column_bits;
>        map->m_range_bits = range_bits;
>        r = (MAP_START_LOCATION (map)

[...snip...]


Thanks again for the patch; hope this is constructive
Dave

Reply via email to