https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83173

--- Comment #2 from Mike Gulick <mgulick at mathworks dot com> ---
I have made some progress in determining the cause of this bug.  This issue
occurs when the current source_location is > LINE_MAP_MAX_LOCATION_WITH_COLS
and when a #include is the last line in the file (with a terminating newline).

The corruption occurs when _cpp_stack_include decrements
ptable->line_table->highest_location.  It does this so that highest_location
refers to the *current* line in the file, not the next line.  For the case
where a #include is *not* the last line in the file, this works correctly. 
However when the source location is > LINE_MAP_MAX_LOCATION_WITH_COLS and
the current #include line being processed is the last line in the file, the
highest_location value already refers to the current line in the file, as there
is no next line.  Thus this decrement sets highest_location to the previous
line in the file, which causes the corruption.

Consider an include file with two #includes:

  #include "foo.h"
  #include "bar.h"
  EOF

Consider when do_include_common() processes the final '#include "bar.h"'.  This
initially calls parse_include().  This calls check_eol(), which eventually
calls _cpp_lex_direct() via the following call stack:

  0 _cpp_lex_direct
  1 _cpp_lex_token
  2 cpp_get_token_1
  3 cpp_get_token
  4 check_eol_1
  5 check_eol
  6 parse_include
  7 do_include_common
  8 do_include
  9 _cpp_handle_directive
  10 _cpp_lex_token
  11 cpp_get_token_1
  12 cpp_get_token_with_location
  13 scan_translation_unit
  14 preprocess_file

_cpp_lex_direct parses the current buffer one character at a time.  In the case
of the line '#include "bar.h"', the buffer looks like:

  #include "bar.h"\n\n

Note that the second '\n' is added to the buffer when the file is initially
read in.  It doesn't exist in the file.

After parsing the '#include "bar.h"', buffer.cur points at the first '\n'.

  #include "bar.h"\n\n
                   ^ ^
      buffer.cur---/ |
                     |
      buffer.rlimit--/ 

buffer.rlimit is a pointer to the end of the buffer.  It points to the final
newline that was added to the end of the buffer when the file was read.

_cpp_lex_direct() reads the buffer one character at a time, e.g.

  c = *buffer->cur++;
  ...
  switch (c)
    {
    ...
    case '\n':
      if (buffer->cur < buffer->rlimit)
        CPP_INCREMENT_LINE (pfile, 0);
      buffer->need_line = true;
      goto fresh_line;
    ...

Under normal circumstances (i.e. if the #include is *not* the last line in the
file), when the '\n' is detected, CPP_INCREMENT_LINE increments the
line_maps->highest_line.  However for this last #include, buffer->cur ==
buffer->rlimit, so CPP_INCREMENT_LINE is not called.

Thus if the #include token has source_location 1610612807, highest_location in
the line_maps structure also has 1610612807.  Remember that we are past
LINE_MAP_MAX_LOCATION_WITH_COLS, so column numbers are not tracked, thus each
increment of a source_location value refers to a new line number and
potentially a new source file.
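
To make the end-of-buffer check concrete, here is a minimal C model of the
'\n' case.  It is only a sketch: newline_advances_line is a hypothetical
helper, not a libcpp function, and it assumes (as described above) that rlimit
points at the extra newline appended when the file was read.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the '\n' case in _cpp_lex_direct.
   BUF is the file contents plus the one extra '\n' appended on read;
   NL_INDEX is the index of the newline being lexed.
   Returns 1 if the line counter would be incremented, else 0.  */
static int
newline_advances_line (const char *buf, size_t nl_index)
{
  size_t len = strlen (buf);
  assert (len > 0 && nl_index < len && buf[nl_index] == '\n');

  const char *rlimit = buf + len - 1;   /* the appended newline */
  const char *cur = buf + nl_index;

  char c = *cur++;                      /* mirrors: c = *buffer->cur++ */
  return c == '\n' && cur < rlimit;     /* mirrors the rlimit check */
}
```

For the buffer '#include "bar.h"\n\n' the helper returns 0 for the first
newline, because cur already equals rlimit after the read.  For a two-line
buffer it returns 1 for the newline that ends the '#include "foo.h"' line.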

Continue stepping through do_include_common to _cpp_stack_include.  This
function has the following comment:

  /* Compensate for the increment in linemap_add that occurs if
     _cpp_stack_file actually stacks the file.  In the case of a
     normal #include, we're currently at the start of the line
     *following* the #include.  A separate source_location for this
     location makes no sense (until we do the LC_LEAVE), and
     complicates LAST_SOURCE_LINE_LOCATION.  This does not apply if we
     found a PCH file (in which case linemap_add is not called) or we
     were included from the command-line.  */

Under normal circumstances, the comment's claim that "we're currently at the
start of the line *following* the #include" is correct.  However in this case
it is not true, because we did not increment highest_line, so highest_location
still refers to the current line.  Thus when we decrement highest_location, it
ends up referring to the *previous* line map location, not the current one.
_cpp_stack_file then ultimately calls linemap_add, which sets
start_location to highest_location + 1.  This is assumed to be a new, unused
location, but in this case it actually already refers to an existing line map. 
Note that the linemap_assert in linemap_add will not catch this even if linemap
assertions are enabled.  The assertion only fires if the new start_location is
less than the source_location of the last line in the line map, whereas in this
case it is equal to the source_location of the last line.
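
The gap in that assertion can be illustrated with a small C model.  This is a
sketch of the behaviour as described in this comment, not the libcpp code:
location_t, assertion_passes and reuses_last_location are hypothetical names,
and the model tracks only the two values involved.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t location_t;   /* stand-in for source_location */

/* Hypothetical model of the check described above: the new map starts
   at highest_location + 1, and the assertion only rejects a start that
   is strictly less than the last line's location.  */
static int
assertion_passes (location_t highest_location, location_t last_line_loc)
{
  assert (highest_location < UINT32_MAX);   /* keep the + 1 from wrapping */
  location_t start_location = highest_location + 1;
  return !(start_location < last_line_loc);
}

/* Returns 1 if the new map would reuse the last line's location.  */
static int
reuses_last_location (location_t highest_location, location_t last_line_loc)
{
  assert (highest_location < UINT32_MAX);
  return highest_location + 1 == last_line_loc;
}
```

With the last line at 1610612807 and highest_location wrongly decremented to
1610612806, assertion_passes returns 1 and reuses_last_location also returns
1: the check is satisfied even though the location is already taken.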

We fix this by no longer decrementing pfile->line_table->highest_location if it
is less than or equal to the source_location of the current include header. 
The purpose of this decrement is to ensure that
pfile->line_table->highest_location still refers to the current line, so if it
already refers to the current line, there is no need to decrement it (and doing
so would be wrong).

Simple approach: only decrement when highest_location is already past loc.
This avoids the bad decrement when loc > LINE_MAP_MAX_LOCATION_WITH_COLS:

  if (file->pchname == NULL && file->err_no == 0
      && type != IT_CMDLINE && type != IT_DEFAULT
      && pfile->line_table->highest_location > loc)
    pfile->line_table->highest_location--;
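
As a rough illustration of what the extra condition changes, the guard can be
isolated into a standalone C function.  This is a sketch only:
adjust_highest_location and location_t are hypothetical names, the
pchname/err_no/type conditions are left out, and the model assumes the
include's own location is never past highest_location.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t location_t;   /* stand-in for source_location */

/* Hypothetical model of the guarded decrement: only step back when
   highest_location has already moved past LOC, the location of the
   #include being processed.  */
static location_t
adjust_highest_location (location_t highest_location, location_t loc)
{
  assert (highest_location >= loc);   /* assumption of this model */
  if (highest_location > loc)
    highest_location--;
  return highest_location;
}
```

When highest_location equals loc (the last-line case from this report), the
value is returned unchanged.  When it is loc + 1 (the line was already
incremented), it is stepped back to loc as before.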

More complicated: I tried to account for the case when loc <=
LINE_MAP_MAX_LOCATION_WITH_COLS.  In this case, CPP_INCREMENT_LINE is still not
called in _cpp_lex_direct when the current #include is the last line in the
file.  So we compute the end of the current location and check if
highest_location is past that.

  if (file->pchname == NULL && file->err_no == 0
      && type != IT_CMDLINE && type != IT_DEFAULT)
    {
      line_map_ordinary *last_ord
        = LINEMAPS_LAST_ORDINARY_MAP (pfile->line_table);
      source_location last_map_end = last_ord->start_location +
        ((1 << last_ord->m_column_and_range_bits) - 1);
      if (pfile->line_table->highest_location > last_map_end)
        pfile->line_table->highest_location--;
    }

This seems to work, and I did not see any concerning failures in the existing
test suite.  I have two concerns about this latter approach:

1) I'm not familiar or comfortable enough with the corner cases of
CPP_INCREMENT_LINE to know if I'm computing the end of the map correctly.
2) I'm using LINEMAPS_LAST_ORDINARY_MAP instead of loc (the source_location of
the include being processed).  It seems like I should be comparing
highest_location to some form of the current loc instead, as the point is to
make sure that highest_location refers to the current source line.
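
For reference, the arithmetic behind last_map_end in the second approach can be
checked in isolation.  The C sketch below only mirrors the expression
start_location + ((1 << bits) - 1).  The function and type names are
hypothetical, and the sample values in the note after it are made up for
illustration, not taken from a real line map.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t location_t;   /* stand-in for source_location */

/* Hypothetical helper mirroring the last_map_end expression: the last
   location reachable from START_LOCATION when the low
   COLUMN_AND_RANGE_BITS bits are used for column and range data.  */
static location_t
last_map_end (location_t start_location, unsigned int column_and_range_bits)
{
  assert (column_and_range_bits < 32);   /* keep the shift well defined */
  return start_location + (((location_t) 1 << column_and_range_bits) - 1);
}
```

With zero bits the result equals start_location, so the check reduces to
highest_location > start_location.  With 12 bits and a start of 4096, the
result is 8191.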
