https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83173
--- Comment #2 from Mike Gulick <mgulick at mathworks dot com> ---
I have made some progress in determining the cause of this bug.

This issue occurs when the current source_location is >
LINE_MAP_MAX_LOCATION_WITH_COLS and a #include is the last line in the file
(with a terminating newline).

The corruption occurs when _cpp_stack_include decrements
pfile->line_table->highest_location. It does this so that highest_location
refers to the *current* line in the file, not the next line. For the case
where a #include is *not* the last line in the file, this works correctly.
However, when the source location is > LINE_MAP_MAX_LOCATION_WITH_COLS and
the #include line being processed is the last line in the file, the
highest_location value already refers to the current line in the file, as
there is no next line. Thus this decrement sets highest_location to the
previous line in the file, which causes the corruption.

Consider an include file with two #includes:

    #include "foo.h"
    #include "bar.h"
    EOF

Consider when do_include_common() processes the final '#include "bar.h"'.
This initially calls parse_include(), which calls check_eol(), which
eventually calls _cpp_lex_direct() via the following call stack:

    0  _cpp_lex_direct
    1  _cpp_lex_token
    2  cpp_get_token_1
    3  cpp_get_token
    4  check_eol_1
    5  check_eol
    6  parse_include
    7  do_include_common
    8  do_include
    9  _cpp_handle_directive
    10 _cpp_lex_token
    11 cpp_get_token_1
    12 cpp_get_token_with_location
    13 scan_translation_unit
    14 preprocess_file

_cpp_lex_direct parses the current buffer one character at a time. In the
case of the line '#include "bar.h"', the buffer looks like:

    #include "bar.h"\n\n

Note that the second '\n' is added to the buffer when the file is initially
read in. It doesn't exist in the file. After parsing the '#include "bar.h"',
the buffer is sitting at the first '\n'.

    #include "bar.h"\n\n
                    ^ ^
    buffer.cur-----/  |
                      |
    buffer.rlimit----/

buffer.rlimit is a pointer to the end of the buffer.
It points to the final newline that was added to the end of the buffer when
the file was read. _cpp_lex_direct() reads the buffer one character at a
time, e.g.

    c = *buffer->cur++;
    ...
    switch (c)
      {
      ...
      case '\n':
        if (buffer->cur < buffer->rlimit)
          CPP_INCREMENT_LINE (pfile, 0);
        buffer->need_line = true;
        goto fresh_line;
      ...

Under normal circumstances (i.e. if the #include is *not* the last line in
the file), when the '\n' is detected, CPP_INCREMENT_LINE increments
line_maps->highest_line. However, for this last #include, buffer->cur ==
buffer->rlimit, so CPP_INCREMENT_LINE is not called. Thus if the #include
token has source_location 1610612807, highest_location in the line_maps
structure is also 1610612807. Remember that we are past
LINE_MAP_MAX_LOCATION_WITH_COLS, so column numbers are not tracked; each
increment of a source_location value therefore refers to a new line number
and potentially a new source file.

Continue stepping through do_include_common to _cpp_stack_include. This
function has the following comment:

    /* Compensate for the increment in linemap_add that occurs if
       _cpp_stack_file actually stacks the file.  In the case of a
       normal #include, we're currently at the start of the line
       *following* the #include.  A separate source_location for this
       location makes no sense (until we do the LC_LEAVE), and
       complicates LAST_SOURCE_LINE_LOCATION.  This does not apply if we
       found a PCH file (in which case linemap_add is not called) or we
       were included from the command-line.  */

Under normal circumstances, the comment's statement that "we're currently at
the start of the line *following* the #include" is correct. However, in this
case it is not true: because highest_line was never incremented,
highest_location still refers to the current line. Thus when we decrement
highest_location, it ends up referring to the *previous* line map location,
not the current one. _cpp_stack_file then ultimately calls linemap_add,
which sets start_location to highest_location + 1.
This is assumed to be a new, unused location, but in this case it actually
already refers to an existing line map. Note that the linemap_assert in
linemap_add will not catch this even if linemap assertions are enabled. This
is because it only asserts if the new start_location is less than the
source_location of the last line in the line map, whereas in this case it is
equal to the source_location of the last line.

We fix this by no longer decrementing pfile->line_table->highest_location if
it is less than or equal to the source_location of the current include
header. The purpose of this decrement is to ensure that
pfile->line_table->highest_location still refers to the current line, so if
it already refers to the current line, there is no need to decrement it (and
doing so would be wrong).

Simple approach:

This avoids decrementing highest_location when loc >
LINE_MAP_MAX_LOCATION_WITH_COLS:

    if (file->pchname == NULL && file->err_no == 0
        && type != IT_CMDLINE && type != IT_DEFAULT
        && pfile->line_table->highest_location > loc)
      pfile->line_table->highest_location--;

More complicated:

I tried to account for the case when loc <= LINE_MAP_MAX_LOCATION_WITH_COLS.
In this case, CPP_INCREMENT_LINE is still not called in _cpp_lex_direct when
the current #include is the last line in the file. So we compute the end of
the current location and check if highest_location is past that.

    if (file->pchname == NULL && file->err_no == 0
        && type != IT_CMDLINE && type != IT_DEFAULT)
      {
        line_map_ordinary *last_ord
          = LINEMAPS_LAST_ORDINARY_MAP (pfile->line_table);
        source_location last_map_end
          = last_ord->start_location
            + ((1 << last_ord->m_column_and_range_bits) - 1);
        if (pfile->line_table->highest_location > last_map_end)
          pfile->line_table->highest_location--;
      }

This seems to work, and I did not see any concerning failures in the
existing test suite.
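
For reference, the end-of-map arithmetic from the second variant can be
tried in isolation. The sketch below is a toy, not libcpp code: toy_map and
toy_map_end are hypothetical names, and only the expression
start_location + ((1 << bits) - 1) is taken from the patch above.

```c
#include <stdint.h>

/* Hypothetical stand-in for the two line_map_ordinary fields that the
   patch reads; this is not the libcpp structure.  */
struct toy_map
{
  uint32_t start_location;
  unsigned int column_and_range_bits;
};

/* Same expression as in the patch: the last value in the block of
   1 << bits locations that begins at start_location.  */
static uint32_t
toy_map_end (const struct toy_map *map)
{
  return map->start_location
         + ((UINT32_C (1) << map->column_and_range_bits) - 1);
}
```

If the bit count is 0, toy_map_end returns start_location itself, so the
test "highest_location > last_map_end" reduces to "highest_location >
start_location". With a bit count of 12 it returns start_location + 4095.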

I have two concerns about this latter approach:

1) I'm not familiar or comfortable enough with the corner cases of
   CPP_INCREMENT_LINE to know if I'm computing the end of the map correctly.

2) I'm using LINEMAPS_LAST_ORDINARY_MAP instead of loc (the source_location
   of the include being processed). It seems like I should be comparing
   highest_location to some form of the current loc instead, as the point
   is to make sure that highest_location refers to the current source line.
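
The off-by-one described in this comment can be reproduced in a small
standalone model. This is a sketch under simplifying assumptions, not libcpp
code: every name below (toy_table, toy_lex_newline, toy_stack_include,
toy_include_start) is hypothetical, each line is assumed to consume exactly
one location value (as when past LINE_MAP_MAX_LOCATION_WITH_COLS), and
linemap_add is reduced to "start the new map at highest_location + 1".

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for the line table: only highest_location is modeled.  */
struct toy_table
{
  uint32_t highest_location;
};

/* Models the '\n' case in the lexer: the line is only advanced when the
   newline is not the last character of the file.  */
static void
toy_lex_newline (struct toy_table *t, bool is_last_line)
{
  if (!is_last_line)
    t->highest_location++;
}

/* Models _cpp_stack_include followed by linemap_add.  LOC is the
   location of the #include token.  With GUARDED false the decrement is
   unconditional (the old behavior); with GUARDED true it only happens
   when highest_location is past LOC (the "simple approach").  Returns
   the start location handed to the included file.  */
static uint32_t
toy_stack_include (struct toy_table *t, uint32_t loc, bool guarded)
{
  if (!guarded || t->highest_location > loc)
    t->highest_location--;
  t->highest_location++;   /* linemap_add: start at highest + 1.  */
  return t->highest_location;
}

/* Runs one #include at location LOC and returns the start location of
   the included file.  */
static uint32_t
toy_include_start (uint32_t loc, bool is_last_line, bool guarded)
{
  struct toy_table t = { loc };
  toy_lex_newline (&t, is_last_line);
  return toy_stack_include (&t, loc, guarded);
}
```

With loc = 1610612807, an #include that is not the last line yields
loc + 1 with or without the guard. For an #include on the last line, the
unguarded version yields loc itself (the same value as the #include line,
which is the collision), while the guarded version yields loc + 1.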