https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56549
Julien Ruffin changed:
What|Removed |Added
CC||julien.ruffin at ivu dot de
--- Comment #6 from Julien Ruffin ---
I have been having the same issue with GCC 9.2.0 for a while and ended up
finding the cause of this error. It can be traced back to function
_cpp_save_file_entries in gcc/libcpp/files.c.
Short explanation: the function saves the sizes and MD5 checksums of files
without any encoding conversion or BOM removal into the PCH's file list, even
though it should.
Long explanation: the function fills the PCH's files list which contains, among
other things, the sizes and MD5 checksums of all files in the PCH. Later, when
using the PCH, the compiler compares the files it loads with the files in that
list. If it finds an entry with the same size and checksum as the loaded file,
it is in the PCH and the compiler skips processing it: see
check_file_against_entries for the implementation, also in files.c.
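To illustrate the lookup described above, here is a minimal sketch of matching a loaded file against a list of (size, checksum) entries. It is not the code of check_file_against_entries: the names entry_sketch, sum_sketch and in_list_sketch are hypothetical, and FNV-1a stands in for MD5 only to keep the example self-contained.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one entry of the PCH file list.  The real
   libcpp entry stores an MD5 sum, not a 64-bit hash.  */
struct entry_sketch
{
  size_t size;
  uint64_t sum;
};

/* FNV-1a over LEN bytes of BUF, used here only as a stand-in for MD5.  */
uint64_t
sum_sketch (const unsigned char *buf, size_t len)
{
  uint64_t h = 14695981039346656037ULL;
  for (size_t i = 0; i < len; i++)
    {
      h ^= buf[i];
      h *= 1099511628211ULL;
    }
  return h;
}

/* Return true if some entry has both the same size and the same
   checksum as the LEN bytes at BUF.  */
bool
in_list_sketch (const struct entry_sketch *entries, size_t count,
                const unsigned char *buf, size_t len)
{
  assert (entries != NULL || count == 0);
  uint64_t sum = sum_sketch (buf, len);
  for (size_t i = 0; i < count; i++)
    if (entries[i].size == len && entries[i].sum == sum)
      return true;
  return false;
}
```

With this scheme, a difference of even one byte between what was recorded and what is loaded later is enough for the lookup to fail.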
The issue here is that the matching never succeeds for headers that contain a
BOM. The PCH entry is always 3 bytes longer than the file loaded by the
compiler and the checksums always differ. The following code in
_cpp_save_file_entries is why:
      if (f->buffer_valid)
        md5_buffer ((const char *)f->buffer,
                    f->st.st_size, result->entries[count].sum);
      else
        {
          FILE *ff;
          int oldfd = f->fd;
          if (!open_file (f))
            {
              open_file_failed (pfile, f, 0, 0);
              free (result);
              return false;
            }
          ff = fdopen (f->fd, "rb");
          md5_stream (ff, result->entries[count].sum);
          fclose (ff);
          f->fd = oldfd;
        }
      result->entries[count].size = f->st.st_size;
libcpp caches the contents of the files it reads into their own buffers, here
f->buffer. The read_file function implements this loading and converts the
file's encoding on the fly with _cpp_convert_input. *This conversion strips the
BOM,* so the contents of f->buffer differ from those of the file whenever a BOM
is used.
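As a rough illustration of where the 3-byte difference comes from, the sketch below detects a UTF-8 BOM (the bytes EF BB BF) at the start of a buffer. It is a simplification: _cpp_convert_input does considerably more than this, and utf8_bom_length_sketch is a hypothetical name, not a libcpp function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return the number of leading bytes occupied by a UTF-8 BOM
   (EF BB BF): 3 if one is present, 0 otherwise.  */
size_t
utf8_bom_length_sketch (const unsigned char *buf, size_t len)
{
  static const unsigned char bom[3] = { 0xEF, 0xBB, 0xBF };
  assert (buf != NULL || len == 0);
  if (len >= 3 && memcmp (buf, bom, 3) == 0)
    return 3;
  return 0;
}
```

A buffer that has had these bytes dropped is 3 bytes shorter than the file on disk, which matches the size difference described above.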
If f->buffer_valid is not true, which seems to always be the case in the code
above as far as I could test it, the function reopens the file by hand and
computes the MD5 checksum directly from it, without any conversion. open_file()
also overwrites the data size in f->st.st_size with the size of the unconverted
file. That is why the checksum and size of the unconverted file end up in the
PCH's file list.
The compiler later compares those with the files it loads through read_file.
There is never a match because the checksums and sizes differ, so the compiler
thinks it has loaded a different file. It then processes the header with the
BOM a second time and the error we have been observing happens.
I have managed to solve this issue by replacing the manual loading of the
unconverted file in the else block above with a loading through read_file,
yielding the converted buffer and the correct size and, in the end, the correct
checksum. I do not have a patch to offer yet for various reasons, but my
amateurish attempt at a fix allowed me to build a large C++ code base
successfully with precompiled headers, so it is rather encouraging.
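The idea behind the fix can be sketched outside of GCC as follows. This is not the change I made to files.c, only a self-contained illustration under assumptions: all names ending in _sketch are hypothetical, FNV-1a stands in for MD5, and "conversion" is reduced to dropping a UTF-8 BOM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical (size, checksum) pair; the real entry holds an MD5 sum.  */
struct sig_sketch
{
  size_t size;
  uint64_t sum;
};

/* FNV-1a over LEN bytes of BUF, a stand-in for MD5.  */
uint64_t
fnv1a_sketch (const unsigned char *buf, size_t len)
{
  uint64_t h = 14695981039346656037ULL;
  for (size_t i = 0; i < len; i++)
    {
      h ^= buf[i];
      h *= 1099511628211ULL;
    }
  return h;
}

/* Signature of the bytes exactly as given, with no conversion.  */
struct sig_sketch
raw_sig_sketch (const unsigned char *buf, size_t len)
{
  struct sig_sketch s = { len, fnv1a_sketch (buf, len) };
  return s;
}

/* Signature of the bytes after dropping a leading UTF-8 BOM, if any.  */
struct sig_sketch
converted_sig_sketch (const unsigned char *buf, size_t len)
{
  static const unsigned char bom[3] = { 0xEF, 0xBB, 0xBF };
  assert (buf != NULL || len == 0);
  if (len >= 3 && memcmp (buf, bom, 3) == 0)
    {
      buf += 3;
      len -= 3;
    }
  return raw_sig_sketch (buf, len);
}

/* Two signatures match only if both size and checksum agree.  */
bool
sig_equal_sketch (struct sig_sketch a, struct sig_sketch b)
{
  return a.size == b.size && a.sum == b.sum;
}
```

Recording raw_sig_sketch of the on-disk bytes reproduces the mismatch, since the loaded buffer no longer starts with the BOM. Recording converted_sig_sketch instead yields a signature that matches the loaded buffer.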
Somebody with more experience in the preprocessor might want to take a look at
this.