David Malcolm <dmalc...@redhat.com> a écrit: > On Fri, 2015-09-25 at 11:13 +0200, Dodji Seketeli wrote: [...]
>> Funny; I first overlooked this comment of you, and then when reading the >> patch, I asked myself "why keep the existing location_t" ? I mean, in >> here: >> >> /* A preprocessing token. This has been carefully packed and should >> - occupy 16 bytes on 32-bit hosts and 24 bytes on 64-bit hosts. */ >> + occupy 16 bytes on 32-bit hosts and 24 bytes on 64-bit hosts. >> + FIXME: the above comment is no longer true with this patch. */ >> struct GTY(()) cpp_token { >> source_location src_loc; /* Location of first char of token. */ >> + source_range src_range; /* Source range covered by the token. >> */ >> ENUM_BITFIELD(cpp_ttype) type : CHAR_BIT; /* token type */ >> unsigned short flags; /* flags - see above */ >> >> You could just change the type of src_loc and make it be a source_range. > > Interesting idea. > > For the general case of expressions, I want a location to mean a > caret/point plus a range that contains it, but for tokens, the > caret/point is always at the start of the range. So maybe a src_range > would do here. > > That said, in patches 3 and 4 of this kit I'm experimenting with > representation; as I said in the blurb for patch 3: "See the > cover-letter for this patch kit which describes how we might go back to > using just a location_t, and stashing the range inside the location_t. > I'm doing it this way for now to allow for more flexibility as I > benchmark and explore implementation options." Right. > So patches 3 and 4 are rather more experimental than patches 1,2 and 5, > as I find out what the different representations do to the time&memory > consumption of the compiler. Understood. > I like the idea of "source_location" and "location_t" becoming a > representation of "(point/caret + range)" rather than just a > point/caret, since this means that we can pass location_t around as > before, but then we can extract ranges from them, and all of the > existing diagnostics ought to "automagically" gain underlines "for > free". Me too. > I'm experimenting with ways to try to do that efficiently, with > strategies for packing things into the 32-bits compactly; see e.g.: > https://gcc.gnu.org/ml/gcc-patches/2015-09/msg01826.html > > If so, then cpp_token's src_loc would remain a source_location; > source_location itself becomes richer. > >> Source range could take a converting constructor, that takes a >> source_location, so that the existing code that does "cpp_token.src_loc >> = a_source_location;" keeps working. >> >> But then, in the previous patch of the series, I see: >> >> +/* A range of source locations. >> + >> + Ranges are closed: >> + m_start is the first location within the range, >> + m_finish is the last location within the range. >> + >> + We may need a more compact way to store these, but for now, >> + let's do it the simple way, as a pair. */ >> +struct GTY(()) source_range >> +{ >> + source_location m_start; >> + source_location m_finish; >> + >> + void debug (const char *msg) const; >> + >> + /* We avoid using constructors, since various structs that >> + don't yet have constructors will embed instances of >> + source_range. */ >> + >> >> But what if we define a default constructor for that one (as well as one >> that takes a source_location)? Wouldn't that work for the embedding >> case that you talk about in that comment? > > Perhaps, but I worry that it would lead to a cascade, where suddenly > we'd need constructors for various other types. I can try it, I > guess. If it leads to a cascade, then don't bother. My point was precisely to try to avoid the churn, while limiting the amount of data size increase for cpp_token in general. As you implied, if we can just stay with a source_location that carries the information of a pointer plus a range, even better. > [BTW, I'm about to disappear on a vacation from tomorrow until October > 6th, and away from computers] Thanks for the heads-up. -- Dodji