Re: Better GCC diagnostics
My point is that if the rest of the compiler extends diagnostics.c to handle caret and ranges and all other features, Ada would need carry all that around even if gnat has its own diagnostics machinery. But perhaps that is OK. Certainly OK, what possible (noticeable) issue would you foresee ? I am not trying to force anything just make you aware of the situation. Well, it's not like there's a choice anyway. As Robert and myself explained, GNAT's machinery is much more powerful in terms of diagnostic compared to libcpp (or even compared to what's planned), and this machinery is used in other non GCC contexts anyway, so has its own life cycle and raison d'être. I see that as very unrealistic, GCC development is already complex enough, so we will have to live with the duplication. That was of course a semi joke :-) Only semi because it would make a lot of sense to write a code generator from scratch in Ada, but of course it does not make much sense to rewrite GCC in Ada. I just wanted to make you notice that even if location_t is moved out of libcpp, GNAT will still link with diagnostic.c and whatever other diagnostics features that will be implemented. The burden of the duplication would be on GNAT executable, the other front-ends do not carry GNAT's code. Probably the effect of it is negligible, anyway. Right, effect is clearly negligible, if noticeable at all. Again, my point is that even if GNAT has all these features already, GNAT should be interested in this since it means moving stuff out of libcpp which will allow to break that dependency. C/C++ could very well implement caret diagnostics and everything else within libcpp, so the dependency of GNAT with libcpp would be actually harder to break. My fear is that if the other FEs do not care about this issue because we have it already, then this is exactly what will happen. Well, if other GCC developers do not see the obvious advantages in splitting libcpp in several logical pieces, then so be it, what can I say. Arno
Re: Better GCC diagnostics
Well, clearly, the preprocessor and handling of #include is very C/C++ specific, and not used by Ada or Fortran. Both Ada and Fortran are linked with libcpp.a. Sure, because as we've said, libcpp is not modular enough. Clearly Ada does not need to link with A C preprocessor at all. Even if you do not use line-map.o, the middle-end does, so you still have the duplication. Right, this is the only part that is indeed shared and for which Ada requires libcpp. It has also handled column info from day one (and we would not want any of the e.g. -fshow-column stuff: why would you *not* want to display column information ? :-) Yes, why? So, I assume there are plans to get rid of -fshow-column switch (and make it the only default). The point is that other front-ends would be able to use it. And non-Ada developers would be able to fix it and extend it. Moreover, I certainly have no problem giving access to other front-ends to GNAT's capability here, as long as everyone is cool with using Ada in other front-ends, I'm certainly all for it. even if GNAT won't use it, moving all this out of libcpp and making it modular would allow to link only to the things that you really need. Definitely. Finally, the middle-end is using it anyway, so gnat is using it. Sure. Using line-map is one thing, using a C preprocessor is another. Arno
Re: Better GCC diagnostics
Even if you do not use line-map.o, the middle-end does, so you still have the duplication. Right, this is the only part that is indeed shared and for which Ada requires libcpp. There are exactly 6 uses of linemap functions in libcpp. They could be replaced with a handful of function pointers, moving line-map.c to gcc. There are 2 more uses in makedepend.c, which could define its own dummy functions because it does not care about locations at all. Paolo
Re: Better GCC diagnostics
2008/8/25 Arnaud Charlet [EMAIL PROTECTED]: Even if you do not use line-map.o, the middle-end does, so you still have the duplication. Right, this is the only part that is indeed shared and for which Ada requires libcpp. My point is that if the rest of the compiler extends diagnostics.c to handle caret and ranges and all other features, Ada would need carry all that around even if gnat has its own diagnostics machinery. But perhaps that is OK. I am not trying to force anything just make you aware of the situation. It has also handled column info from day one (and we would not want any of the e.g. -fshow-column stuff: why would you *not* want to display column information ? :-) Yes, why? So, I assume there are plans to get rid of -fshow-column switch (and make it the only default). No, I cannot decide such thing. I would be happy to write such patch, though. The point is that other front-ends would be able to use it. And non-Ada developers would be able to fix it and extend it. Moreover, I certainly have no problem giving access to other front-ends to GNAT's capability here, as long as everyone is cool with using Ada in other front-ends, I'm certainly all for it. I see that as very unrealistic, GCC development is already complex enough, so we will have to live with the duplication. I just wanted to make you notice that even if location_t is moved out of libcpp, GNAT will still link with diagnostic.c and whatever other diagnostics features that will be implemented. The burden of the duplication would be on GNAT executable, the other front-ends do not carry GNAT's code. Probably the effect of it is negligible, anyway. Finally, the middle-end is using it anyway, so gnat is using it. Sure. Using line-map is one thing, using a C preprocessor is another. Again, my point is that even if GNAT has all these features already, GNAT should be interested in this since it means moving stuff out of libcpp which will allow to break that dependency. C/C++ could very well implement caret diagnostics and everything else within libcpp, so the dependency of GNAT with libcpp would be actually harder to break. My fear is that if the other FEs do not care about this issue because we have it already, then this is exactly what will happen. Cheers, Manuel.
Re: Better GCC diagnostics
2008/8/25 Paolo Bonzini [EMAIL PROTECTED]: Even if you do not use line-map.o, the middle-end does, so you still have the duplication. Right, this is the only part that is indeed shared and for which Ada requires libcpp. There are exactly 6 uses of linemap functions in libcpp. They could be replaced with a handful of function pointers, moving line-map.c to gcc. You have to count all uses of: SOURCE_LINE, SOURCE_COLUMN, MAIN_FILE_P, LAST_SOURCE_LINE, LAST_SOURCE_LINE_LOCATION, and LINEMAP_POSITION_FOR_COLUMN which are all macros. Also, line-map functions seem to be in the critical path, so the extra dereferencing may be noticeable. There are 2 more uses in makedepend.c, which could define its own dummy functions because it does not care about locations at all. Sure about that? line-map.h seems pretty important to handle the include stack. Cheers, Manuel.
Re: Better GCC diagnostics
Manuel López-Ibáñez wrote: 2008/8/25 Paolo Bonzini [EMAIL PROTECTED]: Even if you do not use line-map.o, the middle-end does, so you still have the duplication. Right, this is the only part that is indeed shared and for which Ada requires libcpp. There are exactly 6 uses of linemap functions in libcpp. They could be replaced with a handful of function pointers, moving line-map.c to gcc. You have to count all uses of: SOURCE_LINE, SOURCE_COLUMN, MAIN_FILE_P, LAST_SOURCE_LINE, LAST_SOURCE_LINE_LOCATION, and LINEMAP_POSITION_FOR_COLUMN which are all macros. Yes, but libcpp could still provide the *interface* of line-map.h (the macros), and leave the actual implementation of the data structure to the client. Also, line-map functions seem to be in the critical path, so the extra dereferencing may be noticeable. That's true. Unless we lexed everything up front, in which case one could hope that the indirect branches would be predicted well. In fact, I'm curious about how good/bad the branch predictor works for GCC. My wild guess: awfully. Sure about that? line-map.h seems pretty important to handle the include stack. As far as I could see, all the dependency stuff is in mkdeps.c; maybe it could use the line-map infrastructure more. Of course, in that case moving line-map.c out of libcpp would mean moving mkdeps.c too -- and not in gcc/, but rather in its own library so that makedepend can use it... at which point you wonder if it was really a good idea... Paolo
Re: Better GCC diagnostics
2008/8/25 Paolo Bonzini [EMAIL PROTECTED]: As far as I could see, all the dependency stuff is in mkdeps.c; maybe it could use the line-map infrastructure more. Of course, in that case moving line-map.c out of libcpp would mean moving mkdeps.c too -- and not in gcc/, but rather in its own library so that makedepend can use it... at which point you wonder if it was really a good idea... Well, my opinion is that line-map.c should be in its own library/object file independent of libcpp but libcpp (and makedepend and cc1 and gnat) should still be able to use it, preferably directly and not through pointers. Not sure how to do this but I think it is what we do with libiberty, don't we? Perhaps there is an even easier way. The same should happen with diagnostic.[ch]. Cheers, Manuel.
Re: Better GCC diagnostics
Manuel López-Ibáñez wrote: Again, my point is that even if GNAT has all these features already, GNAT should be interested in this since it means moving stuff out of libcpp which will allow to break that dependency. C/C++ could very well implement caret diagnostics and everything else within libcpp, so the dependency of GNAT with libcpp would be actually harder to break. My fear is that if the other FEs do not care about this issue because we have it already, then this is exactly what will happen. Fortran is probably an example of don't care: Fortran already uses caret diagnostics, but it depends also on libcpp as several Fortran programs require CPP preprocessing. Tobias
Re: Better GCC diagnostics
2008/8/20 Arnaud Charlet [EMAIL PROTECTED]: Not just that, probably Fortran/Ada are already duplicating stuff that is in libcpp or they are implementing their own version of stuff that C/C++ are lacking (caret diagnostics? character encodings?). Well, clearly, the preprocessor and handling of #include is very C/C++ specific, and not used by Ada or Fortran. Both Ada and Fortran are linked with libcpp.a. As for diagnostics, Ada does use its own mechanism, mainly because it's much more powerful than the one in libcpp, and handles e.g. multiple locations related to cascaded generic package instantiations (C++ templates), as well as many other goodies already described by Robert is previous messages. Even if you do not use line-map.o, the middle-end does, so you still have the duplication. It has also handled column info from day one (and we would not want any of the e.g. -fshow-column stuff: why would you *not* want to display column information ? :-) Yes, why? So from GNAT's point of view, what's the point in duplicating all this work done and written in Ada in another less capable language (which would also make it more painful for GNAT developers to use), in particular if this capability is only available in very recent GCC versions. The point is that other front-ends would be able to use it. And non-Ada developers would be able to fix it and extend it. Moreover, even if GNAT won't use it, moving all this out of libcpp and making it modular would allow to link only to the things that you really need. Finally, the middle-end is using it anyway, so gnat is using it. Cheers, Manuel.
Re: Better GCC diagnostics
That is clear. Thanks. I personally would be perfectly happy if the compiler said bug.c:4.COLUMN: error: called object is not a function That is, fixing the compiler to includes parts of the source code in the error message itself is, for me, of considerably lower priority than fixing the compiler to generate good column numbers. I agree. Any decent IDE or source editor will point to the proper location once you have the correct column number. Arno
Re: Better GCC diagnostics
Oh yes. Well, there is a lot of fine-tunning to do but I think that would be covered by A.1 and the binary_op expression would have at least two locations begin/end pointing to X and r. If we are able to print ({break;}), in the example I gave earlier, then we will be able to print nice ranges most of the time. Interestingly, I am working right now on improving location information in the C (and possibly in the future C++) front-end, for the need of static analysis tools, and generation of cross-ref information (for use by IDEs), and have added locally support for additional locations for many expressions. The way I did it, since I thought that adding a new location_t field in the expr struct would likely be a no-no (due to additional memory usage) is to use a hash table on the side, and use macros (SET_EXPR_LOCATION2, GET_EXPR_LOCATION2) that are no-ops by default, and that populate the hash table when a given switch is enabled. I was planning to submit these changes at some later point, since this is still work in progress at this stage, but can work on submitting these earlier if this would help people in other contexts, and/or avoid duplicate work. Note that my patch not only adds support for additional locations (and it even adds as much locations as needed for each argument of a function call), but also fixes column number information in several cases. Arno
Re: Better GCC diagnostics
2008/8/20 Arnaud Charlet [EMAIL PROTECTED]: The way I did it, since I thought that adding a new location_t field in the expr struct would likely be a no-no (due to additional memory usage) is to use a hash table on the side, and use macros (SET_EXPR_LOCATION2, GET_EXPR_LOCATION2) that are no-ops by default, and that populate the hash table when a given switch is enabled. I was planning to submit these changes at some later point, since this is still work in progress at this stage, but can work on submitting these earlier if this would help people in other contexts, and/or avoid duplicate work. As far as I know nobody is working on adding more locations, so do as you please. If I ever get the time, I would like to abstract our line-map implementation within a location_manager object and API but I don't think this conflicts directly with your work. Would your implementation also handle two locations for tokens that come from macro expansion? Note that my patch not only adds support for additional locations (and it even adds as much locations as needed for each argument of a function call), but also fixes column number information in several cases. Those fixes should be submitted as soon as you find them, please. Cheers, Manuel.
Re: Better GCC diagnostics
Would your implementation also handle two locations for tokens that come from macro expansion? macro expansion are tricky to handle as far as I could see, so currently this is not handled. It's tricky because some locations are real in the source, and some are virtual from the macro. Consider for example: #define PLUS(a) ((a) + global_var) when in the source you have: source: PLUS (x) column: 1 5 7 you would like to have a location associated with 'x' ('a' in the macro), pointing to column 7, and not to e.g. column 1 as done currently. On the other hands, the implicit reference to 'global_var' is what I call virtual, and I guess the closest approximation in terms of location in the original source would be 'column 7' as well, assuming we want to represent it. Those fixes should be submitted as soon as you find them, please. Sure, some clean ups are still needed, but submitted them as soon as practical is part of the plans (my work is based on gcc trunk, so at least there's no issue of forward porting my changes). Thanks for your feedback, Arno
Re: Better GCC diagnostics
Manuel == Manuel López-Ibáñez [EMAIL PROTECTED] writes: Manuel If I ever get the time, I would like to abstract our line-map Manuel implementation within a location_manager object and API but Manuel I don't think this conflicts directly with your work. I am curious to know how this would be different from what we have now. I think the line-map API is rather ugly, personally, but it seems to me that a struct line_maps is essentially a location manager object. Tom
Re: Better GCC diagnostics
2008/8/20 Tom Tromey [EMAIL PROTECTED]: Manuel == Manuel López-Ibáñez [EMAIL PROTECTED] writes: Manuel If I ever get the time, I would like to abstract our line-map Manuel implementation within a location_manager object and API but Manuel I don't think this conflicts directly with your work. I am curious to know how this would be different from what we have now. I think the line-map API is rather ugly, personally, but it seems to me that a struct line_maps is essentially a location manager object. It is essentially. However, all the internal things are exposed all over the place for no good reason. This is in gcc/c-lex.c /* #define callback for DWARF and DWARF2 debug info. */ static void cb_define (cpp_reader *pfile, source_location loc, cpp_hashnode *node) { - const struct line_map *map = linemap_lookup (line_table, loc); - (*debug_hooks-define) (SOURCE_LINE (map, loc), + (*debug_hooks-define) (location_source_line (location_manager, loc), (const char *) cpp_macro_definition (pfile, node)); } This is in tree.h -expanded_location -expand_location (source_location loc) -{ - expanded_location xloc; - if (loc == 0) -{ - xloc.file = NULL; - xloc.line = 0; - xloc.column = 0; - xloc.sysp = 0; -} - else -{ - const struct line_map *map = linemap_lookup (line_table, loc); - xloc.file = map-to_file; - xloc.line = SOURCE_LINE (map, loc); - xloc.column = SOURCE_COLUMN (map, loc); - xloc.sysp = map-sysp != 0; -}; - return xloc; -} +#define expand_location(LOC) location_expand (location_manager, (LOC)) + Moreover, the location_manager probably should be separated from CCP. If we want to implement re-opening files and reading strings given locations, then opening/reading files should also be moved out of CCP to its own module/namespace/object. Cheers, Manuel.
Re: Better GCC diagnostics
If we want to implement re-opening files and reading strings given locations, then opening/reading files should also be moved out of CCP to its own module/namespace/object. Agreed. Other modules may find these APIs very handy. Currently many features are only available very deep or hidden inside libcpp implementation that could be useful in other contexts (such as e.g. getting source file dependency information). Arno
Re: Better GCC diagnostics
2008/8/20 Arnaud Charlet [EMAIL PROTECTED]: If we want to implement re-opening files and reading strings given locations, then opening/reading files should also be moved out of CCP to its own module/namespace/object. Agreed. Other modules may find these APIs very handy. Currently many features are only available very deep or hidden inside libcpp implementation that could be useful in other contexts (such as e.g. getting source file dependency information). Not just that, probably Fortran/Ada are already duplicating stuff that is in libcpp or they are implementing their own version of stuff that C/C++ are lacking (caret diagnostics? character encodings?). Cheers, Manuel.
Re: Better GCC diagnostics
Not just that, probably Fortran/Ada are already duplicating stuff that is in libcpp or they are implementing their own version of stuff that C/C++ are lacking (caret diagnostics? character encodings?). Well, clearly, the preprocessor and handling of #include is very C/C++ specific, and not used by Ada or Fortran. As for diagnostics, Ada does use its own mechanism, mainly because it's much more powerful than the one in libcpp, and handles e.g. multiple locations related to cascaded generic package instantiations (C++ templates), as well as many other goodies already described by Robert is previous messages. It has also handled column info from day one (and we would not want any of the e.g. -fshow-column stuff: why would you *not* want to display column information ? :-) So from GNAT's point of view, what's the point in duplicating all this work done and written in Ada in another less capable language (which would also make it more painful for GNAT developers to use), in particular if this capability is only available in very recent GCC versions. Arno
Re: Better GCC diagnostics
Manuel López-Ibáñez wrote: 2008/8/20 Arnaud Charlet [EMAIL PROTECTED]: If we want to implement re-opening files and reading strings given locations, then opening/reading files should also be moved out of CCP to its own module/namespace/object. Agreed. Other modules may find these APIs very handy. Currently many features are only available very deep or hidden inside libcpp implementation that could be useful in other contexts (such as e.g. getting source file dependency information). Not just that, probably Fortran/Ada are already duplicating stuff that is in libcpp or they are implementing their own version of stuff that C/C++ are lacking (caret diagnostics? character encodings?). The error mechanism in Ada is very extensive, probably would be quite a bit of work to find out how to interface effectively to this API, but something that would be nice to do in the long run. Cheers, Manuel.
Re: Better GCC diagnostics
Manuel López-Ibáñez [EMAIL PROTECTED] writes: Let's try to focus on what needs to be done looking for specific features (or fixes) and how we could do it: Thanks for writing this up. A) Printing the input expression instead of re-constructing it. As Joseph explained, this will fix the problems that Aldy mentioned (PR3544[123] and PR35742) and this requires: 1) For non-preprocessed expr we need at least two locations per expr (beg/end). This will require changes on the build_* functions to handle multiple locations. This is probably obvious, but can you outline why we need two locations for each expression? The tools with which I am familiar only print a single caret. What would use the two locations for? b) Re-open the file and fseek. This is not trivial since we need to do it fast but still do all character conversions that we did when libcpp opened it the first time. This is approximately what Clang (LLVM) does and it seems they can do it very fast by keeping a cache of buffers ever reopened. I think that thanks to our line-maps implementation, we can do the seeking quite more efficiently in terms of computation time. However, opening files is quite embedded into CPP, so that would need to be factored out so we can avoid any unnecessary CPP stuff when reopening but still do it *properly* and *efficiently*. If we are going to reopen the file, then why do we need to record the locations in the preprocessed token stream? If we keep, for each source line, the file offset in the file of the start of that source line, then I think that printing the line from the source file would be pretty fast. That would not be free but it would be much cheaper than keeping the entire input file. Various optimizations are possible--e.g., keep the file offset for every 16th line. Conversely, perhaps we could record the line number and the charset conversion state at each 4096th byte in the file; that would let us quickly find the page in the file which contains the line. Ian
Re: Better GCC diagnostics
2008/8/15 Ian Lance Taylor [EMAIL PROTECTED]: Manuel López-Ibáñez [EMAIL PROTECTED] writes: A) Printing the input expression instead of re-constructing it. As Joseph explained, this will fix the problems that Aldy mentioned (PR3544[123] and PR35742) and this requires: 1) For non-preprocessed expr we need at least two locations per expr (beg/end). This will require changes on the build_* functions to handle multiple locations. This is probably obvious, but can you outline why we need two locations for each expression? The tools with which I am familiar only print a single caret. What would use the two locations for? This has nothing to do with caret diagnostics. This is an orthogonal issue that would share some infrastructure as Joseph explained. If you do warning(called object %qE is not a function, expr); for ({break;})(); we currently try to re-construct expr and that fails in some cases (see the PRs referenced). #'goto_expr' not supported by pp_c_expression#'bug.c: In function 'foo': bug.c:4: error: called object is not a function The alternative is to print whatever we parsed when building expr. To do that we would need to have begin/end locations for expr, and then do a location_t-const char * translation and print whatever is between those two pointers: bug.c:4: error: called object '({break;})' is not a function Is it clear now? If so, I will update the wiki to put this example. b) Re-open the file and fseek. This is not trivial since we need to do it fast but still do all character conversions that we did when libcpp opened it the first time. This is approximately what Clang (LLVM) does and it seems they can do it very fast by keeping a cache of buffers ever reopened. I think that thanks to our line-maps implementation, we can do the seeking quite more efficiently in terms of computation time. However, opening files is quite embedded into CPP, so that would need to be factored out so we can avoid any unnecessary CPP stuff when reopening but still do it *properly* and *efficiently*. If we are going to reopen the file, then why do we need to record the locations in the preprocessed token stream? Because for some diagnostics we want to give the warnings in the instantiation point not in the macro definition point. Moreover, this is what we currently do, so if we don't want to change the current behaviour, we need to track both locations. Example /*header.h*/ #pragma GCC system_header #define BIG 0x1b27da572ef3cd86ULL /* file.c */ #include pr7263.h __extension__ unsigned long long bar () { return BIG; } We print a diagnostic at file.c for the expansion of BIG. However, since we do not have the original location we cannot check that the token comes from a system header, and we do not suppress the warning. There are more subtle bugs that arise from not having the original location available. See PR36478. BTW, Clang takes into account both locations when printing diagnostics. If we keep, for each source line, the file offset in the file of the start of that source line, then I think that printing the line from the source file would be pretty fast. That would not be free but it would be much cheaper than keeping the entire input file. Various Cheaper in terms of memory. It cannot be cheaper in terms of compilation time than a direct pointer to the already opened buffer for each line-map. optimizations are possible--e.g., keep the file offset for every 16th line. Conversely, perhaps we could record the line number and the charset conversion state at each 4096th byte in the file; that would let us quickly find the page in the file which contains the line. I am not sure how you plan such approach to interact with mapped-locations. I think that having an offset for each line-map and then seeking until you find the correct position would be fine for an initial implementation. More complex setups could be tested against this basic implementation. And any optimization done here could be done with the buffer already opened, so yes, cheaper in terms of memory maybe but not cheaper in terms of compilation time. If this is abstracted enough both approaches could perhaps coexist and share the optimizations: while the front-end is working (where most of the diagnostics come from) keep the buffers around, when going into middle-end free them and if we need to give a diagnostic from the middle-end, reopen and seek. But all this relies on someone factoring file opening and charset conversion out of CCP first. Once that is done, we could do whatever strategy or both or something else. Cheers, Manuel.
Re: Better GCC diagnostics
D) Printing Ranges. This requires: *) Printing accurate column information. Similar to (B) above. *) A location(s) - source strings interface and machinery. Similar to (A.3) above. Ranges also require some way to get the end of a token (in addition to its beginning). For example, a range for: X + some_long\ _ident??/ ifier The range should start at X and end at r. This is just a location like any other, but requires passing down like the begin loc. You might instead decide to do some fuzzy matching or something, but clang at least gets this right. This is important for other clients of the loc info, e.g. refactoring clients. -Chris
Re: Better GCC diagnostics
2008/8/15 Chris Lattner [EMAIL PROTECTED]: D) Printing Ranges. This requires: *) Printing accurate column information. Similar to (B) above. *) A location(s) - source strings interface and machinery. Similar to (A.3) above. Ranges also require some way to get the end of a token (in addition to its beginning). For example, a range for: X + some_long\ _ident??/ ifier The range should start at X and end at r. This is just a location like any other, but requires passing down like the begin loc. You might instead decide to do some fuzzy matching or something, but clang at least gets this right. This is important for other clients of the loc info, e.g. refactoring clients. Oh yes. Well, there is a lot of fine-tunning to do but I think that would be covered by A.1 and the binary_op expression would have at least two locations begin/end pointing to X and r. If we are able to print ({break;}), in the example I gave earlier, then we will be able to print nice ranges most of the time. Cheers, Manuel.
Re: Better GCC diagnostics
On Fri, 15 Aug 2008, Manuel López-Ibáñez wrote: 5) Handle locations during folding or avoid aggressive folding in the front-ends. I plan to delay folding for C (beyond some folding for expressions of constants) for 4.5, probably in October. (I do not plan to do anything for C++, and the folding will still happen before gimplification, though I believe that in principle most of what fold-const does would be better done on language-independent IR after gimplification.) Note that for correct ranges you need to handle locations for 0 and (0) and ((0)) and so on, so either need duplicate trees for constants and parenthesised expressions or need to keep the locations outside the trees. It's possible you can keep them in the c_expr structures or other local variables, depending on whether you ever need locations for both an expression and deeply nested subexpressions of it. -- Joseph S. Myers [EMAIL PROTECTED]
Re: Better GCC diagnostics
Manuel López-Ibáñez [EMAIL PROTECTED] writes: 2008/8/15 Ian Lance Taylor [EMAIL PROTECTED]: Manuel López-Ibáñez [EMAIL PROTECTED] writes: A) Printing the input expression instead of re-constructing it. As Joseph explained, this will fix the problems that Aldy mentioned (PR3544[123] and PR35742) and this requires: 1) For non-preprocessed expr we need at least two locations per expr (beg/end). This will require changes on the build_* functions to handle multiple locations. This is probably obvious, but can you outline why we need two locations for each expression? The tools with which I am familiar only print a single caret. What would use the two locations for? This has nothing to do with caret diagnostics. This is an orthogonal issue that would share some infrastructure as Joseph explained. If you do warning(called object %qE is not a function, expr); for ({break;})(); we currently try to re-construct expr and that fails in some cases (see the PRs referenced). #'goto_expr' not supported by pp_c_expression#'bug.c: In function 'foo': bug.c:4: error: called object is not a function The alternative is to print whatever we parsed when building expr. To do that we would need to have begin/end locations for expr, and then do a location_t-const char * translation and print whatever is between those two pointers: bug.c:4: error: called object '({break;})' is not a function Is it clear now? If so, I will update the wiki to put this example. That is clear. Thanks. I personally would be perfectly happy if the compiler said bug.c:4.COLUMN: error: called object is not a function That is, fixing the compiler to includes parts of the source code in the error message itself is, for me, of considerably lower priority than fixing the compiler to generate good column numbers. b) Re-open the file and fseek. This is not trivial since we need to do it fast but still do all character conversions that we did when libcpp opened it the first time. This is approximately what Clang (LLVM) does and it seems they can do it very fast by keeping a cache of buffers ever reopened. I think that thanks to our line-maps implementation, we can do the seeking quite more efficiently in terms of computation time. However, opening files is quite embedded into CPP, so that would need to be factored out so we can avoid any unnecessary CPP stuff when reopening but still do it *properly* and *efficiently*. If we are going to reopen the file, then why do we need to record the locations in the preprocessed token stream? Because for some diagnostics we want to give the warnings in the instantiation point not in the macro definition point. Moreover, this is what we currently do, so if we don't want to change the current behaviour, we need to track both locations. Example /*header.h*/ #pragma GCC system_header #define BIG 0x1b27da572ef3cd86ULL /* file.c */ #include pr7263.h __extension__ unsigned long long bar () { return BIG; } We print a diagnostic at file.c for the expansion of BIG. However, since we do not have the original location we cannot check that the token comes from a system header, and we do not suppress the warning. There are more subtle bugs that arise from not having the original location available. See PR36478. BTW, Clang takes into account both locations when printing diagnostics. Perhaps I misunderstand what you mean by recording the location in the preprocessed token stream. You evidently do not mean getting column numbers for the preprocessed code. You mean that when a preprocessor macro is expanded, we should record both the location where the macro is used, and also some sort of reference to the macro so that we know the location where the macro was defined. Is that right? If we keep, for each source line, the file offset in the file of the start of that source line, then I think that printing the line from the source file would be pretty fast. That would not be free but it would be much cheaper than keeping the entire input file. Various Cheaper in terms of memory. It cannot be cheaper in terms of compilation time than a direct pointer to the already opened buffer for each line-map. Except that we know that increased memory use leads to increased compile time. Diagnostic printing can't be slow, but it's also not on the critical path. Most code does not generate diagnostics. So there is a balance to be struck. Ian
Re: Better GCC diagnostics
2008/8/15 Ian Lance Taylor [EMAIL PROTECTED]: BTW, Clang takes into account both locations when printing diagnostics. Perhaps I misunderstand what you mean by recording the location in the preprocessed token stream. You evidently do not mean getting column numbers for the preprocessed code. You mean that when a preprocessor macro is expanded, we should record both the location where the macro is used, and also some sort of reference to the macro so that we know the location where the macro was defined. Is that right? I don't see the difference. We do currently assign to the preprocessed code the location of the macro call. So we get column numbers to the preprocessed code. In addition, we should record the location were the macro was defined. In the previous example, we should be able to obtain somehow two locations for 0x1b27da572ef3cd86ULL, one for its position in header.h and another for its final position in file.c. Whether we record two locations explicitly or we keep some lookup table of macro expansions/definitions would depend on what is found to be the most efficient implementation. I don't know which approach Clang follows but in our case it would be better to use the mapped-location infrastructure to track this for us in a single source_location number. Cheers, Manuel.
Re: Better GCC diagnostics
On Fri, 15 Aug 2008, Ian Lance Taylor wrote: Perhaps I misunderstand what you mean by recording the location in the preprocessed token stream. You evidently do not mean getting column numbers for the preprocessed code. You mean that when a preprocessor macro is expanded, we should record both the location where the macro is used, and also some sort of reference to the macro so that we know the location where the macro was defined. Is that right? My use case for locations in preprocessed code is tracking down diagnostics arising in the expansions of complicated macros. I find that's the main case where GCC's existing diagnostics are unhelpful; in that case I would like an option to show the relevant fragment of preprocessed source that causes the diagnostic. There are also easy heuristics to decide whether the preprocessed source is likely to be more useful than the non-preprocessed source. Indexes into a table of tokens would be one possibility; C++ already creates such a table in the course of lexing the whole input up front. -- Joseph S. Myers [EMAIL PROTECTED]