Re: Better GCC diagnostics

2008-08-26 Thread Arnaud Charlet
 My point is that if the rest of the compiler extends diagnostics.c to
 handle caret and ranges and all other features, Ada would need carry
 all that around even if gnat has its own diagnostics machinery. But
 perhaps that is OK.

Certainly OK, what possible (noticeable) issue would you foresee ?

 I am not trying to force anything just make you
 aware of the situation.

Well, it's not like there's a choice anyway. As Robert and myself explained,
GNAT's machinery is much more powerful in terms of diagnostic compared to
libcpp (or even compared to what's planned), and this
machinery is used in other non GCC contexts anyway, so has its own
life cycle and raison d'être.

 I see that as very unrealistic, GCC development is already complex
 enough, so we will have to live with the duplication.

That was of course a semi joke :-) Only semi because it would make
a lot of sense to write a code generator from scratch in Ada, but of
course it does not make much sense to rewrite GCC in Ada.

 I just wanted to
 make you notice that even if location_t is moved out of libcpp, GNAT
 will still link with diagnostic.c and whatever other diagnostics
 features that will be implemented. The burden of the duplication would
 be on GNAT executable, the other front-ends do not carry GNAT's code.
 Probably the effect of it is negligible, anyway.

Right, effect is clearly negligible, if noticeable at all.

 Again, my point is that even if GNAT has all these features already,
 GNAT should be interested in this since it means moving stuff out of
 libcpp which will allow to break that dependency. C/C++ could very
 well implement caret diagnostics and everything else within libcpp, so
 the dependency of GNAT with libcpp would be actually harder to break.
 My fear is that if the other FEs do not care about this issue because
 we have it already, then this is exactly what will happen.

Well, if other GCC developers do not see the obvious advantages in
splitting libcpp in several logical pieces, then so be it, what can I
say.

Arno


Re: Better GCC diagnostics

2008-08-25 Thread Arnaud Charlet
  Well, clearly, the preprocessor and handling of #include is very
  C/C++ specific, and not used by Ada or Fortran.
 
 Both Ada and Fortran are linked with libcpp.a.

Sure, because as we've said, libcpp is not modular enough. Clearly Ada does not
need to link with A C preprocessor at all.

 Even if you do not use line-map.o, the middle-end does, so you still
 have the duplication.

Right, this is the only part that is indeed shared and for which Ada requires
libcpp.

  It has also handled column info from day one (and we would not want any
  of the e.g. -fshow-column stuff: why would you *not* want to display
  column information ? :-)
 
 Yes, why?

So, I assume there are plans to get rid of -fshow-column switch (and make it the
only default).

 The point is that other front-ends would be able to use it. And
 non-Ada developers would be able to fix it and extend it. Moreover,

I certainly have no problem giving access to other front-ends to GNAT's
capability here, as long as everyone is cool with using Ada in other front-ends,
I'm certainly all for it.

 even if GNAT won't use it, moving all this out of libcpp and making it
 modular would allow to link only to the things that you really need.

Definitely.

 Finally, the middle-end is using it anyway, so gnat is using it.

Sure. Using line-map is one thing, using a C preprocessor is another.

Arno


Re: Better GCC diagnostics

2008-08-25 Thread Paolo Bonzini

 Even if you do not use line-map.o, the middle-end does, so you still
 have the duplication.
 
 Right, this is the only part that is indeed shared and for which Ada requires
 libcpp.

There are exactly 6 uses of linemap functions in libcpp.  They could be
replaced with a handful of function pointers, moving line-map.c to gcc.

There are 2 more uses in makedepend.c, which could define its own dummy
functions because it does not care about locations at all.

Paolo


Re: Better GCC diagnostics

2008-08-25 Thread Manuel López-Ibáñez
2008/8/25 Arnaud Charlet [EMAIL PROTECTED]:
 Even if you do not use line-map.o, the middle-end does, so you still
 have the duplication.

 Right, this is the only part that is indeed shared and for which Ada requires
 libcpp.

My point is that if the rest of the compiler extends diagnostics.c to
handle caret and ranges and all other features, Ada would need carry
all that around even if gnat has its own diagnostics machinery. But
perhaps that is OK. I am not trying to force anything just make you
aware of the situation.

  It has also handled column info from day one (and we would not want any
  of the e.g. -fshow-column stuff: why would you *not* want to display
  column information ? :-)

 Yes, why?

 So, I assume there are plans to get rid of -fshow-column switch (and make it 
 the
 only default).

No, I cannot decide such thing. I would be happy to write such patch, though.

 The point is that other front-ends would be able to use it. And
 non-Ada developers would be able to fix it and extend it. Moreover,

 I certainly have no problem giving access to other front-ends to GNAT's
 capability here, as long as everyone is cool with using Ada in other 
 front-ends,
 I'm certainly all for it.

I see that as very unrealistic, GCC development is already complex
enough, so we will have to live with the duplication. I just wanted to
make you notice that even if location_t is moved out of libcpp, GNAT
will still link with diagnostic.c and whatever other diagnostics
features that will be implemented. The burden of the duplication would
be on GNAT executable, the other front-ends do not carry GNAT's code.
Probably the effect of it is negligible, anyway.

 Finally, the middle-end is using it anyway, so gnat is using it.

 Sure. Using line-map is one thing, using a C preprocessor is another.

Again, my point is that even if GNAT has all these features already,
GNAT should be interested in this since it means moving stuff out of
libcpp which will allow to break that dependency. C/C++ could very
well implement caret diagnostics and everything else within libcpp, so
the dependency of GNAT with libcpp would be actually harder to break.
My fear is that if the other FEs do not care about this issue because
we have it already, then this is exactly what will happen.

Cheers,

Manuel.


Re: Better GCC diagnostics

2008-08-25 Thread Manuel López-Ibáñez
2008/8/25 Paolo Bonzini [EMAIL PROTECTED]:

 Even if you do not use line-map.o, the middle-end does, so you still
 have the duplication.

 Right, this is the only part that is indeed shared and for which Ada requires
 libcpp.

 There are exactly 6 uses of linemap functions in libcpp.  They could be
 replaced with a handful of function pointers, moving line-map.c to gcc.

You have to count all uses of:

SOURCE_LINE, SOURCE_COLUMN, MAIN_FILE_P, LAST_SOURCE_LINE,
LAST_SOURCE_LINE_LOCATION, and LINEMAP_POSITION_FOR_COLUMN

which are all macros. Also, line-map functions seem to be in the
critical path, so the extra dereferencing may be noticeable.

 There are 2 more uses in makedepend.c, which could define its own dummy
 functions because it does not care about locations at all.

Sure about that? line-map.h seems pretty important to handle the include stack.

Cheers,

Manuel.


Re: Better GCC diagnostics

2008-08-25 Thread Paolo Bonzini
Manuel López-Ibáñez wrote:
 2008/8/25 Paolo Bonzini [EMAIL PROTECTED]:
 Even if you do not use line-map.o, the middle-end does, so you still
 have the duplication.
 Right, this is the only part that is indeed shared and for which Ada 
 requires
 libcpp.
 There are exactly 6 uses of linemap functions in libcpp.  They could be
 replaced with a handful of function pointers, moving line-map.c to gcc.
 
 You have to count all uses of:
 
 SOURCE_LINE, SOURCE_COLUMN, MAIN_FILE_P, LAST_SOURCE_LINE,
 LAST_SOURCE_LINE_LOCATION, and LINEMAP_POSITION_FOR_COLUMN
 
 which are all macros.

Yes, but libcpp could still provide the *interface* of line-map.h (the
macros), and leave the actual implementation of the data structure to
the client.

 Also, line-map functions seem to be in the
 critical path, so the extra dereferencing may be noticeable.

That's true.  Unless we lexed everything up front, in which case one
could hope that the indirect branches would be predicted well.  In fact,
I'm curious about how good/bad the branch predictor works for GCC.  My
wild guess: awfully.

 Sure about that? line-map.h seems pretty important to handle the include 
 stack.

As far as I could see, all the dependency stuff is in mkdeps.c; maybe it
could use the line-map infrastructure more.  Of course, in that case
moving line-map.c out of libcpp would mean moving mkdeps.c too -- and
not in gcc/, but rather in its own library so that makedepend can use
it... at which point you wonder if it was really a good idea...

Paolo


Re: Better GCC diagnostics

2008-08-25 Thread Manuel López-Ibáñez
2008/8/25 Paolo Bonzini [EMAIL PROTECTED]:

 As far as I could see, all the dependency stuff is in mkdeps.c; maybe it
 could use the line-map infrastructure more.  Of course, in that case
 moving line-map.c out of libcpp would mean moving mkdeps.c too -- and
 not in gcc/, but rather in its own library so that makedepend can use
 it... at which point you wonder if it was really a good idea...

Well, my opinion is that line-map.c should be in its own
library/object file independent of libcpp but libcpp (and makedepend
and cc1 and gnat) should still be able to use it, preferably directly
and not through pointers. Not sure how to do this but I think it is
what we do with libiberty, don't we? Perhaps there is an even easier
way.

The same should happen with diagnostic.[ch].

Cheers,

Manuel.


Re: Better GCC diagnostics

2008-08-25 Thread Tobias Burnus

Manuel López-Ibáñez wrote:

Again, my point is that even if GNAT has all these features already,
GNAT should be interested in this since it means moving stuff out of
libcpp which will allow to break that dependency. C/C++ could very
well implement caret diagnostics and everything else within libcpp, so
the dependency of GNAT with libcpp would be actually harder to break.
My fear is that if the other FEs do not care about this issue because
we have it already, then this is exactly what will happen.


Fortran is probably an example of don't care: Fortran already uses caret 
diagnostics, but it depends also on libcpp as several Fortran programs 
require CPP preprocessing.


Tobias



Re: Better GCC diagnostics

2008-08-24 Thread Manuel López-Ibáñez
2008/8/20 Arnaud Charlet [EMAIL PROTECTED]:
 Not just that, probably Fortran/Ada are already duplicating stuff that
 is in libcpp or they are implementing their own version of stuff that
 C/C++ are lacking (caret diagnostics? character encodings?).

 Well, clearly, the preprocessor and handling of #include is very
 C/C++ specific, and not used by Ada or Fortran.

Both Ada and Fortran are linked with libcpp.a.

 As for diagnostics, Ada does use its own mechanism, mainly because it's
 much more powerful than the one in libcpp, and handles e.g. multiple
 locations related to cascaded generic package instantiations (C++ templates),
 as well as many other goodies already described by Robert is previous 
 messages.

Even if you do not use line-map.o, the middle-end does, so you still
have the duplication.

 It has also handled column info from day one (and we would not want any
 of the e.g. -fshow-column stuff: why would you *not* want to display
 column information ? :-)

Yes, why?

 So from GNAT's point of view, what's the point in duplicating all this
 work done and written in Ada in another less capable language (which would 
 also
 make it more painful for GNAT developers to use), in particular if this
 capability is only available in very recent GCC versions.

The point is that other front-ends would be able to use it. And
non-Ada developers would be able to fix it and extend it. Moreover,
even if GNAT won't use it, moving all this out of libcpp and making it
modular would allow to link only to the things that you really need.
Finally, the middle-end is using it anyway, so gnat is using it.

Cheers,

Manuel.


Re: Better GCC diagnostics

2008-08-20 Thread Arnaud Charlet
 That is clear.  Thanks.  I personally would be perfectly happy if the
 compiler said
 bug.c:4.COLUMN: error: called object is not a function
 That is, fixing the compiler to includes parts of the source code in
 the error message itself is, for me, of considerably lower priority
 than fixing the compiler to generate good column numbers.

I agree. Any decent IDE or source editor will point to the proper location
once you have the correct column number.

Arno


Re: Better GCC diagnostics

2008-08-20 Thread Arnaud Charlet
 Oh yes. Well, there is a lot of fine-tunning to do but I think that
 would be covered by A.1 and the binary_op expression would have at
 least two locations begin/end pointing to X and r. If we are able to
 print ({break;}), in the example I gave earlier, then we will be able
 to print nice ranges most of the time.

Interestingly, I am working right now on improving location information
in the C (and possibly in the future C++) front-end, for the need of
static analysis tools, and generation of cross-ref information (for use
by IDEs), and have added locally support for additional locations for
many expressions.

The way I did it, since I thought that adding a new location_t field in
the expr struct would likely be a no-no (due to additional memory usage) is to
use a hash table on the side, and use macros (SET_EXPR_LOCATION2,
GET_EXPR_LOCATION2) that are no-ops by default, and that populate the
hash table when a given switch is enabled.

I was planning to submit these changes at some later point, since this
is still work in progress at this stage, but can work on submitting these
earlier if this would help people in other contexts, and/or avoid duplicate
work.

Note that my patch not only adds support for additional locations (and it
even adds as much locations as needed for each argument of a function call),
but also fixes column number information in several cases.

Arno


Re: Better GCC diagnostics

2008-08-20 Thread Manuel López-Ibáñez
2008/8/20 Arnaud Charlet [EMAIL PROTECTED]:

 The way I did it, since I thought that adding a new location_t field in
 the expr struct would likely be a no-no (due to additional memory usage) is to
 use a hash table on the side, and use macros (SET_EXPR_LOCATION2,
 GET_EXPR_LOCATION2) that are no-ops by default, and that populate the
 hash table when a given switch is enabled.

 I was planning to submit these changes at some later point, since this
 is still work in progress at this stage, but can work on submitting these
 earlier if this would help people in other contexts, and/or avoid duplicate
 work.

As far as I know nobody is working on adding more locations, so do as
you please. If I ever get the time, I would like to abstract our
line-map implementation within a location_manager object and API but
I don't think this conflicts directly with your work.

Would your implementation also handle two locations for tokens that
come from macro expansion?

 Note that my patch not only adds support for additional locations (and it
 even adds as much locations as needed for each argument of a function call),
 but also fixes column number information in several cases.

Those fixes should be submitted as soon as you find them, please.

Cheers,

Manuel.


Re: Better GCC diagnostics

2008-08-20 Thread Arnaud Charlet
 Would your implementation also handle two locations for tokens that
 come from macro expansion?

macro expansion are tricky to handle as far as I could see, so currently
this is not handled. It's tricky because some locations are real in
the source, and some are virtual from the macro.

Consider for example:

#define PLUS(a) ((a) + global_var)

when in the source you have:

source:  PLUS (x)
column:  1   5 7

you would like to have a location associated with 'x' ('a' in the macro),
pointing to column 7, and not to e.g. column 1 as done currently.

On the other hands, the implicit reference to 'global_var' is what I
call virtual, and I guess the closest approximation in terms of
location in the original source would be 'column 7' as well, assuming
we want to represent it.

 Those fixes should be submitted as soon as you find them, please.

Sure, some clean ups are still needed, but submitted them as soon as
practical is part of the plans (my work is based on gcc trunk, so at least
there's no issue of forward porting my changes).

Thanks for your feedback,

Arno


Re: Better GCC diagnostics

2008-08-20 Thread Tom Tromey
 Manuel == Manuel López-Ibáñez [EMAIL PROTECTED] writes:

Manuel If I ever get the time, I would like to abstract our line-map
Manuel implementation within a location_manager object and API but
Manuel I don't think this conflicts directly with your work.

I am curious to know how this would be different from what we have
now.

I think the line-map API is rather ugly, personally, but it seems to
me that a struct line_maps is essentially a location manager object.

Tom


Re: Better GCC diagnostics

2008-08-20 Thread Manuel López-Ibáñez
2008/8/20 Tom Tromey [EMAIL PROTECTED]:
 Manuel == Manuel López-Ibáñez [EMAIL PROTECTED] writes:

 Manuel If I ever get the time, I would like to abstract our line-map
 Manuel implementation within a location_manager object and API but
 Manuel I don't think this conflicts directly with your work.

 I am curious to know how this would be different from what we have
 now.

 I think the line-map API is rather ugly, personally, but it seems to
 me that a struct line_maps is essentially a location manager object.

It is essentially. However, all the internal things are exposed all
over the place for no good reason.

This is in gcc/c-lex.c

 /* #define callback for DWARF and DWARF2 debug info.  */
 static void
 cb_define (cpp_reader *pfile, source_location loc, cpp_hashnode *node)
 {
-  const struct line_map *map = linemap_lookup (line_table, loc);
-  (*debug_hooks-define) (SOURCE_LINE (map, loc),
+  (*debug_hooks-define) (location_source_line (location_manager, loc),
  (const char *) cpp_macro_definition (pfile, node));
 }

This is in tree.h

-expanded_location
-expand_location (source_location loc)
-{
-  expanded_location xloc;
-  if (loc == 0)
-{
-  xloc.file = NULL;
-  xloc.line = 0;
-  xloc.column = 0;
-  xloc.sysp = 0;
-}
-  else
-{
-  const struct line_map *map = linemap_lookup (line_table, loc);
-  xloc.file = map-to_file;
-  xloc.line = SOURCE_LINE (map, loc);
-  xloc.column = SOURCE_COLUMN (map, loc);
-  xloc.sysp = map-sysp != 0;
-};
-  return xloc;
-}
+#define expand_location(LOC) location_expand (location_manager, (LOC))
+

Moreover, the location_manager probably should be separated from CCP.

If we want to implement re-opening files and reading strings given
locations, then opening/reading files should also be moved out of CCP
to its own module/namespace/object.

Cheers,

Manuel.


Re: Better GCC diagnostics

2008-08-20 Thread Arnaud Charlet
 If we want to implement re-opening files and reading strings given
 locations, then opening/reading files should also be moved out of CCP
 to its own module/namespace/object.

Agreed. Other modules may find these APIs very handy.
Currently many features are only available very deep or hidden inside
libcpp implementation that could be useful in other contexts (such
as e.g. getting source file dependency information).

Arno


Re: Better GCC diagnostics

2008-08-20 Thread Manuel López-Ibáñez
2008/8/20 Arnaud Charlet [EMAIL PROTECTED]:
 If we want to implement re-opening files and reading strings given
 locations, then opening/reading files should also be moved out of CCP
 to its own module/namespace/object.

 Agreed. Other modules may find these APIs very handy.
 Currently many features are only available very deep or hidden inside
 libcpp implementation that could be useful in other contexts (such
 as e.g. getting source file dependency information).

Not just that, probably Fortran/Ada are already duplicating stuff that
is in libcpp or they are implementing their own version of stuff that
C/C++ are lacking (caret diagnostics? character encodings?).

Cheers,

Manuel.


Re: Better GCC diagnostics

2008-08-20 Thread Arnaud Charlet
 Not just that, probably Fortran/Ada are already duplicating stuff that
 is in libcpp or they are implementing their own version of stuff that
 C/C++ are lacking (caret diagnostics? character encodings?).

Well, clearly, the preprocessor and handling of #include is very
C/C++ specific, and not used by Ada or Fortran.

As for diagnostics, Ada does use its own mechanism, mainly because it's
much more powerful than the one in libcpp, and handles e.g. multiple
locations related to cascaded generic package instantiations (C++ templates),
as well as many other goodies already described by Robert is previous messages.
It has also handled column info from day one (and we would not want any
of the e.g. -fshow-column stuff: why would you *not* want to display
column information ? :-)

So from GNAT's point of view, what's the point in duplicating all this
work done and written in Ada in another less capable language (which would also
make it more painful for GNAT developers to use), in particular if this
capability is only available in very recent GCC versions.

Arno


Re: Better GCC diagnostics

2008-08-20 Thread Robert Dewar

Manuel López-Ibáñez wrote:

2008/8/20 Arnaud Charlet [EMAIL PROTECTED]:

If we want to implement re-opening files and reading strings given
locations, then opening/reading files should also be moved out of CCP
to its own module/namespace/object.

Agreed. Other modules may find these APIs very handy.
Currently many features are only available very deep or hidden inside
libcpp implementation that could be useful in other contexts (such
as e.g. getting source file dependency information).


Not just that, probably Fortran/Ada are already duplicating stuff that
is in libcpp or they are implementing their own version of stuff that
C/C++ are lacking (caret diagnostics? character encodings?).


The error mechanism in Ada is very extensive, probably would be quite
a bit of work to find out how to interface effectively to this API,
but something that would be nice to do in the long run.


Cheers,

Manuel.




Re: Better GCC diagnostics

2008-08-15 Thread Ian Lance Taylor
Manuel López-Ibáñez [EMAIL PROTECTED] writes:

 Let's try to focus on what needs to be done looking for specific
 features (or fixes) and how we could do it:

Thanks for writing this up.

 A) Printing the input expression instead of re-constructing it. As
Joseph explained, this will fix the problems that Aldy mentioned
(PR3544[123] and PR35742) and this requires:

   1) For non-preprocessed expr we need at least two locations per expr
  (beg/end). This will require changes on the build_* functions to
  handle multiple locations.

This is probably obvious, but can you outline why we need two
locations for each expression?  The tools with which I am familiar
only print a single caret.  What would use the two locations for?


  b) Re-open the file and fseek. This is not trivial since we need
 to do it fast but still do all character conversions that we
 did when libcpp opened it the first time. This is
 approximately what Clang (LLVM) does and it seems they can do
 it very fast by keeping a cache of buffers ever reopened. I
 think that thanks to our line-maps implementation, we can do
 the seeking quite more efficiently in terms of computation
 time.  However, opening files is quite embedded into CPP, so
 that would need to be factored out so we can avoid any
 unnecessary CPP stuff when reopening but still do it
 *properly* and *efficiently*.

If we are going to reopen the file, then why do we need to record the
locations in the preprocessed token stream?

If we keep, for each source line, the file offset in the file of the
start of that source line, then I think that printing the line from
the source file would be pretty fast.  That would not be free but it
would be much cheaper than keeping the entire input file.  Various
optimizations are possible--e.g., keep the file offset for every 16th
line. Conversely, perhaps we could record the line number and the
charset conversion state at each 4096th byte in the file; that would
let us quickly find the page in the file which contains the line.

Ian


Re: Better GCC diagnostics

2008-08-15 Thread Manuel López-Ibáñez
2008/8/15 Ian Lance Taylor [EMAIL PROTECTED]:
 Manuel López-Ibáñez [EMAIL PROTECTED] writes:

 A) Printing the input expression instead of re-constructing it. As
Joseph explained, this will fix the problems that Aldy mentioned
(PR3544[123] and PR35742) and this requires:

   1) For non-preprocessed expr we need at least two locations per expr
  (beg/end). This will require changes on the build_* functions to
  handle multiple locations.

 This is probably obvious, but can you outline why we need two
 locations for each expression?  The tools with which I am familiar
 only print a single caret.  What would use the two locations for?

This has nothing to do with caret diagnostics. This is an orthogonal
issue that would share some infrastructure as Joseph explained. If you
do

warning(called object %qE is not a function, expr);

for

({break;})();

we currently try to re-construct expr and that fails in some cases
(see the PRs referenced).

#'goto_expr' not supported by pp_c_expression#'bug.c: In function 'foo':
bug.c:4: error: called object  is not a function

The alternative is to print whatever we parsed when building expr. To
do that we would need to have begin/end locations for expr, and then
do a location_t-const char * translation and print whatever is
between those two pointers:

bug.c:4: error: called object '({break;})' is not a function


Is it clear now? If so, I will update the wiki to put this example.

  b) Re-open the file and fseek. This is not trivial since we need
 to do it fast but still do all character conversions that we
 did when libcpp opened it the first time. This is
 approximately what Clang (LLVM) does and it seems they can do
 it very fast by keeping a cache of buffers ever reopened. I
 think that thanks to our line-maps implementation, we can do
 the seeking quite more efficiently in terms of computation
 time.  However, opening files is quite embedded into CPP, so
 that would need to be factored out so we can avoid any
 unnecessary CPP stuff when reopening but still do it
 *properly* and *efficiently*.

 If we are going to reopen the file, then why do we need to record the
 locations in the preprocessed token stream?

Because for some diagnostics we want to give the warnings in the
instantiation point not in the macro definition point. Moreover, this
is what we currently do, so if we don't want to change the current
behaviour, we need to track both locations.

Example

/*header.h*/
#pragma GCC system_header
#define BIG  0x1b27da572ef3cd86ULL

/* file.c */
#include pr7263.h
__extension__ unsigned long long
bar ()
{
  return BIG;
}

We print a diagnostic at file.c for the expansion of BIG. However,
since we do not have the original location we cannot check that the
token comes from a system header, and we do not suppress the warning.
There are more subtle bugs that arise from not having the original
location available. See PR36478.

BTW, Clang takes into account both locations when printing diagnostics.

 If we keep, for each source line, the file offset in the file of the
 start of that source line, then I think that printing the line from
 the source file would be pretty fast.  That would not be free but it
 would be much cheaper than keeping the entire input file.  Various

Cheaper in terms of memory. It cannot be cheaper in terms of
compilation time than a direct pointer to the already opened buffer
for each line-map.

 optimizations are possible--e.g., keep the file offset for every 16th
 line. Conversely, perhaps we could record the line number and the
 charset conversion state at each 4096th byte in the file; that would
 let us quickly find the page in the file which contains the line.

I am not sure how you plan such approach to interact with
mapped-locations. I think that having an offset for each line-map and
then seeking until you find the correct position would be fine for an
initial implementation. More complex setups could be tested against
this basic implementation. And any optimization done here could be
done with the buffer already opened, so yes, cheaper in terms of
memory maybe but not cheaper in terms of compilation time. If this is
abstracted enough both approaches could perhaps coexist and share the
optimizations: while the front-end is working (where most of the
diagnostics come from) keep the buffers around, when going into
middle-end free them and if we need to give a diagnostic from the
middle-end, reopen and seek. But all this relies on someone factoring
file opening and charset conversion out of CCP first. Once that is
done, we could do whatever strategy or both or something else.

Cheers,

Manuel.


Re: Better GCC diagnostics

2008-08-15 Thread Chris Lattner

D) Printing Ranges. This requires:

  *) Printing accurate column information. Similar to (B) above.

  *) A location(s) - source strings interface and machinery. Similar
 to (A.3) above.


Ranges also require some way to get the end of a token (in addition to  
its beginning).  For example, a range for:


X + some_long\
_ident??/
ifier

The range should start at X and end at r.  This is just a location  
like any other, but requires passing down like the begin loc.  You  
might instead decide to do some fuzzy matching or something, but clang  
at least gets this right.  This is important for other clients of the  
loc info, e.g. refactoring clients.


-Chris


Re: Better GCC diagnostics

2008-08-15 Thread Manuel López-Ibáñez
2008/8/15 Chris Lattner [EMAIL PROTECTED]:
 D) Printing Ranges. This requires:

  *) Printing accurate column information. Similar to (B) above.

  *) A location(s) - source strings interface and machinery. Similar
 to (A.3) above.

 Ranges also require some way to get the end of a token (in addition to its
 beginning).  For example, a range for:

 X + some_long\
 _ident??/
 ifier

 The range should start at X and end at r.  This is just a location like
 any other, but requires passing down like the begin loc.  You might instead
 decide to do some fuzzy matching or something, but clang at least gets this
 right.  This is important for other clients of the loc info, e.g.
 refactoring clients.

Oh yes. Well, there is a lot of fine-tunning to do but I think that
would be covered by A.1 and the binary_op expression would have at
least two locations begin/end pointing to X and r. If we are able to
print ({break;}), in the example I gave earlier, then we will be able
to print nice ranges most of the time.

Cheers,

Manuel.


Re: Better GCC diagnostics

2008-08-15 Thread Joseph S. Myers
On Fri, 15 Aug 2008, Manuel López-Ibáñez wrote:

   5) Handle locations during folding or avoid aggressive folding in
  the front-ends.

I plan to delay folding for C (beyond some folding for expressions of 
constants) for 4.5, probably in October.  (I do not plan to do anything 
for C++, and the folding will still happen before gimplification, though I 
believe that in principle most of what fold-const does would be better 
done on language-independent IR after gimplification.)

Note that for correct ranges you need to handle locations for 0 and (0) 
and ((0)) and so on, so either need duplicate trees for constants and 
parenthesised expressions or need to keep the locations outside the trees.  
It's possible you can keep them in the c_expr structures or other local 
variables, depending on whether you ever need locations for both an 
expression and deeply nested subexpressions of it.

-- 
Joseph S. Myers
[EMAIL PROTECTED]

Re: Better GCC diagnostics

2008-08-15 Thread Ian Lance Taylor
Manuel López-Ibáñez [EMAIL PROTECTED] writes:

 2008/8/15 Ian Lance Taylor [EMAIL PROTECTED]:
 Manuel López-Ibáñez [EMAIL PROTECTED] writes:

 A) Printing the input expression instead of re-constructing it. As
Joseph explained, this will fix the problems that Aldy mentioned
(PR3544[123] and PR35742) and this requires:

   1) For non-preprocessed expr we need at least two locations per expr
  (beg/end). This will require changes on the build_* functions to
  handle multiple locations.

 This is probably obvious, but can you outline why we need two
 locations for each expression?  The tools with which I am familiar
 only print a single caret.  What would use the two locations for?

 This has nothing to do with caret diagnostics. This is an orthogonal
 issue that would share some infrastructure as Joseph explained. If you
 do

 warning(called object %qE is not a function, expr);

 for

 ({break;})();

 we currently try to re-construct expr and that fails in some cases
 (see the PRs referenced).

 #'goto_expr' not supported by pp_c_expression#'bug.c: In function 'foo':
 bug.c:4: error: called object  is not a function

 The alternative is to print whatever we parsed when building expr. To
 do that we would need to have begin/end locations for expr, and then
 do a location_t-const char * translation and print whatever is
 between those two pointers:

 bug.c:4: error: called object '({break;})' is not a function
 

 Is it clear now? If so, I will update the wiki to put this example.

That is clear.  Thanks.  I personally would be perfectly happy if the
compiler said
bug.c:4.COLUMN: error: called object is not a function
That is, fixing the compiler to includes parts of the source code in
the error message itself is, for me, of considerably lower priority
than fixing the compiler to generate good column numbers.



  b) Re-open the file and fseek. This is not trivial since we need
 to do it fast but still do all character conversions that we
 did when libcpp opened it the first time. This is
 approximately what Clang (LLVM) does and it seems they can do
 it very fast by keeping a cache of buffers ever reopened. I
 think that thanks to our line-maps implementation, we can do
 the seeking quite more efficiently in terms of computation
 time.  However, opening files is quite embedded into CPP, so
 that would need to be factored out so we can avoid any
 unnecessary CPP stuff when reopening but still do it
 *properly* and *efficiently*.

 If we are going to reopen the file, then why do we need to record the
 locations in the preprocessed token stream?

 Because for some diagnostics we want to give the warnings in the
 instantiation point not in the macro definition point. Moreover, this
 is what we currently do, so if we don't want to change the current
 behaviour, we need to track both locations.

 Example

 /*header.h*/
 #pragma GCC system_header
 #define BIG  0x1b27da572ef3cd86ULL

 /* file.c */
 #include pr7263.h
 __extension__ unsigned long long
 bar ()
 {
   return BIG;
 }

 We print a diagnostic at file.c for the expansion of BIG. However,
 since we do not have the original location we cannot check that the
 token comes from a system header, and we do not suppress the warning.
 There are more subtle bugs that arise from not having the original
 location available. See PR36478.

 BTW, Clang takes into account both locations when printing diagnostics.

Perhaps I misunderstand what you mean by recording the location in the
preprocessed token stream.  You evidently do not mean getting column
numbers for the preprocessed code.  You mean that when a preprocessor
macro is expanded, we should record both the location where the macro
is used, and also some sort of reference to the macro so that we know
the location where the macro was defined.  Is that right?


 If we keep, for each source line, the file offset in the file of the
 start of that source line, then I think that printing the line from
 the source file would be pretty fast.  That would not be free but it
 would be much cheaper than keeping the entire input file.  Various

 Cheaper in terms of memory. It cannot be cheaper in terms of
 compilation time than a direct pointer to the already opened buffer
 for each line-map.

Except that we know that increased memory use leads to increased
compile time.  Diagnostic printing can't be slow, but it's also not on
the critical path.  Most code does not generate diagnostics.  So there
is a balance to be struck.

Ian


Re: Better GCC diagnostics

2008-08-15 Thread Manuel López-Ibáñez
2008/8/15 Ian Lance Taylor [EMAIL PROTECTED]:

 BTW, Clang takes into account both locations when printing diagnostics.

 Perhaps I misunderstand what you mean by recording the location in the
 preprocessed token stream.  You evidently do not mean getting column
 numbers for the preprocessed code.  You mean that when a preprocessor
 macro is expanded, we should record both the location where the macro
 is used, and also some sort of reference to the macro so that we know
 the location where the macro was defined.  Is that right?


I don't see the difference. We do currently assign to the preprocessed
code the location of the macro call. So we get column numbers to the
preprocessed code. In addition, we should record the location were the
macro was defined. In the previous example, we should be able to
obtain somehow two locations for 0x1b27da572ef3cd86ULL, one for its
position in header.h and another for its final position in file.c.
Whether we record two locations explicitly or we keep some lookup
table of macro expansions/definitions would depend on what is found to
be the most efficient implementation.

I don't know which approach Clang follows but in our case it would be
better to use the mapped-location infrastructure to track this for us
in a single source_location number.

Cheers,

Manuel.


Re: Better GCC diagnostics

2008-08-15 Thread Joseph S. Myers
On Fri, 15 Aug 2008, Ian Lance Taylor wrote:

 Perhaps I misunderstand what you mean by recording the location in the
 preprocessed token stream.  You evidently do not mean getting column
 numbers for the preprocessed code.  You mean that when a preprocessor
 macro is expanded, we should record both the location where the macro
 is used, and also some sort of reference to the macro so that we know
 the location where the macro was defined.  Is that right?

My use case for locations in preprocessed code is tracking down 
diagnostics arising in the expansions of complicated macros.  I find 
that's the main case where GCC's existing diagnostics are unhelpful; in 
that case I would like an option to show the relevant fragment of 
preprocessed source that causes the diagnostic.  There are also easy 
heuristics to decide whether the preprocessed source is likely to be more 
useful than the non-preprocessed source.

Indexes into a table of tokens would be one possibility; C++ already 
creates such a table in the course of lexing the whole input up front.

-- 
Joseph S. Myers
[EMAIL PROTECTED]