Re: [PATCH 4/5] Implement tree expression tracking in C FE (v2)
On Fri, 2015-09-25 at 16:06 +0200, Dodji Seketeli wrote: > Hello, > > Similarly to a comment I made on the previous patch of the series, > > +++ b/libcpp/include/line-map.h > > [...] > > struct GTY(()) location_adhoc_data { >source_location locus; > + source_range src_range; >void * GTY((skip)) data; > }; > > Could we just change the type of locus and make it be a source_range > instead? With the right converting constructor (in the source_range > class) that takes a source_location, the amount of churn should > hopefully be minimized, or maybe I am missing something ... Thanks. I think that the above is one place where we *would* want both locus and src_range. One key idea within this patch is for source_location/location_t to be able to track both a point/caret/locus and a range containing it. The idea is to stash the range information within the ad-hoc table. For the most simple expressions involving just one token, locus == src_range.m_start, but in the most general case, locus, src_range.m_start and src_range.m_finish are all different from each other; consider this expression: foo && bar ^~ (i.e. where the caret/locus is at the first '&' and the range starts at the 'f' of "foo" and finishes at the 'r' of "bar"). As noted elsewhere, we might try to pack short ranges directly into the 32 bits of source_location: https://gcc.gnu.org/ml/gcc-patches/2015-09/msg01826.html which would avoid having to use ad-hoc for such short ranges; ideally most tokens. I'm experimenting with that (I don't have it fully working yet). > [...] > > diff --git a/libcpp/line-map.c b/libcpp/line-map.c > > [...] > > +source_range > +get_range_from_adhoc_loc (struct line_maps *set, source_location loc) > +{ > > Please add a comment for this function. (nods) > [...] > > diff --git a/gcc/tree.c b/gcc/tree.c > > +void > +set_source_range (tree *expr, location_t start, location_t finish) > > Please add a comment for this function and its overloads. (nods) > [...] > > +void > +set_c_expr_source_range (c_expr *expr, > + location_t start, location_t finish) > +{ > > Likewise. (nods) > +location_t > +set_block (location_t loc, tree block) > +{ > > Likewise. I am wondering if we shouldn't even change the name of this > function to better suit what it does: associate a tree to a location. "associate_tree_with_location" ? > + source_range src_range; > + if (IS_ADHOC_LOC (loc)) > +/* FIXME: can we update in-place? */ > +src_range = get_range_from_adhoc_loc (line_table, loc); > > Hmmh, at the moment, I don't think we can update an entry of the adhoc > map that associates {location, range, tree} as all three components > contribute to the hash value of the entry. A new triplet means a new > entry. > > My understanding is that initially the two elements of the pair > {location, tree} were contributing to the hash value because the same > location could very well belong to different blocks. Ah; thanks. > + else > +src_range = source_range::from_location (loc); > + > + return COMBINE_LOCATION_DATA (line_table, loc, src_range, block); > +} > > [...] > > @@ -6091,6 +6112,10 @@ c_parser_conditional_expression (c_parser *parser, > struct c_expr *after, > >if (c_parser_next_token_is_not (parser, CPP_QUERY)) > return cond; > + if (cond.value != error_mark_node) > +start = cond.src_range.m_start; > + else > +start = UNKNOWN_LOCATION; > > I think that "getting the start range of a c_expr" is an operation > that is generally useful; and it's also generally useful for that > operation to handle cases where the tree carried by the c_expr can be > an error_mark_node. So maybe it would be appropriate to have a getter > function for that operation and use it here instead. Good idea; thanks. That will be helpful as I try other representations. > You would then use that operation in the other places of this patch > that are getting c_expr::src_range.start -- by the way, those other > places are not handling the error_mark_node case like the above. > > [...] > > +++ b/gcc/c/c-tree.h > @@ -132,6 +132,9 @@ struct c_expr > The type of an enum constant is a plain integer type, but this > field will be the enum type. */ >tree original_type; > + > + /* FIXME. */ > + source_range src_range; > }; > > Why a FIXME here? To remind myself before posting the patches that I need to give the field a descriptive comment, explaining what purpose it serves. Oops :) It probably should read something like this: /* The source range of this C expression. This might be thought of as redundant, since ranges for expressions can be stored in a location_t, but not all tree nodes in expressions have a location_t. Consider this source code: int test (int foo) { return foo * 100; ^^^ ^^^ } During C parsing, the ranges for "foo" and
Re: [PATCH 4/5] Implement tree expression tracking in C FE (v2)
David Malcolma écrit: > On Fri, 2015-09-25 at 16:06 +0200, Dodji Seketeli wrote: >> Hello, >> >> Similarly to a comment I made on the previous patch of the series, >> >> +++ b/libcpp/include/line-map.h >> >> [...] >> >> struct GTY(()) location_adhoc_data { >>source_location locus; >> + source_range src_range; >>void * GTY((skip)) data; >> }; >> >> Could we just change the type of locus and make it be a source_range >> instead? With the right converting constructor (in the source_range >> class) that takes a source_location, the amount of churn should >> hopefully be minimized, or maybe I am missing something ... > > Thanks. > > I think that the above is one place where we *would* want both locus and > src_range. Right, as opposed to the token case where conceptually, all we need is the begining and the end of the token. My confusion. [...] > As noted elsewhere, we might try to pack short ranges directly into the > 32 bits of source_location: > https://gcc.gnu.org/ml/gcc-patches/2015-09/msg01826.html > which would avoid having to use ad-hoc for such short ranges; ideally > most tokens. I'm experimenting with that (I don't have it fully working > yet). Right. My personal inclination would be to make the general case of storing all ranges in this adhoc data structure, or, even, into another on-the-side data structure, similar to the adhoc one, but dedicated to range associated to points, as you see fit. Then when that works, consider the optimization of stuffing short ranges into the 32 bits of source_location, depending on the memory profiles we get. But it's your call :-) [...] >> +location_t >> +set_block (location_t loc, tree block) >> +{ >> >> Likewise. I am wondering if we shouldn't even change the name of this >> function to better suit what it does: associate a tree to a location. > > "associate_tree_with_location" ? If you find it cool, I am cool :-) [...] >> +++ b/gcc/c/c-tree.h >> @@ -132,6 +132,9 @@ struct c_expr >> The type of an enum constant is a plain integer type, but this >> field will be the enum type. */ >>tree original_type; >> + >> + /* FIXME. */ >> + source_range src_range; >> }; >> >> Why a FIXME here? > > To remind myself before posting the patches that I need to give the > field a descriptive comment, explaining what purpose it serves. > > Oops :) > > It probably should read something like this: > > /* The source range of this C expression. This might > be thought of as redundant, since ranges for > expressions can be stored in a location_t, but > not all tree nodes in expressions have a location_t. > > Consider this source code: > > int test (int foo) > { > return foo * 100; > ^^^ ^^^ >} > > During C parsing, the ranges for "foo" and "100" are > stored within this field of c_expr, but after parsing > to GENERIC, all we have is a VAR_DECL and an > INTEGER_CST (the former's location is in at the top of the > function, and the latter has no location). > > For consistency, we store ranges for all expressions > in this field, not just those that don't have a > location_t. */ > source_range src_range; Great, thanks. [...] > [BTW, I'm about to disappear on a vacation from tomorrow until October > 6th, and will be away from email and computers] Thanks for the heads-up. Cheers, -- Dodji
Re: [PATCH 4/5] Implement tree expression tracking in C FE (v2)
Hello, Similarly to a comment I made on the previous patch of the series, +++ b/libcpp/include/line-map.h [...] struct GTY(()) location_adhoc_data { source_location locus; + source_range src_range; void * GTY((skip)) data; }; Could we just change the type of locus and make it be a source_range instead? With the right converting constructor (in the source_range class) that takes a source_location, the amount of churn should hopefully be minimized, or maybe I am missing something ... [...] diff --git a/libcpp/line-map.c b/libcpp/line-map.c [...] +source_range +get_range_from_adhoc_loc (struct line_maps *set, source_location loc) +{ Please add a comment for this function. [...] diff --git a/gcc/tree.c b/gcc/tree.c +void +set_source_range (tree *expr, location_t start, location_t finish) Please add a comment for this function and its overloads. [...] +void +set_c_expr_source_range (c_expr *expr, +location_t start, location_t finish) +{ Likewise. +location_t +set_block (location_t loc, tree block) +{ Likewise. I am wondering if we shouldn't even change the name of this function to better suit what it does: associate a tree to a location. + source_range src_range; + if (IS_ADHOC_LOC (loc)) +/* FIXME: can we update in-place? */ +src_range = get_range_from_adhoc_loc (line_table, loc); Hmmh, at the moment, I don't think we can update an entry of the adhoc map that associates {location, range, tree} as all three components contribute to the hash value of the entry. A new triplet means a new entry. My understanding is that initially the two elements of the pair {location, tree} were contributing to the hash value because the same location could very well belong to different blocks. + else +src_range = source_range::from_location (loc); + + return COMBINE_LOCATION_DATA (line_table, loc, src_range, block); +} [...] @@ -6091,6 +6112,10 @@ c_parser_conditional_expression (c_parser *parser, struct c_expr *after, if (c_parser_next_token_is_not (parser, CPP_QUERY)) return cond; + if (cond.value != error_mark_node) +start = cond.src_range.m_start; + else +start = UNKNOWN_LOCATION; I think that "getting the start range of a c_expr" is an operation that is generally useful; and it's also generally useful for that operation to handle cases where the tree carried by the c_expr can be an error_mark_node. So maybe it would be appropriate to have a getter function for that operation and use it here instead. You would then use that operation in the other places of this patch that are getting c_expr::src_range.start -- by the way, those other places are not handling the error_mark_node case like the above. [...] +++ b/gcc/c/c-tree.h @@ -132,6 +132,9 @@ struct c_expr The type of an enum constant is a plain integer type, but this field will be the enum type. */ tree original_type; + + /* FIXME. */ + source_range src_range; }; Why a FIXME here? +#define CAN_HAVE_RANGE_P(NODE) (CAN_HAVE_LOCATION_P (NODE)) Please add a comment for this. +#define EXPR_LOCATION_RANGE(NODE) (get_expr_source_range (EXPR_CHECK ((NODE Likewise. +#define EXPR_HAS_RANGE(NODE) \ +(CAN_HAVE_RANGE_P (NODE) \ + ? EXPR_LOCATION_RANGE (NODE).m_start != UNKNOWN_LOCATION \ + : false) + Likewise. [...] +static inline source_range +get_expr_source_range (tree expr) Likewise. +#define DECL_LOCATION_RANGE(NODE) \ + (get_decl_source_range (DECL_MINIMAL_CHECK (NODE))) + Likewise. +static inline source_range +get_decl_source_range (tree decl) +{ Likewise. -- Dodji
[PATCH 4/5] Implement tree expression tracking in C FE (v2)
This is a combination of various patches from v1 of the kit, including: 12/22: Add source-ranges for trees 13/22: gcc-rich-location.[ch]: add methods for working with tree ranges 14/22: C: capture tree ranges for various expressions The implementation of how ranges are stored has completely changed since v1 of the kit. Rather than introducing a SOURCE_RANGE tree node and adding fields to decl and expr, the patch now captures ranges for all C expressions during parsing within a new field of c_expr, and for all tree nodes with a location_t, it stores them in ad-hoc locations for later use. Hence compound expressions get ranges; see: https://dmalcolm.fedorapeople.org/gcc/2015-09-22/diagnostic-test-expressions-1.html and for this example: int test (int foo) { return foo * 100; ^^^ ^^^ } we have access to the ranges of "foo" and "100" during C parsing via the c_expr, but once we have GENERIC, all we have is a VAR_DECL and an INTEGER_CST (the former's location is in at the top of the function, and the latter has no location). This restriction means that I had to remove various expressions from diagnostic-test-expressions-1.c; specifically: test_global test_param test_local test_integer_constants test_character_constants test_floating_constants test_enumeration_constant test_string_literal test_unary_plus test_sizeof There are still some FIXMEs in here that probably need addressing. gcc/ChangeLog: * Makefile.in (OBJS): Add gcc-rich-location.o. * gcc-rich-location.c: New file. * gcc-rich-location.h: New file. * gimple.h (gimple_set_block): Use "set_block". * print-tree.c (print_node): Print any source range information. * tree-cfg.c (move_block_to_fn): Use "set_block". (move_block_to_fn): Likewise. * tree-inline.c (copy_phis_for_bb): Likewise. * tree.c (tree_set_block): Use "set_block". (set_source_range): New functions. (set_block): New function. * tree.h (CAN_HAVE_RANGE_P): New. (EXPR_LOCATION_RANGE): New. (EXPR_HAS_RANGE): New. (get_expr_source_range): New inline function. (DECL_LOCATION_RANGE): New. (set_source_range): New decls. (set_block): New decl. (get_decl_source_range): New inline function. gcc/c-family/ChangeLog: * c-common.c (c_fully_fold_internal): Capture existing souce_range, and store it on the result. gcc/c/ChangeLog: * c-parser.c (set_c_expr_source_range): New functions. (c_parser_expr_no_commas): Call set_c_expr_source_range on the ret based on the range from the start of the LHS to the end of the RHS. (c_parser_conditional_expression): Likewise, based on the range from the start of the cond.value to the end of exp2.value. (c_parser_binary_expression): Call set_c_expr_source_range on the stack values for TRUTH_ANDIF_EXPR and TRUTH_ORIF_EXPR. (c_parser_cast_expression): Call set_c_expr_source_range on ret based on the cast_loc through to the end of the expr. (c_parser_unary_expression): Likewise, based on the op_loc through to the end of op. (c_parser_sizeof_expression) Likewise, based on the start of the sizeof token through to either the closing paren or the end of expr. (c_parser_postfix_expression): Likewise, using the token range, or from the open paren through to the close paren for parenthesized expressions. (c_parser_postfix_expression_after_primary): Likewise, for various kinds of expression. * c-tree.h (struct c_expr): Add field "src_range". (set_c_expr_source_range): New decls. * c-typeck.c (parser_build_unary_op): Call set_c_expr_source_range on ret for prefix unary ops. (parser_build_binary_op): Likewise, running from the start of arg1.value through to the end of arg2.value. gcc/testsuite/ChangeLog: * gcc.dg/plugin/diagnostic-test-expressions-1.c: New file. * gcc.dg/plugin/diagnostic_plugin_test_tree_expression_range.c: New file. * gcc.dg/plugin/plugin.exp (plugin_test_list): Add diagnostic_plugin_test_tree_expression_range.c and diagnostic-test-expressions-1.c. libcpp/ChangeLog: * include/line-map.h (location_adhoc_data): Add field "src_range". (get_combined_adhoc_loc): Add source_range param. (get_range_from_adhoc_loc): New decl. (COMBINE_LOCATION_DATA): Add source_range param. * line-map.c (location_adhoc_data_hash): Contribute the src_range start and finish to the hash value. (location_adhoc_data_eq): Require that the src_range values be equal. (get_combined_adhoc_loc): Add source_range param and store it. Remove the requirement that "data" be non-NULL. (get_range_from_adhoc_loc): New function.