Re: [PATCH 4/5] Implement tree expression tracking in C FE (v2)

2015-09-25 Thread David Malcolm
On Fri, 2015-09-25 at 16:06 +0200, Dodji Seketeli wrote:
> Hello,
> 
> Similarly to a comment I made on the previous patch of the series,
> 
> +++ b/libcpp/include/line-map.h
> 
> [...]
> 
>  struct GTY(()) location_adhoc_data {
>source_location locus;
> +  source_range src_range;
>void * GTY((skip)) data;
>  };
> 
> Could we just change the type of locus and make it be a source_range
> instead?  With the right converting constructor (in the source_range
> class) that takes a source_location, the amount of churn should
> hopefully be minimized, or maybe I am missing something ...

Thanks.

I think that the above is one place where we *would* want both locus and
src_range.

One key idea within this patch is for source_location/location_t to be
able to track both a point/caret/locus and a range containing it.   The
idea is to stash the range information within the ad-hoc table.

For the most simple expressions involving just one token, locus ==
src_range.m_start, but in the most general case, locus,
src_range.m_start and src_range.m_finish are all different from each
other; consider this expression:

  foo && bar
  ^~

(i.e. where the caret/locus is at the first '&' and the range starts at
the 'f' of "foo" and finishes at the 'r' of "bar").

As noted elsewhere, we might try to pack short ranges directly into the
32 bits of source_location:
 https://gcc.gnu.org/ml/gcc-patches/2015-09/msg01826.html
which would avoid having to use ad-hoc for such short ranges; ideally
most tokens.  I'm experimenting with that (I don't have it fully working
yet).

> [...]
> 
> diff --git a/libcpp/line-map.c b/libcpp/line-map.c
> 
> [...]
> 
> +source_range
> +get_range_from_adhoc_loc (struct line_maps *set, source_location loc)
> +{
> 
> Please add a comment for this function.

(nods)

> [...]
> 
> diff --git a/gcc/tree.c b/gcc/tree.c
> 
> +void
> +set_source_range (tree *expr, location_t start, location_t finish)
> 
> Please add a comment for this function and its overloads.

(nods)

> [...]
> 
> +void
> +set_c_expr_source_range (c_expr *expr,
> +  location_t start, location_t finish)
> +{
> 
> Likewise.

(nods)

> +location_t
> +set_block (location_t loc, tree block)
> +{
> 
> Likewise.  I am wondering if we shouldn't even change the name of this
> function to better suit what it does: associate a tree to a location.

"associate_tree_with_location" ?

> +  source_range src_range;
> +  if (IS_ADHOC_LOC (loc))
> +/* FIXME: can we update in-place?  */
> +src_range = get_range_from_adhoc_loc (line_table, loc);
> 
> Hmmh, at the moment, I don't think we can update an entry of the adhoc
> map that associates {location, range, tree} as all three components
> contribute to the hash value of the entry.  A new triplet means a new
> entry.
> 
> My understanding is that initially the two elements of the pair
> {location, tree} were contributing to the hash value because the same
> location could very well belong to different blocks.

Ah; thanks.

> +  else
> +src_range = source_range::from_location (loc);
> +
> +  return COMBINE_LOCATION_DATA (line_table, loc, src_range, block);
> +}
> 
> [...]
> 
> @@ -6091,6 +6112,10 @@ c_parser_conditional_expression (c_parser *parser, 
> struct c_expr *after,
>  
>if (c_parser_next_token_is_not (parser, CPP_QUERY))
>  return cond;
> +  if (cond.value != error_mark_node)
> +start = cond.src_range.m_start;
> +  else
> +start = UNKNOWN_LOCATION;
> 
> I think that "getting the start range of a c_expr" is an operation
> that is generally useful; and it's also generally useful for that
> operation to handle cases where the tree carried by the c_expr can be
> an error_mark_node.  So maybe it would be appropriate to have a getter
> function for that operation and use it here instead.

Good idea; thanks.  That will be helpful as I try other representations.

> You would then use that operation in the other places of this patch
> that are getting c_expr::src_range.start -- by the way, those other
> places are not handling the error_mark_node case like the above.
> 
> [...]
> 
> +++ b/gcc/c/c-tree.h
> @@ -132,6 +132,9 @@ struct c_expr
>   The type of an enum constant is a plain integer type, but this
>   field will be the enum type.  */
>tree original_type;
> +
> +  /* FIXME.  */
> +  source_range src_range;
>  };
> 
> Why a FIXME here?

To remind myself before posting the patches that I need to give the
field a descriptive comment, explaining what purpose it serves.

Oops :)

It probably should read something like this:

  /* The source range of this C expression.  This might
 be thought of as redundant, since ranges for
 expressions can be stored in a location_t, but
 not all tree nodes in expressions have a location_t.

 Consider this source code:  

int test (int foo)
{
  return foo * 100;
^^^   ^^^
   }

During C parsing, the ranges for "foo" and 

Re: [PATCH 4/5] Implement tree expression tracking in C FE (v2)

2015-09-25 Thread Dodji Seketeli
David Malcolm  a écrit:

> On Fri, 2015-09-25 at 16:06 +0200, Dodji Seketeli wrote:
>> Hello,
>> 
>> Similarly to a comment I made on the previous patch of the series,
>> 
>> +++ b/libcpp/include/line-map.h
>> 
>> [...]
>> 
>>  struct GTY(()) location_adhoc_data {
>>source_location locus;
>> +  source_range src_range;
>>void * GTY((skip)) data;
>>  };
>> 
>> Could we just change the type of locus and make it be a source_range
>> instead?  With the right converting constructor (in the source_range
>> class) that takes a source_location, the amount of churn should
>> hopefully be minimized, or maybe I am missing something ...
>
> Thanks.
>
> I think that the above is one place where we *would* want both locus and
> src_range.

Right, as opposed to the token case where conceptually, all we need is
the begining and the end of the token.  My confusion.

[...]

> As noted elsewhere, we might try to pack short ranges directly into the
> 32 bits of source_location:
>  https://gcc.gnu.org/ml/gcc-patches/2015-09/msg01826.html
> which would avoid having to use ad-hoc for such short ranges; ideally
> most tokens.  I'm experimenting with that (I don't have it fully working
> yet).

Right.  My personal inclination would be to make the general case of
storing all ranges in this adhoc data structure, or, even, into another
on-the-side data structure, similar to the adhoc one, but dedicated to
range associated to points, as you see fit.

Then when that works, consider the optimization of stuffing short ranges
into the 32 bits of source_location, depending on the memory profiles we
get.  But it's your call :-)

[...]

>> +location_t
>> +set_block (location_t loc, tree block)
>> +{
>> 
>> Likewise.  I am wondering if we shouldn't even change the name of this
>> function to better suit what it does: associate a tree to a location.
>
> "associate_tree_with_location" ?

If you find it cool, I am cool :-)

[...]

>> +++ b/gcc/c/c-tree.h
>> @@ -132,6 +132,9 @@ struct c_expr
>>   The type of an enum constant is a plain integer type, but this
>>   field will be the enum type.  */
>>tree original_type;
>> +
>> +  /* FIXME.  */
>> +  source_range src_range;
>>  };
>> 
>> Why a FIXME here?
>
> To remind myself before posting the patches that I need to give the
> field a descriptive comment, explaining what purpose it serves.
>
> Oops :)
>
> It probably should read something like this:
>
>   /* The source range of this C expression.  This might
>  be thought of as redundant, since ranges for
>  expressions can be stored in a location_t, but
>  not all tree nodes in expressions have a location_t.
>
>  Consider this source code:  
>
>   int test (int foo)
>   {
> return foo * 100;
> ^^^   ^^^
>}
>
> During C parsing, the ranges for "foo" and "100" are
> stored within this field of c_expr, but after parsing
> to GENERIC, all we have is a VAR_DECL and an
> INTEGER_CST (the former's location is in at the top of the
> function, and the latter has no location).
>
> For consistency, we store ranges for all expressions
> in this field, not just those that don't have a
> location_t. */
>   source_range src_range;

Great, thanks.

[...]

> [BTW, I'm about to disappear on a vacation from tomorrow until October
> 6th, and will be away from email and computers]

Thanks for the heads-up.

Cheers,

-- 
Dodji


Re: [PATCH 4/5] Implement tree expression tracking in C FE (v2)

2015-09-25 Thread Dodji Seketeli
Hello,

Similarly to a comment I made on the previous patch of the series,

+++ b/libcpp/include/line-map.h

[...]

 struct GTY(()) location_adhoc_data {
   source_location locus;
+  source_range src_range;
   void * GTY((skip)) data;
 };

Could we just change the type of locus and make it be a source_range
instead?  With the right converting constructor (in the source_range
class) that takes a source_location, the amount of churn should
hopefully be minimized, or maybe I am missing something ...

[...]

diff --git a/libcpp/line-map.c b/libcpp/line-map.c

[...]

+source_range
+get_range_from_adhoc_loc (struct line_maps *set, source_location loc)
+{

Please add a comment for this function.

[...]

diff --git a/gcc/tree.c b/gcc/tree.c

+void
+set_source_range (tree *expr, location_t start, location_t finish)

Please add a comment for this function and its overloads.

[...]

+void
+set_c_expr_source_range (c_expr *expr,
+location_t start, location_t finish)
+{

Likewise.

+location_t
+set_block (location_t loc, tree block)
+{

Likewise.  I am wondering if we shouldn't even change the name of this
function to better suit what it does: associate a tree to a location.

+  source_range src_range;
+  if (IS_ADHOC_LOC (loc))
+/* FIXME: can we update in-place?  */
+src_range = get_range_from_adhoc_loc (line_table, loc);

Hmmh, at the moment, I don't think we can update an entry of the adhoc
map that associates {location, range, tree} as all three components
contribute to the hash value of the entry.  A new triplet means a new
entry.

My understanding is that initially the two elements of the pair
{location, tree} were contributing to the hash value because the same
location could very well belong to different blocks.

+  else
+src_range = source_range::from_location (loc);
+
+  return COMBINE_LOCATION_DATA (line_table, loc, src_range, block);
+}

[...]

@@ -6091,6 +6112,10 @@ c_parser_conditional_expression (c_parser *parser, 
struct c_expr *after,
 
   if (c_parser_next_token_is_not (parser, CPP_QUERY))
 return cond;
+  if (cond.value != error_mark_node)
+start = cond.src_range.m_start;
+  else
+start = UNKNOWN_LOCATION;

I think that "getting the start range of a c_expr" is an operation
that is generally useful; and it's also generally useful for that
operation to handle cases where the tree carried by the c_expr can be
an error_mark_node.  So maybe it would be appropriate to have a getter
function for that operation and use it here instead.

You would then use that operation in the other places of this patch
that are getting c_expr::src_range.start -- by the way, those other
places are not handling the error_mark_node case like the above.

[...]

+++ b/gcc/c/c-tree.h
@@ -132,6 +132,9 @@ struct c_expr
  The type of an enum constant is a plain integer type, but this
  field will be the enum type.  */
   tree original_type;
+
+  /* FIXME.  */
+  source_range src_range;
 };

Why a FIXME here?

+#define CAN_HAVE_RANGE_P(NODE) (CAN_HAVE_LOCATION_P (NODE))

Please add a comment for this.

+#define EXPR_LOCATION_RANGE(NODE) (get_expr_source_range (EXPR_CHECK ((NODE

Likewise.

+#define EXPR_HAS_RANGE(NODE) \
+(CAN_HAVE_RANGE_P (NODE) \
+ ? EXPR_LOCATION_RANGE (NODE).m_start != UNKNOWN_LOCATION \
+ : false)
+

Likewise.

[...]
 
+static inline source_range
+get_expr_source_range (tree expr)

Likewise.

+#define DECL_LOCATION_RANGE(NODE) \
+  (get_decl_source_range (DECL_MINIMAL_CHECK (NODE)))
+

Likewise.

+static inline source_range
+get_decl_source_range (tree decl)
+{

Likewise.

-- 
Dodji


[PATCH 4/5] Implement tree expression tracking in C FE (v2)

2015-09-22 Thread David Malcolm
This is a combination of various patches from v1 of the kit,
including:
  12/22: Add source-ranges for trees
  13/22: gcc-rich-location.[ch]: add methods for working with tree ranges
  14/22: C: capture tree ranges for various expressions

The implementation of how ranges are stored has completely changed
since v1 of the kit.  Rather than introducing a SOURCE_RANGE tree node
and adding fields to decl and expr, the patch now captures ranges
for all C expressions during parsing within a new field of c_expr,
and for all tree nodes with a location_t, it stores them in
ad-hoc locations for later use.

Hence compound expressions get ranges; see:
  
https://dmalcolm.fedorapeople.org/gcc/2015-09-22/diagnostic-test-expressions-1.html

and for this example:

  int test (int foo)
  {
return foo * 100;
   ^^^   ^^^
  }

we have access to the ranges of "foo" and "100" during C parsing via
the c_expr, but once we have GENERIC, all we have is a VAR_DECL and an
INTEGER_CST (the former's location is in at the top of the
function, and the latter has no location).

This restriction means that I had to remove various expressions from
diagnostic-test-expressions-1.c; specifically:
  test_global
  test_param
  test_local
  test_integer_constants
  test_character_constants
  test_floating_constants
  test_enumeration_constant
  test_string_literal
  test_unary_plus
  test_sizeof

There are still some FIXMEs in here that probably need addressing.

gcc/ChangeLog:
* Makefile.in (OBJS): Add gcc-rich-location.o.
* gcc-rich-location.c: New file.
* gcc-rich-location.h: New file.
* gimple.h (gimple_set_block): Use "set_block".
* print-tree.c (print_node): Print any source range information.
* tree-cfg.c (move_block_to_fn): Use "set_block".
(move_block_to_fn): Likewise.
* tree-inline.c (copy_phis_for_bb): Likewise.
* tree.c (tree_set_block): Use "set_block".
(set_source_range): New functions.
(set_block): New function.
* tree.h (CAN_HAVE_RANGE_P): New.
(EXPR_LOCATION_RANGE): New.
(EXPR_HAS_RANGE): New.
(get_expr_source_range): New inline function.
(DECL_LOCATION_RANGE): New.
(set_source_range): New decls.
(set_block): New decl.
(get_decl_source_range): New inline function.

gcc/c-family/ChangeLog:
* c-common.c (c_fully_fold_internal): Capture existing souce_range,
and store it on the result.

gcc/c/ChangeLog:
* c-parser.c (set_c_expr_source_range): New functions.
(c_parser_expr_no_commas): Call set_c_expr_source_range on the ret
based on the range from the start of the LHS to the end of the
RHS.
(c_parser_conditional_expression): Likewise, based on the range
from the start of the cond.value to the end of exp2.value.
(c_parser_binary_expression): Call set_c_expr_source_range on
the stack values for TRUTH_ANDIF_EXPR and TRUTH_ORIF_EXPR.
(c_parser_cast_expression): Call set_c_expr_source_range on ret
based on the cast_loc through to the end of the expr.
(c_parser_unary_expression): Likewise, based on the
op_loc through to the end of op.
(c_parser_sizeof_expression) Likewise, based on the start of the
sizeof token through to either the closing paren or the end of
expr.
(c_parser_postfix_expression): Likewise, using the token range,
or from the open paren through to the close paren for
parenthesized expressions.
(c_parser_postfix_expression_after_primary): Likewise, for
various kinds of expression.
* c-tree.h (struct c_expr): Add field "src_range".
(set_c_expr_source_range): New decls.
* c-typeck.c (parser_build_unary_op): Call set_c_expr_source_range
on ret for prefix unary ops.
(parser_build_binary_op): Likewise, running from the start of
arg1.value through to the end of arg2.value.

gcc/testsuite/ChangeLog:
* gcc.dg/plugin/diagnostic-test-expressions-1.c: New file.
* gcc.dg/plugin/diagnostic_plugin_test_tree_expression_range.c:
New file.
* gcc.dg/plugin/plugin.exp (plugin_test_list): Add
diagnostic_plugin_test_tree_expression_range.c and
diagnostic-test-expressions-1.c.

libcpp/ChangeLog:
* include/line-map.h (location_adhoc_data): Add field "src_range".
(get_combined_adhoc_loc): Add source_range param.
(get_range_from_adhoc_loc): New decl.
(COMBINE_LOCATION_DATA): Add  source_range param.
* line-map.c (location_adhoc_data_hash): Contribute the src_range
start and finish to the hash value.
(location_adhoc_data_eq): Require that the src_range values be
equal.
(get_combined_adhoc_loc): Add source_range param and store it.
Remove the requirement that "data" be non-NULL.
(get_range_from_adhoc_loc): New function.