Re: [PATCH 4/5] Implement tree expression tracking in C FE (v2)

2015-09-25 Thread Dodji Seketeli
David Malcolm  a écrit:

> On Fri, 2015-09-25 at 16:06 +0200, Dodji Seketeli wrote:
>> Hello,
>> 
>> Similarly to a comment I made on the previous patch of the series,
>> 
>> +++ b/libcpp/include/line-map.h
>> 
>> [...]
>> 
>>  struct GTY(()) location_adhoc_data {
>>source_location locus;
>> +  source_range src_range;
>>void * GTY((skip)) data;
>>  };
>> 
>> Could we just change the type of locus and make it be a source_range
>> instead?  With the right converting constructor (in the source_range
>> class) that takes a source_location, the amount of churn should
>> hopefully be minimized, or maybe I am missing something ...
>
> Thanks.
>
> I think that the above is one place where we *would* want both locus and
> src_range.

Right, as opposed to the token case where conceptually, all we need is
the begining and the end of the token.  My confusion.

[...]

> As noted elsewhere, we might try to pack short ranges directly into the
> 32 bits of source_location:
>  https://gcc.gnu.org/ml/gcc-patches/2015-09/msg01826.html
> which would avoid having to use ad-hoc for such short ranges; ideally
> most tokens.  I'm experimenting with that (I don't have it fully working
> yet).

Right.  My personal inclination would be to make the general case of
storing all ranges in this adhoc data structure, or, even, into another
on-the-side data structure, similar to the adhoc one, but dedicated to
range associated to points, as you see fit.

Then when that works, consider the optimization of stuffing short ranges
into the 32 bits of source_location, depending on the memory profiles we
get.  But it's your call :-)

[...]

>> +location_t
>> +set_block (location_t loc, tree block)
>> +{
>> 
>> Likewise.  I am wondering if we shouldn't even change the name of this
>> function to better suit what it does: associate a tree to a location.
>
> "associate_tree_with_location" ?

If you find it cool, I am cool :-)

[...]

>> +++ b/gcc/c/c-tree.h
>> @@ -132,6 +132,9 @@ struct c_expr
>>   The type of an enum constant is a plain integer type, but this
>>   field will be the enum type.  */
>>tree original_type;
>> +
>> +  /* FIXME.  */
>> +  source_range src_range;
>>  };
>> 
>> Why a FIXME here?
>
> To remind myself before posting the patches that I need to give the
> field a descriptive comment, explaining what purpose it serves.
>
> Oops :)
>
> It probably should read something like this:
>
>   /* The source range of this C expression.  This might
>  be thought of as redundant, since ranges for
>  expressions can be stored in a location_t, but
>  not all tree nodes in expressions have a location_t.
>
>  Consider this source code:  
>
>   int test (int foo)
>   {
> return foo * 100;
> ^^^   ^^^
>}
>
> During C parsing, the ranges for "foo" and "100" are
> stored within this field of c_expr, but after parsing
> to GENERIC, all we have is a VAR_DECL and an
> INTEGER_CST (the former's location is in at the top of the
> function, and the latter has no location).
>
> For consistency, we store ranges for all expressions
> in this field, not just those that don't have a
> location_t. */
>   source_range src_range;

Great, thanks.

[...]

> [BTW, I'm about to disappear on a vacation from tomorrow until October
> 6th, and will be away from email and computers]

Thanks for the heads-up.

Cheers,

-- 
Dodji


Re: [PATCH 4/5] Implement tree expression tracking in C FE (v2)

2015-09-25 Thread David Malcolm
On Fri, 2015-09-25 at 16:06 +0200, Dodji Seketeli wrote:
> Hello,
> 
> Similarly to a comment I made on the previous patch of the series,
> 
> +++ b/libcpp/include/line-map.h
> 
> [...]
> 
>  struct GTY(()) location_adhoc_data {
>source_location locus;
> +  source_range src_range;
>void * GTY((skip)) data;
>  };
> 
> Could we just change the type of locus and make it be a source_range
> instead?  With the right converting constructor (in the source_range
> class) that takes a source_location, the amount of churn should
> hopefully be minimized, or maybe I am missing something ...

Thanks.

I think that the above is one place where we *would* want both locus and
src_range.

One key idea within this patch is for source_location/location_t to be
able to track both a point/caret/locus and a range containing it.   The
idea is to stash the range information within the ad-hoc table.

For the most simple expressions involving just one token, locus ==
src_range.m_start, but in the most general case, locus,
src_range.m_start and src_range.m_finish are all different from each
other; consider this expression:

  foo && bar
  ^~

(i.e. where the caret/locus is at the first '&' and the range starts at
the 'f' of "foo" and finishes at the 'r' of "bar").

As noted elsewhere, we might try to pack short ranges directly into the
32 bits of source_location:
 https://gcc.gnu.org/ml/gcc-patches/2015-09/msg01826.html
which would avoid having to use ad-hoc for such short ranges; ideally
most tokens.  I'm experimenting with that (I don't have it fully working
yet).

> [...]
> 
> diff --git a/libcpp/line-map.c b/libcpp/line-map.c
> 
> [...]
> 
> +source_range
> +get_range_from_adhoc_loc (struct line_maps *set, source_location loc)
> +{
> 
> Please add a comment for this function.

(nods)

> [...]
> 
> diff --git a/gcc/tree.c b/gcc/tree.c
> 
> +void
> +set_source_range (tree *expr, location_t start, location_t finish)
> 
> Please add a comment for this function and its overloads.

(nods)

> [...]
> 
> +void
> +set_c_expr_source_range (c_expr *expr,
> +  location_t start, location_t finish)
> +{
> 
> Likewise.

(nods)

> +location_t
> +set_block (location_t loc, tree block)
> +{
> 
> Likewise.  I am wondering if we shouldn't even change the name of this
> function to better suit what it does: associate a tree to a location.

"associate_tree_with_location" ?

> +  source_range src_range;
> +  if (IS_ADHOC_LOC (loc))
> +/* FIXME: can we update in-place?  */
> +src_range = get_range_from_adhoc_loc (line_table, loc);
> 
> Hmmh, at the moment, I don't think we can update an entry of the adhoc
> map that associates {location, range, tree} as all three components
> contribute to the hash value of the entry.  A new triplet means a new
> entry.
> 
> My understanding is that initially the two elements of the pair
> {location, tree} were contributing to the hash value because the same
> location could very well belong to different blocks.

Ah; thanks.

> +  else
> +src_range = source_range::from_location (loc);
> +
> +  return COMBINE_LOCATION_DATA (line_table, loc, src_range, block);
> +}
> 
> [...]
> 
> @@ -6091,6 +6112,10 @@ c_parser_conditional_expression (c_parser *parser, 
> struct c_expr *after,
>  
>if (c_parser_next_token_is_not (parser, CPP_QUERY))
>  return cond;
> +  if (cond.value != error_mark_node)
> +start = cond.src_range.m_start;
> +  else
> +start = UNKNOWN_LOCATION;
> 
> I think that "getting the start range of a c_expr" is an operation
> that is generally useful; and it's also generally useful for that
> operation to handle cases where the tree carried by the c_expr can be
> an error_mark_node.  So maybe it would be appropriate to have a getter
> function for that operation and use it here instead.

Good idea; thanks.  That will be helpful as I try other representations.

> You would then use that operation in the other places of this patch
> that are getting c_expr::src_range.start -- by the way, those other
> places are not handling the error_mark_node case like the above.
> 
> [...]
> 
> +++ b/gcc/c/c-tree.h
> @@ -132,6 +132,9 @@ struct c_expr
>   The type of an enum constant is a plain integer type, but this
>   field will be the enum type.  */
>tree original_type;
> +
> +  /* FIXME.  */
> +  source_range src_range;
>  };
> 
> Why a FIXME here?

To remind myself before posting the patches that I need to give the
field a descriptive comment, explaining what purpose it serves.

Oops :)

It probably should read something like this:

  /* The source range of this C expression.  This might
 be thought of as redundant, since ranges for
 expressions can be stored in a location_t, but
 not all tree nodes in expressions have a location_t.

 Consider this source code:  

int test (int foo)
{
  return foo * 100;
^^^   ^^^
   }

During C parsing, the ranges for "foo" and "10

Re: [PATCH 4/5] Implement tree expression tracking in C FE (v2)

2015-09-25 Thread Dodji Seketeli
Hello,

Similarly to a comment I made on the previous patch of the series,

+++ b/libcpp/include/line-map.h

[...]

 struct GTY(()) location_adhoc_data {
   source_location locus;
+  source_range src_range;
   void * GTY((skip)) data;
 };

Could we just change the type of locus and make it be a source_range
instead?  With the right converting constructor (in the source_range
class) that takes a source_location, the amount of churn should
hopefully be minimized, or maybe I am missing something ...

[...]

diff --git a/libcpp/line-map.c b/libcpp/line-map.c

[...]

+source_range
+get_range_from_adhoc_loc (struct line_maps *set, source_location loc)
+{

Please add a comment for this function.

[...]

diff --git a/gcc/tree.c b/gcc/tree.c

+void
+set_source_range (tree *expr, location_t start, location_t finish)

Please add a comment for this function and its overloads.

[...]

+void
+set_c_expr_source_range (c_expr *expr,
+location_t start, location_t finish)
+{

Likewise.

+location_t
+set_block (location_t loc, tree block)
+{

Likewise.  I am wondering if we shouldn't even change the name of this
function to better suit what it does: associate a tree to a location.

+  source_range src_range;
+  if (IS_ADHOC_LOC (loc))
+/* FIXME: can we update in-place?  */
+src_range = get_range_from_adhoc_loc (line_table, loc);

Hmmh, at the moment, I don't think we can update an entry of the adhoc
map that associates {location, range, tree} as all three components
contribute to the hash value of the entry.  A new triplet means a new
entry.

My understanding is that initially the two elements of the pair
{location, tree} were contributing to the hash value because the same
location could very well belong to different blocks.

+  else
+src_range = source_range::from_location (loc);
+
+  return COMBINE_LOCATION_DATA (line_table, loc, src_range, block);
+}

[...]

@@ -6091,6 +6112,10 @@ c_parser_conditional_expression (c_parser *parser, 
struct c_expr *after,
 
   if (c_parser_next_token_is_not (parser, CPP_QUERY))
 return cond;
+  if (cond.value != error_mark_node)
+start = cond.src_range.m_start;
+  else
+start = UNKNOWN_LOCATION;

I think that "getting the start range of a c_expr" is an operation
that is generally useful; and it's also generally useful for that
operation to handle cases where the tree carried by the c_expr can be
an error_mark_node.  So maybe it would be appropriate to have a getter
function for that operation and use it here instead.

You would then use that operation in the other places of this patch
that are getting c_expr::src_range.start -- by the way, those other
places are not handling the error_mark_node case like the above.

[...]

+++ b/gcc/c/c-tree.h
@@ -132,6 +132,9 @@ struct c_expr
  The type of an enum constant is a plain integer type, but this
  field will be the enum type.  */
   tree original_type;
+
+  /* FIXME.  */
+  source_range src_range;
 };

Why a FIXME here?

+#define CAN_HAVE_RANGE_P(NODE) (CAN_HAVE_LOCATION_P (NODE))

Please add a comment for this.

+#define EXPR_LOCATION_RANGE(NODE) (get_expr_source_range (EXPR_CHECK ((NODE

Likewise.

+#define EXPR_HAS_RANGE(NODE) \
+(CAN_HAVE_RANGE_P (NODE) \
+ ? EXPR_LOCATION_RANGE (NODE).m_start != UNKNOWN_LOCATION \
+ : false)
+

Likewise.

[...]
 
+static inline source_range
+get_expr_source_range (tree expr)

Likewise.

+#define DECL_LOCATION_RANGE(NODE) \
+  (get_decl_source_range (DECL_MINIMAL_CHECK (NODE)))
+

Likewise.

+static inline source_range
+get_decl_source_range (tree decl)
+{

Likewise.

-- 
Dodji