On 8/20/21 12:56 PM, Anthony Sharp via Gcc-patches wrote:
Hi, hope everyone is well. I have a patch here for issue 70417
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70417). I'm still a GCC
noob, and this is probably the hardest thing I have ever coded in my
life, so please forgive any initial mistakes!

TLDR

This patch introduces a helpful error message when the user attempts
to put a template-id after a member access token (., -> or ::) in a
dependent context without having put the "template" keyword after the
access token.

Great, thanks!

CONTEXT

In C++, when referencing a class member (using ., -> or ::) in a
dependent context, if that member is a template, the template keyword
is required after the access token. For example:

template<class T> void MyDependentTemplate(T t)
{
   t.DoSomething<T>(); /* Error - t is dependent. "<" is treated as a
less-than sign because DoSomething is assumed to be a non-template
identifier. */
   t.template DoSomething<T>(); // Good.
}

PATCH EXPLANATION

In order to throw an appropriate error message we need to identify
and/or create a point in the compiler where the following conditions
are all simultaneously satisfied:

A) a class member access token (., ->, ::)
B) a dependent context
C) the thing being accessed is a template-id
D) No template keyword

A, B and D are relatively straightforward and I won't go into detail
about how that was achieved. The real challenge is C - figuring out
whether a name is a template-id.

Usually, we would look up the name and use that to determine whether
the name is a template or not. But, we cannot do that. We are in a
dependent context, so lookup cannot (in theory) find anything at the
point the access expression is parsed.

We (maybe) could defer checking until the template is actually
instantiated. But this is not the approach I have gone for, since this
to me sounded unnecessarily awkward and clunky. In my mind this also
seems like a syntax error, and IMO it seems more natural that syntax
errors should get caught as soon as they are parsed.

So, the solution I went for was to figure out whether the name was a
template by looking at the surrounding tokens. The key to this lies in
being able to differentiate between the start and end of a template
parameter list (< and >) and greater-than and less-than tokens (and
perhaps also things like << or >>). If the final > (if we find one at
all) does indeed mark the end of a class or function template, then it
will be followed by something like (, ::, or even just ;. On the other
hand, a greater-than operator would be followed by a name.

PERFORMANCE IMPACT

My concern with this approach was that checking this would actually
slow the compiler down. In the end it seemed the opposite was true. By
parsing the tokens to determine whether the name looks like a
template-id, we can cut out a lot of the template-checking code that
gets run trying to find a template when there is none, making compile
time generally faster. That being said, I'm not sure if the
improvement will be that substantial, so I did not feel it necessary
to introduce the template checking method everywhere where we could
possibly have run into a template.

I ran an ABAB test with the following code repeated enough times to
fill up about 10,000 lines:

ai = 1;
bi = 2;
ci = 3;
c.x = 4;
(&c)->x = 5;
mytemplateclass<Y>::y = 6;
c.template class_func<Y>();
(&c)->template class_func<Y>();
mytemplateclass<Y>::template class_func_static<Y>();
c2.class_func<myclass>();
(&c2)->class_func<myclass>();
myclass::class_func_static<myclass>();
ai = 6;
bi = 5;
ci = 4;
c.x = 3;
(&c)->x = 2;
mytemplateclass<Y>::y = 1;
c.template class_func<Y>();
(&c)->template class_func<Y>();
mytemplateclass<Y>::template class_func_static<Y>();
c2.class_func<myclass>();
(&c2)->class_func<myclass>();
myclass::class_func_static<myclass>();

It basically just contains a mixture of class access expressions plus
some regular assignments for good measure. The compiler was compiled
without any optimisations (and so slower than usual). In hindsight,
perhaps this test case assumes the worst-ish-case scenario since it
contains a lot of templates; most of the benefits of this patch occur
when parsing non-templates.

The compiler (clean) and the compiler with the patch applied (patched)
compiled the above code 20 times each in ABAB fashion. On average, the
clean compiler took 2.26295 seconds and the patched compiler took
2.25145 seconds (about 0.508% faster). Whether this improvement
remains or disappears when the compiler is built with optimisations
turned on I haven't tested. My main concern was just making sure it
didn't get any slower.

AWKWARD TEST CASES

You will see from the patch that it required the updating of a few
testcases (as well as one place within the compiler). I believe this
is because up until now, the compiler allowed the omission of the
template keyword in some places it shouldn't have. Take decltype34.C
for example:

struct impl
{
   template <class T> static T create();
};

template<class T, class U, class =
decltype(impl::create<T>()->impl::create<U>())>
struct tester{};

GCC currently accepts this code. My patch causes it to be rejected.

For what it's worth, Clang also rejects this code:

1824786093/source.cpp:6:70: error: use 'template' keyword to treat
'create' as a dependent template name
template<class T, class U, class =
decltype(impl::create<T>()->impl::create<U>())>

                                       ^ template

I think Clang is correct since from what I understand it should be
written as impl::create<T>()->impl::template create<U>(). Here,
create<T>() is dependent, so it makes sense that we would need
"template" before the scope operator. Then again, I could well be
wrong. The rules around dependent template names are pretty crazy.

This is basically core issue 1835, http://wg21.link/cwg1835

This was changed for C++23 by the paper "Declarations and where to find them", http://wg21.link/p1787

Previously, e.g. in C++20, https://timsong-cpp.github.io/cppwp/n4861/basic.lookup.classref#4 said that when we see ->impl, since we can't look up the name in the (dependent) object type, we do unqualified lookup and find ::impl, so create<U> is not in a dependent scope, and the testcase is fine.

The 1835 change clarifies that we only do unqualified lookup of the second impl if the object type is non-dependent, so the second create is in a dependent scope, and we need the ::template.

But in either case, whether create<U> is in a dependent scope depends on how we resolve impl::, we don't need to remember further back in the expression. So your dependent_expression_p parameter seems like the wrong approach. Note that when we're looking up the name after ->, the type of the object expression is in parser->context->object_type.

1835 also makes the following testcase valid, which we don't currently accept, but clang does:

template <class T> void f(T t) { t.foo::bar(); }
struct foo { void bar(); };
int main() { f(foo()); }

For C++20 and down, I want to start accepting this testcase, but also still accept decltype34.C to avoid breaking existing code. But that's a separate issue; I don't think your patch should affect decltype34.

The cases you fixed in symbol-summary.h are indeed mistakes, but not ill-formed, so giving an error on them is wrong. For example, here is a well-formed program that is rejected with your patch:

template <class T, int N> void f(T t) { t.m<N>(0); }
struct A { int m; } a;
int main() { f<A,42>(a); }

Comparing the result of < by > is very unlikely to be what the user actually meant, but it is potentially valid. So we should accept f with a warning about the dubious expression. People can silence the warning if they really mean this by parenthesizing the LHS of the < operator.

Another valid program, using ::

int x;
template <class T, int N> void f(T t) { t.m<N>::x; }
struct A { int m; } a;
int main() { f<A,42>(a); }

Now to reviewing the actual patch:

+   yet).  Return 1 if it does look like a template-id.  Return 2
+   if not.  */

Let's use -1 for definitely not.

+  /* Could be a ~ referencing the destructor of a class template.
+     Temporarily eat it up so it's not in our way.  */
+  if (next_token->type == CPP_COMPL)
+    {
+      cp_lexer_save_tokens (parser->lexer);
+      cp_lexer_consume_token (parser->lexer);
+      next_token = cp_lexer_peek_token (parser->lexer);
+      saved = true;
+    }

There's no ambiguity in the case of a destructor, as you note in your comments in cp_parser_id_expression. How about returning early if we see ~?

+  /* Find offset of final ">".  */
+  for (; depth > 0; i++)
+    {
+      switch (cp_lexer_peek_nth_token (parser->lexer, i)->type)
+        {
+          case CPP_LESS:
+            depth++;
+            break;
+          case CPP_GREATER:
+            depth--;
+            break;
+          case CPP_RSHIFT:
+            depth-=2;
+            break;
+          case CPP_EOF:
+          case CPP_SEMICOLON:
+            goto no;
+          default:
+            break;
+        }
+    }

This code doesn't handle skipping matched ()/{}/[] in the template-argument-list. You probably want to involve cp_parser_skip_to_end_of_template_parameter_list somehow.

+no:
+  if (saved)
+    cp_lexer_rollback_tokens (parser->lexer);
+  return 2;
+
+yes:
+  if (saved)
+    cp_lexer_rollback_tokens (parser->lexer);
+  return 1;

Now that we're writing C++, I'd prefer to avoid this kind of pattern in favor of RAII, such as saved_token_sentinel. If this is still relevant after addressing the above comments.

+   If DEPENDENT_EXPRESSION_P is true, this is a dependent id-expression
+   that is not the current instantiation.  */

As mentioned above, let's not add this parameter.

+      error_at (next_token->location,
+               "expected \"template\" keyword before dependent template "
+               "name");

As mentioned above, this should be a warning.

-       /* If it worked, we're done.  */
-       if (cp_parser_parse_definitely (parser))
-         return id;
+  if (looks_like_template != 2)
+    {
+      /* We don't know yet whether or not this will be a
+        template-id.  */

The indentation here is off.

+                         int looks_like_template)

Let's give these parms a default argument of 0. And call them looks_like_template_id to be clearer.

-  /* Avoid performing name lookup if there is no possibility of
-     finding a template-id.  */
-  if ((token->type != CPP_NAME && token->keyword != RID_OPERATOR)
-      || (token->type == CPP_NAME
-         && !cp_parser_nth_token_starts_template_argument_list_p
-              (parser, 2)))
+  /* Only if we have not already checked whether this looks like a
+     template.  */
+  if (looks_like_template == 0)
     {
-      cp_parser_error (parser, "expected template-id");
-      return error_mark_node;
+      /* Avoid performing name lookup if there is no possibility of
+        finding a template-id.  */

You could add the looks_like_template_id check to the existing condition so we don't need to reindent the comment or body.

+++ b/gcc/symbol-summary.h
@@ -191,7 +191,7 @@ public:
   template<typename Arg, bool (*f)(const T &, Arg)>
   void traverse (Arg a) const
   {
-    m_map.traverse <f> (a);
+    m_map.template traverse <f> (a);

I've gone ahead and applied this fix as a separate patch.

Jason

Reply via email to