On 08/04/2016 01:21 PM, David Malcolm wrote:

+
+static enum cpp_ttype
+get_cpp_ttype_from_string_type (tree string_type)
+{
+  gcc_assert (string_type);
+  if (TREE_CODE (string_type) != ARRAY_TYPE)
+    return CPP_OTHER;
+
+  tree element_type = TREE_TYPE (string_type);
+  if (TREE_CODE (element_type) != INTEGER_TYPE)
+    return CPP_OTHER;
+
+  int bits_per_character = TYPE_PRECISION (element_type);
+  switch (bits_per_character)
+    {
+    case 8:
+      return CPP_STRING;  /* It could have also been CPP_UTF8STRING.  */
+    case 16:
+      return CPP_STRING16;
+    case 32:
+      return CPP_STRING32;
+    }
+
+  if (bits_per_character == TYPE_PRECISION (wchar_type_node))
+    return CPP_WSTRING;
Doesn't the switch above effectively mean we don't use CPP_WSTRING?  In
what cases do you expect it to be used?

I was attempting to provide an inverse of lex_string and
fix_string_type, going back from a tree type to a cpp_ttype.
And I'm guessing (without looking closely at those routines) that you
may not be able to reliably map backwards, because CPP_WSTRING and one
of CPP_STRING{16,32} are indistinguishable at this point.  At least I
think they are.



The
purpose of the ttype is to determine the execution charset of a
STRING_CST to enforce the requirement in cpp_interpret_string_ranges
that there's a 1:1 correspondence between bytes in the source encoding
and bytes in the execution encoding.

c-lex.c: lex_string has:
        case CPP_WSTRING:
          value = build_string (TYPE_PRECISION (wchar_type_node)
                                / TYPE_PRECISION (char_type_node),
                                "\0\0\0");  /* widest supported wchar_t
                                               is 32 bits */

Given that, it looks like it's not possible for that conditional to
fire (unless we somehow have a 24-bit wchar_t???)
I think wchar_t has to be an integral type, so this could only happen
if one of the standard integral types was 24 bits.  I guess that is
possible, and the code as written would catch that case.  But then
again, if we had such a target in GCC, we'd probably end up defining
CPP_STRING24, and the WCHAR code still wouldn't fire.



Should I just drop the CPP_WSTRING conditional?  (and update the
function comment, to capture the fact that the cpp_ttype is one with
the same execution encoding as the STRING_CST, not necessarily equal to
the exact cpp_ttype that was in use).
I'd probably put in a default case with some kind of assert/checking
failure, so that if something of this nature ever happens we'll get a
nice loud message that our assumptions were incorrect ;-)

And yes, I think your suggestion on the function comment is spot-on.

OK with those changes.


diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index 8c80574..7b5da57 100644
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -1110,6 +1110,35 @@ extern time_t cb_get_source_date_epoch (cpp_reader *pfile);
    __TIME__ can store.  */
 #define MAX_SOURCE_DATE_EPOCH HOST_WIDE_INT_C (253402300799)

+extern GTY(()) string_concat_db *g_string_concat_db;
Presumably this DB needs to persist through the entire compilation unit
and the nodes inside reference GC'd objects, right?  Just want to make
100% sure that we need to expose this to the GC system before ack-ing.

It needs to persist for as long as we might make queries about
substring locations.  It doesn't reference GC'd objects.  However it
might be desirable to locate locations within string constants loaded
from PCH files, hence I put it in GTY memory.  Is this overkill?
Ugh.  OK.  I'd rather it not be in GC, but I see the motivation.

And presumably with Martin's code wanting to issue warnings from the middle-end, we have to keep the table around through the middle end.

jeff
