On 08/03/2016 09:45 AM, David Malcolm wrote:
Changes in v3:
- Avoid including cpplib.h from input.h
- Properly handle stringified macro arguments (with tests for this)
- Minor whitespace fixes
- Move selftest.h changes to a separate patch

Changes in v2:
- Tweaks to substring location selftests
- Many more selftests (EBCDIC, the various wide string types, etc)
- Clean up conditions in charset.c; require source == execution charset
  to have substring locations
- Make string_concat_db field private
- Return error messages rather than bool
- Fix source_range for charset.c:convert_escape
- Introduce class substring_loc
- Handle bad input locations more gracefully
- Ensure that we can read substring information for a token which
  starts in one linemap and ends in another (seen in
  gcc.dg/cpp/pr69985.c)

This version addresses Joseph's qn about stringification of macro
arguments (by failing gracefully on them), and the modularity
concerns noted by Manu.

Successfully bootstrapped&regrtested in conjunction with the rest of the
patch kit on x86_64-pc-linux-gnu.

v2 of the kit successfully passes a full config-list.mk and a successful 
selftest
run for stage 1 on powerpc-ibm-aix7.1.3.0 (gcc111), both in conjunction with the
rest of the patch kit; I plan to repeat those tests.

I believe I can self-approve the changes to input.c, input.h, libcpp,
and the testsuite; the remaining changes needing approval are those
to c-family and to gcc.c.
I think that's a fair assessment. You might consider pulling those out as a distinct hunk in the future -- if you haven't noticed, I often try to knock out the smaller patches first (without even looking to see how much might be bits the author can self-approve).



OK for trunk if it passes testing? (by itself)


gcc/c-family/ChangeLog:
        * c-common.c: Include "substring-locations.h".
        (get_cpp_ttype_from_string_type): New function.
        (g_string_concat_db): New global.
        (substring_loc::get_range): New method.
        * c-common.h (g_string_concat_db): New declaration.
        (class substring_loc): New class.
        * c-lex.c (lex_string): When concatenating strings, capture the
        locations of all tokens using a new obstack, and record the
        concatenation locations within g_string_concat_db.
        * c-opts.c (c_common_init_options): Construct g_string_concat_db
        on the ggc-heap.

gcc/ChangeLog:
        * gcc.c (cpp_options): Rename string to...
        (cpp_options_): ...this, to avoid clashing with struct in
        cpplib.h.
        (static_specs): Update initialize for above renaming
        * input.c (string_concat::string_concat): New constructor.
        (string_concat_db::string_concat_db): New constructor.
        (string_concat_db::record_string_concatenation): New method.
        (string_concat_db::get_string_concatenation): New method.
        (string_concat_db::get_key_loc): New method.
        (class auto_cpp_string_vec): New class.
        (get_substring_ranges_for_loc): New function.
        (get_source_range_for_substring): New function.
        (get_num_source_ranges_for_substring): New function.
        (class selftest::lexer_test_options): New class.
        (struct selftest::lexer_test): New struct.
        (class selftest::ebcdic_execution_charset): New class.
        (selftest::ebcdic_execution_charset::s_singleton): New variable.
        (selftest::lexer_test::lexer_test): New constructor.
        (selftest::lexer_test::~lexer_test): New destructor.
        (selftest::lexer_test::get_token): New method.
        (selftest::assert_char_at_range): New function.
        (ASSERT_CHAR_AT_RANGE): New macro.
        (selftest::assert_num_substring_ranges): New function.
        (ASSERT_NUM_SUBSTRING_RANGES): New macro.
        (selftest::assert_has_no_substring_ranges): New function.
        (ASSERT_HAS_NO_SUBSTRING_RANGES): New macro.
        (selftest::test_lexer_string_locations_simple): New function.
        (selftest::test_lexer_string_locations_ebcdic): New function.
        (selftest::test_lexer_string_locations_hex): New function.
        (selftest::test_lexer_string_locations_oct): New function.
        (selftest::test_lexer_string_locations_letter_escape_1): New function.
        (selftest::test_lexer_string_locations_letter_escape_2): New function.
        (selftest::test_lexer_string_locations_ucn4): New function.
        (selftest::test_lexer_string_locations_ucn8): New function.
        (selftest::uint32_from_big_endian): New function.
        (selftest::test_lexer_string_locations_wide_string): New function.
        (selftest::uint16_from_big_endian): New function.
        (selftest::test_lexer_string_locations_string16): New function.
        (selftest::test_lexer_string_locations_string32): New function.
        (selftest::test_lexer_string_locations_u8): New function.
        (selftest::test_lexer_string_locations_utf8_source): New function.
        (selftest::test_lexer_string_locations_concatenation_1): New
        function.
        (selftest::test_lexer_string_locations_concatenation_2): New
        function.
        (selftest::test_lexer_string_locations_concatenation_3): New
        function.
        (selftest::test_lexer_string_locations_macro): New function.
        (selftest::test_lexer_string_locations_stringified_macro_argument):
        New function.
        (selftest::test_lexer_string_locations_non_string): New function.
        (selftest::test_lexer_string_locations_long_line): New function.
        (selftest::test_lexer_char_constants): New function.
        (selftest::input_c_tests): Call the new test functions once per
        case within the line_table test matrix.
        * input.h (struct string_concat): New struct.
        (struct location_hash): New struct.
        (class string_concat_db): New class.
        * substring-locations.h: New header.

gcc/testsuite/ChangeLog:
        * gcc.dg/plugin/diagnostic-test-string-literals-1.c: New file.
        * gcc.dg/plugin/diagnostic-test-string-literals-2.c: New file.
        * gcc.dg/plugin/diagnostic_plugin_test_string_literals.c: New file.
        * gcc.dg/plugin/plugin.exp (plugin_test_list): Add the above new files.

libcpp/ChangeLog:
        * charset.c (cpp_substring_ranges::cpp_substring_ranges): New
        constructor.
        (cpp_substring_ranges::~cpp_substring_ranges): New destructor.
        (cpp_substring_ranges::add_range): New method.
        (cpp_substring_ranges::add_n_ranges): New method.
        (_cpp_valid_ucn): Add "char_range" and "loc_reader" params; if
        they are non-NULL, read position information from *loc_reader
        and update char_range->m_finish accordingly.
        (convert_ucn): Add "char_range", "loc_reader", and "ranges"
        params.  If loc_reader is non-NULL, read location information from
        it, and update *ranges accordingly, using char_range.
        Conditionalize the conversion into tbuf on tbuf being non-NULL.
        (convert_hex): Likewise, conditionalizing the call to
        emit_numeric_escape on tbuf.
        (convert_oct): Likewise.
        (convert_escape): Add params "loc_reader" and "ranges".  If
        loc_reader is non-NULL, read location information from it, and
        update *ranges accordingly.  Conditionalize the conversion into
        tbuf on tbuf being non-NULL.
        (cpp_interpret_string): Rename to...
        (cpp_interpret_string_1): ...this, adding params "loc_readers" and
        "out".  Use "to" to conditionalize the initialization and usage of
        "tbuf", such as running the converter.  If "loc_readers" is
        non-NULL, use the instances within it, reading location
        information from them, and passing them to convert_escape; likewise
        write to "out" if loc_readers is non-NULL.  Check for leading
        quote and issue an error if it is not present.  Update boundary
        check from "== limit" to ">= limit" to protect against erroneous
        location values to calls that are not parsing string literals.
        (cpp_interpret_string): Reimplement in terms to
        cpp_interpret_string_1.
        (noop_error_cb): New function.
        (cpp_interpret_string_ranges): New function.
        (cpp_string_location_reader::cpp_string_location_reader): New
        constructor.
        (cpp_string_location_reader::get_next): New method.
        * include/cpplib.h (class cpp_string_location_reader): New class.
        (class cpp_substring_ranges): New class.
        (cpp_interpret_string_ranges): New prototype.
        * internal.h (_cpp_valid_ucn): Add params "char_range" and
        "loc_reader".
        * lex.c (forms_identifier_p): Pass NULL for new params to
        _cpp_valid_ucn.
---

diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index 27031b5..7a8b6ea 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -1098,6 +1099,67 @@ fix_string_type (tree value)
   TREE_STATIC (value) = 1;
   return value;
 }
+
+/* Given a string of type STRING_TYPE, determine what kind of string
+   token created it: CPP_STRING, CPP_STRING16, CPP_STRING32, or
+   CPP_WSTRING.  Return CPP_OTHER in case of error.
+
+   This effectively reverses part of the logic in
+   lex_string and fix_string_type.  */
+
+static enum cpp_ttype
+get_cpp_ttype_from_string_type (tree string_type)
+{
+  gcc_assert (string_type);
+  if (TREE_CODE (string_type) != ARRAY_TYPE)
+    return CPP_OTHER;
+
+  tree element_type = TREE_TYPE (string_type);
+  if (TREE_CODE (element_type) != INTEGER_TYPE)
+    return CPP_OTHER;
+
+  int bits_per_character = TYPE_PRECISION (element_type);
+  switch (bits_per_character)
+    {
+    case 8:
+      return CPP_STRING;  /* It could have also been CPP_UTF8STRING.  */
+    case 16:
+      return CPP_STRING16;
+    case 32:
+      return CPP_STRING32;
+    }
+
+  if (bits_per_character == TYPE_PRECISION (wchar_type_node))
+    return CPP_WSTRING;
Doesn't the switch above effectively mean we don't use CPP_WSTRING? In what cases do you expect it to be used?

diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index 8c80574..7b5da57 100644
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -1110,6 +1110,35 @@ extern time_t cb_get_source_date_epoch (cpp_reader 
*pfile);
    __TIME__ can store.  */
 #define MAX_SOURCE_DATE_EPOCH HOST_WIDE_INT_C (253402300799)

+extern GTY(()) string_concat_db *g_string_concat_db;
Presumably this DB needs to persist through the entire compilation unit and the nodes inside reference GC'd objects, right? Just want to make 100% sure that we need to expose this to the GC system before ack-ing.

The rest looks reasonable. So we just need to reach closure on those two issues IMHO.

jeff

Reply via email to