GCC diagnostic messages can contain URLs, such as to our documentation
when we suggest an option name to correct a misspelling.

SARIF message strings can contain embedded URLs in the plain text
messages (see SARIF v2.1.0 §3.11.6), but previously we were
simply dropping any URLs from the diagnostic messages.

This patch adds support for encoding URLs into messages in our SARIF
output, using the pp_token machinery added in the previous patch.

As well as supporting URLs, the patch also adjusts how we report
event IDs in SARIF message, so that rather than e.g.
  "text": "second 'free' here; first 'free' was at (1)"
we now report:
  "text": "second 'free' here; first 'free' was at 
[(1)](sarif:/runs/0/results/0/codeFlows/0/threadFlows/0/locations/0)"

i.e. the text "(1)" now has a embedded link referring within the sarif
log to the threadFlowLocation object for the other event, via JSON
pointer (see §3.10.3 "URIs that use the sarif scheme").  Doing so
requires the arious objects to know their index within their containing
array, requiring some reworking of how they are constructed.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as r15-3312-gaff7f677120ec3.

gcc/ChangeLog:
        PR other/116419
        * diagnostic-event-id.h (diagnostic_event_id_t::zero_based): New.
        * diagnostic-format-sarif.cc: Include "pretty-print-format-impl.h"
        and "pretty-print-urlifier.h".
        (sarif_result::sarif_result): Add param "idx_within_parent".
        (sarif_result::get_index_within_parent): New accessor.
        (sarif_result::m_idx_within_parent): New field.
        (sarif_code_flow::sarif_code_flow): New ctor.
        (sarif_code_flow::get_parent): New accessor.
        (sarif_code_flow::get_index_within_parent): New accessor.
        (sarif_code_flow::m_parent): New field.
        (sarif_code_flow::m_thread_id_map): New field.
        (sarif_code_flow::m_thread_flows_arr): New field.
        (sarif_code_flow::m_all_tfl_objs): New field.
        (sarif_thread_flow::sarif_thread_flow): Add "parent" and
        "idx_within_parent" params.
        (sarif_thread_flow::get_parent): New accessor.
        (sarif_thread_flow::get_index_within_parent): New accessor.
        (sarif_thread_flow::m_parent): New field.
        (sarif_thread_flow::m_idx_within_parent): New field.
        (sarif_thread_flow_location::sarif_thread_flow_location): New
        ctor.
        (sarif_thread_flow_location::get_parent): New accessor.
        (sarif_thread_flow_location::get_index_within_parent): New
        accessor.
        (sarif_thread_flow_location::m_parent): New field.
        (sarif_thread_flow_location::m_idx_within_parent): New field.
        (sarif_builder::get_code_flow_for_event_ids): New accessor.
        (class sarif_builder::sarif_token_printer): New.
        (sarif_builder::m_token_printer): New member.
        (sarif_builder::m_next_result_idx): New field.
        (sarif_builder::m_current_code_flow): New field.
        (sarif_code_flow::get_or_append_thread_flow): New.
        (sarif_code_flow::get_thread_flow): New.
        (sarif_code_flow::add_location): New.
        (sarif_code_flow::get_thread_flow_loc_obj): New.
        (sarif_thread_flow::add_location): Create the new
        sarif_thread_flow_location internally, rather than passing
        it in as a parm so that we can keep track of its index in
        the array.  Return a reference to it.
        (sarif_builder::sarif_builder): Initialize m_token_printer,
        m_next_result_idx, and m_current_code_flow.
        (sarif_builder::on_report_diagnostic): Pass index to
        make_result_object.
        (sarif_builder::make_result_object): Add "idx_within_parent" param
        and pass to sarif_result ctor.  Pass code flow index to call to
        make_code_flow_object.
        (make_sarif_url_for_event): New.
        (sarif_builder::make_code_flow_object): Add "idx_within_parent"
        param and pass it to sarif_code_flow ctor.  Reimplement walking
        of events so that we first create threadFlow objects for each
        thread, then populate them with threadFlowLocation objects, so
        that the IDs work.  Set m_current_code_flow whilst creating the
        latter, so that we can create correct URIs for "%@".
        (sarif_builder::make_thread_flow_location_object): Replace with...
        (sarif_builder::populate_thread_flow_location_object): ...this.
        (sarif_output_format::get_builder): New accessor.
        (sarif_begin_embedded_link): New.
        (sarif_end_embedded_link): New.
        (sarif_builder::sarif_token_printer::print_tokens): New.
        (diagnostic_output_format_init_sarif): Add "fmt" param; use it to
        set the token printer and output format for the context.
        (diagnostic_output_format_init_sarif_stderr): Move responsibility
        for setting the context's output format to within
        diagnostic_output_format_init_sarif.
        (diagnostic_output_format_init_sarif_file): Likewise.
        (diagnostic_output_format_init_sarif_stream): Likewise.
        (test_sarif_diagnostic_context::test_sarif_diagnostic_context):
        Likewise.
        (selftest::test_make_location_object): Provide an idx for the
        result.
        (selftest::get_result_from_log): New.
        (selftest::get_message_from_log): New.
        (selftest::test_message_with_embedded_link): New test.
        (selftest::diagnostic_format_sarif_cc_tests): Call it.
        * pretty-print-format-impl.h: Include "diagnostic-event-id.h".
        (pp_token::kind): Add "event_id".
        (struct pp_token_event_id): New.
        (is_a_helper <pp_token_event_id *>::test): New.
        (is_a_helper <const pp_token_event_id *>::test): New.
        * pretty-print.cc (pp_token::dump): Handle kind::event_id.
        (pretty_printer::format): Update handling of "%@" in phase 2
        so that we add a pp_token_event_id, rather that the text "(N)".
        (default_token_printer): Handle pp_token::kind::event_id by
        printing the text "(N)".

gcc/testsuite/ChangeLog:
        PR other/116419
        * gcc.dg/sarif-output/bad-pragma.c: New test.
        * gcc.dg/sarif-output/test-bad-pragma.py: New test.
        * gcc.dg/sarif-output/test-include-chain-2.py
        (test_location_relationships): Update expected text of event to
        include an intra-sarif URI to the other event.

Signed-off-by: David Malcolm <dmalc...@redhat.com>
---
 gcc/diagnostic-event-id.h                     |   6 +
 gcc/diagnostic-format-sarif.cc                | 605 +++++++++++++++---
 gcc/pretty-print-format-impl.h                |  32 +
 gcc/pretty-print.cc                           |  29 +-
 .../gcc.dg/sarif-output/bad-pragma.c          |  16 +
 .../gcc.dg/sarif-output/test-bad-pragma.py    |  38 ++
 .../sarif-output/test-include-chain-2.py      |   6 +-
 7 files changed, 636 insertions(+), 96 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/sarif-output/bad-pragma.c
 create mode 100644 gcc/testsuite/gcc.dg/sarif-output/test-bad-pragma.py

diff --git a/gcc/diagnostic-event-id.h b/gcc/diagnostic-event-id.h
index 78c2ccbbc99d..8237ba34df33 100644
--- a/gcc/diagnostic-event-id.h
+++ b/gcc/diagnostic-event-id.h
@@ -41,6 +41,12 @@ class diagnostic_event_id_t
 
   bool known_p () const { return m_index != UNKNOWN_EVENT_IDX; }
 
+  int zero_based () const
+  {
+    gcc_assert (known_p ());
+    return m_index;
+  }
+
   int one_based () const
   {
     gcc_assert (known_p ());
diff --git a/gcc/diagnostic-format-sarif.cc b/gcc/diagnostic-format-sarif.cc
index 59d9cd721839..9d9e7ae60734 100644
--- a/gcc/diagnostic-format-sarif.cc
+++ b/gcc/diagnostic-format-sarif.cc
@@ -45,6 +45,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "selftest-diagnostic-show-locus.h"
 #include "selftest-json.h"
 #include "text-range-label.h"
+#include "pretty-print-format-impl.h"
+#include "pretty-print-urlifier.h"
 
 /* Forward decls.  */
 class sarif_builder;
@@ -385,7 +387,12 @@ private:
 class sarif_result : public sarif_location_manager
 {
 public:
-  sarif_result () : m_related_locations_arr (nullptr) {}
+  sarif_result (unsigned idx_within_parent)
+    : m_related_locations_arr (nullptr),
+      m_idx_within_parent (idx_within_parent)
+  {}
+
+  unsigned get_index_within_parent () const { return m_idx_within_parent; }
 
   void
   on_nested_diagnostic (diagnostic_context &context,
@@ -402,6 +409,7 @@ public:
 
 private:
   json::array *m_related_locations_arr; // borrowed
+  const unsigned m_idx_within_parent;
 };
 
 /* Subclass of sarif_object for SARIF "location" objects
@@ -460,7 +468,39 @@ private:
 /* Subclass of sarif_object for SARIF "codeFlow" objects
    (SARIF v2.1.0 section 3.36).  */
 
-class sarif_code_flow : public sarif_object {};
+class sarif_code_flow : public sarif_object
+{
+public:
+  sarif_code_flow (sarif_result &parent,
+                  unsigned idx_within_parent);
+
+  sarif_result &get_parent () const { return m_parent; }
+  unsigned get_index_within_parent () const { return m_idx_within_parent; }
+
+  sarif_thread_flow &
+  get_or_append_thread_flow (const diagnostic_thread &thread,
+                            diagnostic_thread_id_t thread_id);
+
+  sarif_thread_flow &
+  get_thread_flow (diagnostic_thread_id_t thread_id);
+
+  void add_location (sarif_thread_flow_location &);
+
+  sarif_thread_flow_location &
+  get_thread_flow_loc_obj (diagnostic_event_id_t event_id) const;
+
+private:
+  sarif_result &m_parent;
+  const unsigned m_idx_within_parent;
+
+  hash_map<int_hash<diagnostic_thread_id_t, -1, -2>,
+          sarif_thread_flow *> m_thread_id_map; // borrowed ptr
+  json::array *m_thread_flows_arr; // borrowed
+
+  /* Vec of borrowed ptr, allowing for going easily from
+     an event_id to the corresponding threadFlowLocation object.  */
+  std::vector<sarif_thread_flow_location *> m_all_tfl_objs;
+};
 
 /* Subclass of sarif_object for SARIF "threadFlow" objects
    (SARIF v2.1.0 section 3.37).  */
@@ -468,20 +508,41 @@ class sarif_code_flow : public sarif_object {};
 class sarif_thread_flow : public sarif_object
 {
 public:
-  sarif_thread_flow (const diagnostic_thread &thread);
+  sarif_thread_flow (sarif_code_flow &parent,
+                    const diagnostic_thread &thread,
+                    unsigned idx_within_parent);
 
-  void
-  add_location
-    (std::unique_ptr<sarif_thread_flow_location> thread_flow_loc_obj);
+  sarif_code_flow &get_parent () const { return m_parent; }
+  unsigned get_index_within_parent () const { return m_idx_within_parent; }
+
+  sarif_thread_flow_location &add_location ();
 
 private:
+  sarif_code_flow &m_parent;
   json::array *m_locations_arr; // borrowed
+  const unsigned m_idx_within_parent;
 };
 
 /* Subclass of sarif_object for SARIF "threadFlowLocation" objects
    (SARIF v2.1.0 section 3.38).  */
 
-class sarif_thread_flow_location : public sarif_object {};
+class sarif_thread_flow_location : public sarif_object
+{
+public:
+  sarif_thread_flow_location (sarif_thread_flow &parent,
+                             unsigned idx_within_parent)
+  : m_parent (parent),
+    m_idx_within_parent (idx_within_parent)
+  {
+  }
+
+  sarif_thread_flow &get_parent () const { return m_parent; }
+  unsigned get_index_within_parent () const { return m_idx_within_parent; }
+
+private:
+  sarif_thread_flow &m_parent;
+  const unsigned m_idx_within_parent;
+};
 
 /* Subclass of sarif_object for SARIF "reportingDescriptor" objects
    (SARIF v2.1.0 section 3.49).  */
@@ -632,11 +693,33 @@ public:
   std::unique_ptr<sarif_artifact_location>
   make_artifact_location_object (const char *filename);
 
+  const sarif_code_flow *
+  get_code_flow_for_event_ids () const
+  {
+    return m_current_code_flow;
+  }
+
+  token_printer &get_token_printer () { return m_token_printer; }
+
 private:
+  class sarif_token_printer : public token_printer
+  {
+  public:
+    sarif_token_printer (sarif_builder &builder)
+      : m_builder (builder)
+    {
+    }
+    void print_tokens (pretty_printer *pp,
+                      const pp_token_list &tokens) final override;
+  private:
+    sarif_builder &m_builder;
+  };
+
   std::unique_ptr<sarif_result>
   make_result_object (diagnostic_context &context,
                      const diagnostic_info &diagnostic,
-                     diagnostic_t orig_diag_kind);
+                     diagnostic_t orig_diag_kind,
+                     unsigned idx_within_parent);
   void
   add_any_include_chain (sarif_location_manager &loc_mgr,
                         sarif_location &location_obj,
@@ -650,11 +733,13 @@ private:
                        enum diagnostic_artifact_role role);
   std::unique_ptr<sarif_code_flow>
   make_code_flow_object (sarif_result &result,
+                        unsigned idx_within_parent,
                         const diagnostic_path &path);
-  std::unique_ptr<sarif_thread_flow_location>
-  make_thread_flow_location_object (sarif_result &result,
-                                   const diagnostic_event &event,
-                                   int path_event_idx);
+  void
+  populate_thread_flow_location_object (sarif_result &result,
+                                       sarif_thread_flow_location 
&thread_flow_loc_obj,
+                                       const diagnostic_event &event,
+                                       int event_execution_idx);
   std::unique_ptr<json::array>
   maybe_make_kinds_array (diagnostic_event::meaning m) const;
   std::unique_ptr<sarif_physical_location>
@@ -725,6 +810,7 @@ private:
 
   diagnostic_context &m_context;
   const line_maps *m_line_maps;
+  sarif_token_printer m_token_printer;
 
   /* The JSON object for the invocation object.  */
   std::unique_ptr<sarif_invocation> m_invocation_obj;
@@ -751,6 +837,9 @@ private:
   int m_tabstop;
 
   bool m_formatted;
+
+  unsigned m_next_result_idx;
+  sarif_code_flow *m_current_code_flow;
 };
 
 /* class sarif_object : public json::object.  */
@@ -1294,9 +1383,67 @@ lazily_add_kind (enum location_relationship_kind kind)
   kinds_arr->append_string (kind_str);
 }
 
+/* class sarif_code_flow : public sarif_object.  */
+
+sarif_code_flow::sarif_code_flow (sarif_result &parent,
+                                 unsigned idx_within_parent)
+: m_parent (parent),
+  m_idx_within_parent (idx_within_parent)
+{
+  /* "threadFlows" property (SARIF v2.1.0 section 3.36.3).  */
+  auto thread_flows_arr = ::make_unique<json::array> ();
+  m_thread_flows_arr = thread_flows_arr.get (); // borrowed
+  set<json::array> ("threadFlows", std::move (thread_flows_arr));
+}
+
+sarif_thread_flow &
+sarif_code_flow::get_or_append_thread_flow (const diagnostic_thread &thread,
+                                           diagnostic_thread_id_t thread_id)
+{
+  sarif_thread_flow **slot = m_thread_id_map.get (thread_id);
+  if (slot)
+    return **slot;
+
+  unsigned next_thread_flow_idx = m_thread_flows_arr->size ();
+  auto thread_flow_obj
+    = ::make_unique<sarif_thread_flow> (*this, thread, next_thread_flow_idx);
+  m_thread_id_map.put (thread_id, thread_flow_obj.get ()); // borrowed
+  sarif_thread_flow *result = thread_flow_obj.get ();
+  m_thread_flows_arr->append<sarif_thread_flow> (std::move (thread_flow_obj));
+  return *result;
+}
+
+sarif_thread_flow &
+sarif_code_flow::get_thread_flow (diagnostic_thread_id_t thread_id)
+{
+  sarif_thread_flow **slot = m_thread_id_map.get (thread_id);
+  gcc_assert (slot); // it must already have one
+  return **slot;
+}
+
+void
+sarif_code_flow::add_location (sarif_thread_flow_location &tfl_obj)
+{
+  m_all_tfl_objs.push_back (&tfl_obj);
+}
+
+sarif_thread_flow_location &
+sarif_code_flow::get_thread_flow_loc_obj (diagnostic_event_id_t event_id) const
+{
+  gcc_assert (event_id.known_p ());
+  gcc_assert ((size_t)event_id.zero_based () < m_all_tfl_objs.size ());
+  sarif_thread_flow_location *tfl_obj = m_all_tfl_objs[event_id.zero_based ()];
+  gcc_assert (tfl_obj);
+  return *tfl_obj;
+}
+
 /* class sarif_thread_flow : public sarif_object.  */
 
-sarif_thread_flow::sarif_thread_flow (const diagnostic_thread &thread)
+sarif_thread_flow::sarif_thread_flow (sarif_code_flow &parent,
+                                     const diagnostic_thread &thread,
+                                     unsigned idx_within_parent)
+: m_parent (parent),
+  m_idx_within_parent (idx_within_parent)
 {
   /* "id" property (SARIF v2.1.0 section 3.37.2).  */
   label_text name (thread.get_name (false));
@@ -1310,11 +1457,18 @@ sarif_thread_flow::sarif_thread_flow (const 
diagnostic_thread &thread)
   set ("locations", m_locations_arr);
 }
 
-void
-sarif_thread_flow::
-add_location (std::unique_ptr<sarif_thread_flow_location> thread_flow_loc_obj)
+/* Add a sarif_thread_flow_location to this threadFlow object, but
+   don't populate it yet.  */
+
+sarif_thread_flow_location &
+sarif_thread_flow::add_location ()
 {
-  m_locations_arr->append (std::move (thread_flow_loc_obj));
+  const unsigned thread_flow_location_idx = m_locations_arr->size ();
+  sarif_thread_flow_location *thread_flow_loc_obj
+    = new sarif_thread_flow_location (*this, thread_flow_location_idx);
+  m_locations_arr->append (thread_flow_loc_obj);
+  m_parent.add_location (*thread_flow_loc_obj);
+  return *thread_flow_loc_obj;
 }
 
 /* class sarif_builder.  */
@@ -1327,6 +1481,7 @@ sarif_builder::sarif_builder (diagnostic_context &context,
                              bool formatted)
 : m_context (context),
   m_line_maps (line_maps),
+  m_token_printer (*this),
   m_invocation_obj
     (::make_unique<sarif_invocation> (*this,
                                      context.get_original_argv ())),
@@ -1336,7 +1491,9 @@ sarif_builder::sarif_builder (diagnostic_context &context,
   m_rule_id_set (),
   m_rules_arr (new json::array ()),
   m_tabstop (context.m_tabstop),
-  m_formatted (formatted)
+  m_formatted (formatted),
+  m_next_result_idx (0),
+  m_current_code_flow (nullptr)
 {
   gcc_assert (m_line_maps);
 
@@ -1376,7 +1533,8 @@ sarif_builder::on_report_diagnostic (diagnostic_context 
&context,
     {
       /* Top-level diagnostic.  */
       m_cur_group_result
-       = make_result_object (context, diagnostic, orig_diag_kind);
+       = make_result_object (context, diagnostic, orig_diag_kind,
+                             m_next_result_idx++);
     }
 }
 
@@ -1476,9 +1634,10 @@ make_rule_id_for_diagnostic_kind (diagnostic_t diag_kind)
 std::unique_ptr<sarif_result>
 sarif_builder::make_result_object (diagnostic_context &context,
                                   const diagnostic_info &diagnostic,
-                                  diagnostic_t orig_diag_kind)
+                                  diagnostic_t orig_diag_kind,
+                                  unsigned idx_within_parent)
 {
-  auto result_obj = ::make_unique<sarif_result> ();
+  auto result_obj = ::make_unique<sarif_result> (idx_within_parent);
 
   /* "ruleId" property (SARIF v2.1.0 section 3.27.5).  */
   /* Ideally we'd have an option_name for these.  */
@@ -1552,8 +1711,10 @@ sarif_builder::make_result_object (diagnostic_context 
&context,
   if (const diagnostic_path *path = diagnostic.richloc->get_path ())
     {
       auto code_flows_arr = ::make_unique<json::array> ();
+      const unsigned code_flow_index = 0;
       code_flows_arr->append<sarif_code_flow>
        (make_code_flow_object (*result_obj.get (),
+                               code_flow_index,
                                *path));
       result_obj->set<json::array> ("codeFlows", std::move (code_flows_arr));
     }
@@ -2272,83 +2433,115 @@ make_sarif_logical_location_object (const 
logical_location &logical_loc)
   return logical_loc_obj;
 }
 
+label_text
+make_sarif_url_for_event (const sarif_code_flow *code_flow,
+                         diagnostic_event_id_t event_id)
+{
+  gcc_assert (event_id.known_p ());
+
+  if (!code_flow)
+    return label_text ();
+
+  const sarif_thread_flow_location &tfl_obj
+    = code_flow->get_thread_flow_loc_obj (event_id);
+  const int location_idx = tfl_obj.get_index_within_parent ();
+
+  const sarif_thread_flow &thread_flow_obj = tfl_obj.get_parent ();
+  const int thread_flow_idx = thread_flow_obj.get_index_within_parent ();
+
+  const sarif_code_flow &code_flow_obj = thread_flow_obj.get_parent ();
+  const int code_flow_idx = code_flow_obj.get_index_within_parent ();
+
+  const sarif_result &result_obj = code_flow_obj.get_parent ();
+  const int result_idx = result_obj.get_index_within_parent ();
+
+  /* We only support a single run object in the log.  */
+  const int run_idx = 0;
+
+  char *buf = xasprintf
+    ("sarif:/runs/%i/results/%i/codeFlows/%i/threadFlows/%i/locations/%i",
+     run_idx, result_idx, code_flow_idx, thread_flow_idx, location_idx);
+  return label_text::take (buf);
+}
+
 /* Make a "codeFlow" object (SARIF v2.1.0 section 3.36) for PATH.  */
 
 std::unique_ptr<sarif_code_flow>
 sarif_builder::make_code_flow_object (sarif_result &result,
+                                     unsigned idx_within_parent,
                                      const diagnostic_path &path)
 {
-  auto code_flow_obj = ::make_unique <sarif_code_flow> ();
+  auto code_flow_obj
+    = ::make_unique <sarif_code_flow> (result, idx_within_parent);
 
-  /* "threadFlows" property (SARIF v2.1.0 section 3.36.3).  */
-  auto thread_flows_arr = ::make_unique<json::array> ();
-
-  /* Walk the events, consolidating into per-thread threadFlow objects,
-     using the index with PATH as the overall executionOrder.  */
-  hash_map<int_hash<diagnostic_thread_id_t, -1, -2>,
-          sarif_thread_flow *> thread_id_map; // borrowed
+  /* First pass:
+     Create threadFlows and threadFlowLocation objects within them,
+     effectively recording a mapping from event_id to threadFlowLocation
+     so that we can later go from an event_id to a URI within the
+     SARIF file.  */
   for (unsigned i = 0; i < path.num_events (); i++)
     {
       const diagnostic_event &event = path.get_event (i);
       const diagnostic_thread_id_t thread_id = event.get_thread_id ();
-      sarif_thread_flow *thread_flow_obj;
 
-      if (sarif_thread_flow **slot = thread_id_map.get (thread_id))
-       thread_flow_obj = *slot;
-      else
-       {
-         const diagnostic_thread &thread = path.get_thread (thread_id);
-         thread_flow_obj = new sarif_thread_flow (thread);
-         thread_id_map.put (thread_id, thread_flow_obj); // borrowed
-         thread_flows_arr->append (thread_flow_obj);
-       }
+      sarif_thread_flow &thread_flow_obj
+       = code_flow_obj->get_or_append_thread_flow (path.get_thread (thread_id),
+                                                   thread_id);
+      thread_flow_obj.add_location ();
+    }
 
-      /* Add event to thread's threadFlow object.  */
-      std::unique_ptr<sarif_thread_flow_location> thread_flow_loc_obj
-       = make_thread_flow_location_object (result, event, i);
-      thread_flow_obj->add_location (std::move (thread_flow_loc_obj));
+  /* Second pass: walk the events, populating the tfl objs.  */
+  m_current_code_flow = code_flow_obj.get ();
+  for (unsigned i = 0; i < path.num_events (); i++)
+    {
+      const diagnostic_event &event = path.get_event (i);
+      sarif_thread_flow_location &thread_flow_loc_obj
+       = code_flow_obj->get_thread_flow_loc_obj (i);
+      populate_thread_flow_location_object (result,
+                                           thread_flow_loc_obj,
+                                           event,
+                                           i);
     }
-  code_flow_obj->set<json::array> ("threadFlows", std::move 
(thread_flows_arr));
+  m_current_code_flow = nullptr;
 
   return code_flow_obj;
 }
 
-/* Make a "threadFlowLocation" object (SARIF v2.1.0 section 3.38) for EVENT.  
*/
+/* Populate TFL_OBJ, a "threadFlowLocation" object (SARIF v2.1.0 section 3.38)
+   based on EVENT.  */
 
-std::unique_ptr<sarif_thread_flow_location>
-sarif_builder::make_thread_flow_location_object (sarif_result &result,
-                                                const diagnostic_event &ev,
-                                                int path_event_idx)
+void
+sarif_builder::
+populate_thread_flow_location_object (sarif_result &result,
+                                     sarif_thread_flow_location &tfl_obj,
+                                     const diagnostic_event &ev,
+                                     int event_execution_idx)
 {
-  auto thread_flow_loc_obj = ::make_unique<sarif_thread_flow_location> ();
-
   /* Give diagnostic_event subclasses a chance to add custom properties
      via a property bag.  */
-  ev.maybe_add_sarif_properties (*thread_flow_loc_obj);
+  ev.maybe_add_sarif_properties (tfl_obj);
 
   /* "location" property (SARIF v2.1.0 section 3.38.3).  */
-  thread_flow_loc_obj->set<sarif_location>
+  tfl_obj.set<sarif_location>
     ("location",
      make_location_object (result, ev, diagnostic_artifact_role::traced_file));
 
   /* "kinds" property (SARIF v2.1.0 section 3.38.8).  */
   diagnostic_event::meaning m = ev.get_meaning ();
   if (auto kinds_arr = maybe_make_kinds_array (m))
-    thread_flow_loc_obj->set<json::array> ("kinds", std::move (kinds_arr));
+    tfl_obj.set<json::array> ("kinds", std::move (kinds_arr));
 
   /* "nestingLevel" property (SARIF v2.1.0 section 3.38.10).  */
-  thread_flow_loc_obj->set_integer ("nestingLevel", ev.get_stack_depth ());
+  tfl_obj.set_integer ("nestingLevel", ev.get_stack_depth ());
 
   /* "executionOrder" property (SARIF v2.1.0 3.38.11).
      Offset by 1 to match the human-readable values emitted by %@.  */
-  thread_flow_loc_obj->set_integer ("executionOrder", path_event_idx + 1);
+  tfl_obj.set_integer ("executionOrder", event_execution_idx + 1);
 
   /* It might be nice to eventually implement the following for -fanalyzer:
      - the "stack" property (SARIF v2.1.0 section 3.38.5)
      - the "state" property (SARIF v2.1.0 section 3.38.9)
      - the "importance" property (SARIF v2.1.0 section 3.38.13).  */
-
-  return thread_flow_loc_obj;
 }
 
 /* If M has any known meaning, make a json array suitable for the "kinds"
@@ -2933,6 +3126,8 @@ public:
     m_builder.emit_diagram (m_context, diagram);
   }
 
+  sarif_builder &get_builder () { return m_builder; }
+
 protected:
   sarif_output_format (diagnostic_context &context,
                       const line_maps *line_maps,
@@ -3008,11 +3203,128 @@ private:
   char *m_base_file_name;
 };
 
+/* Print the start of an embedded link to PP, as per 3.11.6.  */
+
+static void
+sarif_begin_embedded_link (pretty_printer *pp)
+{
+  pp_character (pp, '[');
+}
+
+/* Print the end of an embedded link to PP, as per 3.11.6.  */
+
+static void
+sarif_end_embedded_link (pretty_printer *pp,
+                        const char *url)
+{
+  pp_string (pp, "](");
+  /* TODO: does the URI need escaping?
+     See https://github.com/oasis-tcs/sarif-spec/issues/657 */
+  pp_string (pp, url);
+  pp_character (pp, ')');
+}
+
+/* class sarif_token_printer : public token_printer.  */
+
+/* Implementation of pretty_printer::token_printer for SARIF output.
+   Emit URLs as per 3.11.6 ("Messages with embedded links").  */
+
+void
+sarif_builder::sarif_token_printer::print_tokens (pretty_printer *pp,
+                                                 const pp_token_list &tokens)
+{
+  /* Convert to text, possibly with colorization, URLs, etc.  */
+  label_text current_url;
+  for (auto iter = tokens.m_first; iter; iter = iter->m_next)
+    switch (iter->m_kind)
+      {
+      default:
+       gcc_unreachable ();
+
+      case pp_token::kind::text:
+       {
+         const pp_token_text *sub = as_a <const pp_token_text *> (iter);
+         const char * const str = sub->m_value.get ();
+         if (current_url.get ())
+           {
+             /* Write iter->m_value, but escaping any
+                escaped link characters as per 3.11.6.  */
+             for (const char *ptr = str; *ptr; ptr++)
+               {
+                 const char ch = *ptr;
+                 switch (ch)
+                   {
+                   default:
+                     pp_character (pp, ch);
+                     break;
+                   case '\\':
+                   case '[':
+                   case ']':
+                     pp_character (pp, '\\');
+                     pp_character (pp, ch);
+                     break;
+                   }
+               }
+           }
+         else
+           /* TODO: is other escaping needed? (e.g. of '[')
+              See https://github.com/oasis-tcs/sarif-spec/issues/658 */
+           pp_string (pp, str);
+       }
+       break;
+
+      case pp_token::kind::begin_color:
+      case pp_token::kind::end_color:
+       /* These are no-ops.  */
+       break;
+
+      case pp_token::kind::begin_quote:
+       pp_begin_quote (pp, pp_show_color (pp));
+       break;
+      case pp_token::kind::end_quote:
+       pp_end_quote (pp, pp_show_color (pp));
+       break;
+
+      /* Emit URLs as per 3.11.6 ("Messages with embedded links").  */
+      case pp_token::kind::begin_url:
+       {
+         pp_token_begin_url *sub = as_a <pp_token_begin_url *> (iter);
+         sarif_begin_embedded_link (pp);
+         current_url = std::move (sub->m_value);
+       }
+       break;
+      case pp_token::kind::end_url:
+       gcc_assert (current_url.get ());
+       sarif_end_embedded_link (pp, current_url.get ());
+       current_url = label_text::borrow (nullptr);
+       break;
+
+      case pp_token::kind::event_id:
+       {
+         pp_token_event_id *sub = as_a <pp_token_event_id *> (iter);
+         gcc_assert (sub->m_event_id.known_p ());
+         const sarif_code_flow *code_flow
+           = m_builder.get_code_flow_for_event_ids ();
+         label_text url = make_sarif_url_for_event (code_flow,
+                                                    sub->m_event_id);
+         if (url.get ())
+           sarif_begin_embedded_link (pp);
+         pp_character (pp, '(');
+         pp_decimal_int (pp, sub->m_event_id.one_based ());
+         pp_character (pp, ')');
+         if (url.get ())
+           sarif_end_embedded_link (pp, url.get ());
+       }
+       break;
+      }
+}
+
 /* Populate CONTEXT in preparation for SARIF output (either to stderr, or
    to a file).  */
 
 static void
-diagnostic_output_format_init_sarif (diagnostic_context &context)
+diagnostic_output_format_init_sarif (diagnostic_context &context,
+                                    std::unique_ptr<sarif_output_format> fmt)
 {
   /* Suppress normal textual path output.  */
   context.set_path_format (DPF_NONE);
@@ -3023,6 +3335,10 @@ diagnostic_output_format_init_sarif (diagnostic_context 
&context)
   /* Don't colorize the text.  */
   pp_show_color (context.printer) = false;
   context.set_show_highlight_colors (false);
+
+  context.printer->set_token_printer
+    (&fmt->get_builder ().get_token_printer ());
+  context.set_output_format (fmt.release ());
 }
 
 /* Populate CONTEXT in preparation for SARIF output to stderr.  */
@@ -3034,13 +3350,13 @@ diagnostic_output_format_init_sarif_stderr 
(diagnostic_context &context,
                                            bool formatted)
 {
   gcc_assert (line_maps);
-  diagnostic_output_format_init_sarif (context);
-  context.set_output_format
-    (new sarif_stream_output_format (context,
-                                    line_maps,
-                                    main_input_filename_,
-                                    formatted,
-                                    stderr));
+  diagnostic_output_format_init_sarif
+    (context,
+     ::make_unique<sarif_stream_output_format> (context,
+                                               line_maps,
+                                               main_input_filename_,
+                                               formatted,
+                                               stderr));
 }
 
 /* Populate CONTEXT in preparation for SARIF output to a file named
@@ -3054,13 +3370,13 @@ diagnostic_output_format_init_sarif_file 
(diagnostic_context &context,
                                          const char *base_file_name)
 {
   gcc_assert (line_maps);
-  diagnostic_output_format_init_sarif (context);
-  context.set_output_format
-    (new sarif_file_output_format (context,
-                                  line_maps,
-                                  main_input_filename_,
-                                  formatted,
-                                  base_file_name));
+  diagnostic_output_format_init_sarif
+    (context,
+     ::make_unique<sarif_file_output_format> (context,
+                                             line_maps,
+                                             main_input_filename_,
+                                             formatted,
+                                             base_file_name));
 }
 
 /* Populate CONTEXT in preparation for SARIF output to STREAM.  */
@@ -3073,13 +3389,13 @@ diagnostic_output_format_init_sarif_stream 
(diagnostic_context &context,
                                            FILE *stream)
 {
   gcc_assert (line_maps);
-  diagnostic_output_format_init_sarif (context);
-  context.set_output_format
-    (new sarif_stream_output_format (context,
-                                    line_maps,
-                                    main_input_filename_,
-                                    formatted,
-                                    stream));
+  diagnostic_output_format_init_sarif
+    (context,
+     ::make_unique<sarif_stream_output_format> (context,
+                                               line_maps,
+                                               main_input_filename_,
+                                               formatted,
+                                               stream));
 }
 
 #if CHECKING_P
@@ -3095,13 +3411,12 @@ class test_sarif_diagnostic_context : public 
test_diagnostic_context
 public:
   test_sarif_diagnostic_context (const char *main_input_filename)
   {
-    diagnostic_output_format_init_sarif (*this);
-
-    m_format = new buffered_output_format (*this,
-                                          line_table,
-                                          main_input_filename,
-                                          true);
-    set_output_format (m_format); // give ownership;
+    auto format = ::make_unique<buffered_output_format> (*this,
+                                                        line_table,
+                                                        main_input_filename,
+                                                        true);
+    m_format = format.get (); // borrowed
+    diagnostic_output_format_init_sarif (*this, std::move (format));
   }
 
   std::unique_ptr<sarif_log> flush_to_object ()
@@ -3175,7 +3490,7 @@ test_make_location_object (const line_table_case &case_)
   richloc.add_range (field, SHOW_RANGE_WITHOUT_CARET, &label2);
   richloc.set_escape_on_output (true);
 
-  sarif_result result;
+  sarif_result result (0);
 
   std::unique_ptr<sarif_location> location_obj
     = builder.make_location_object
@@ -3471,6 +3786,119 @@ test_simple_log_2 (const line_table_case &case_)
   }
 }
 
+/* Assuming that a single diagnostic has been emitted within
+   LOG, get a json::object for the result object.  */
+
+static const json::object *
+get_result_from_log (const sarif_log *log)
+{
+  auto runs = EXPECT_JSON_OBJECT_WITH_ARRAY_PROPERTY (log, "runs"); // 3.13.4
+  ASSERT_EQ (runs->size (), 1);
+
+  // 3.14 "run" object:
+  auto run = (*runs)[0];
+
+  // 3.14.23:
+  auto results = EXPECT_JSON_OBJECT_WITH_ARRAY_PROPERTY (run, "results");
+  ASSERT_EQ (results->size (), 1);
+
+  // 3.27 "result" object:
+  auto result = (*results)[0];
+  return expect_json_object (SELFTEST_LOCATION, result);
+}
+
+/* Assuming that a single diagnostic has been emitted to
+   DC, get a json::object for the messsage object within
+   the result.  */
+
+static const json::object *
+get_message_from_log (const sarif_log *log)
+{
+  auto result_obj = get_result_from_log (log);
+
+  // 3.27.11:
+  auto message_obj
+    = EXPECT_JSON_OBJECT_WITH_OBJECT_PROPERTY (result_obj, "message");
+  return message_obj;
+}
+
+/* Tests of messages with embedded links; see SARIF v2.1.0 3.11.6.  */
+
+static void
+test_message_with_embedded_link ()
+{
+  auto_fix_quotes fix_quotes;
+  {
+    test_sarif_diagnostic_context dc ("test.c");
+    rich_location richloc (line_table, UNKNOWN_LOCATION);
+    dc.report (DK_ERROR, richloc, nullptr, 0,
+              "before %{text%} after",
+              "http://example.com";);
+    std::unique_ptr<sarif_log> log = dc.flush_to_object ();
+
+    auto message_obj = get_message_from_log (log.get ());
+    ASSERT_JSON_STRING_PROPERTY_EQ
+      (message_obj, "text",
+       "before [text](http://example.com) after");
+  }
+
+  /* Escaping in message text.
+     This is "EXAMPLE 1" from 3.11.6.  */
+  {
+    test_sarif_diagnostic_context dc ("test.c");
+    rich_location richloc (line_table, UNKNOWN_LOCATION);
+
+    /* Disable "unquoted sequence of 2 consecutive punctuation
+       characters `]\' in format" warning.  */
+#if __GNUC__ >= 10
+#  pragma GCC diagnostic push
+#  pragma GCC diagnostic ignored "-Wformat-diag"
+#endif
+    dc.report (DK_ERROR, richloc, nullptr, 0,
+              "Prohibited term used in %{para[0]\\spans[2]%}.",
+              "1");
+#if __GNUC__ >= 10
+#  pragma GCC diagnostic pop
+#endif
+
+    std::unique_ptr<sarif_log> log = dc.flush_to_object ();
+
+    auto message_obj = get_message_from_log (log.get ());
+    ASSERT_JSON_STRING_PROPERTY_EQ
+      (message_obj, "text",
+       "Prohibited term used in [para\\[0\\]\\\\spans\\[2\\]](1).");
+    /* This isn't exactly what EXAMPLE 1 of the spec has; reported as
+       https://github.com/oasis-tcs/sarif-spec/issues/656  */
+  }
+
+  /* Urlifier.  */
+  {
+    class test_urlifier : public urlifier
+    {
+    public:
+      char *
+      get_url_for_quoted_text (const char *p, size_t sz) const final override
+      {
+       if (!strncmp (p, "-foption", sz))
+         return xstrdup ("http://example.com";);
+       return nullptr;
+      }
+    };
+
+    test_sarif_diagnostic_context dc ("test.c");
+    dc.set_urlifier (new test_urlifier ());
+    rich_location richloc (line_table, UNKNOWN_LOCATION);
+    dc.report (DK_ERROR, richloc, nullptr, 0,
+              "foo %<-foption%> %<unrecognized%> bar");
+    std::unique_ptr<sarif_log> log = dc.flush_to_object ();
+
+    auto message_obj = get_message_from_log (log.get ());
+    ASSERT_JSON_STRING_PROPERTY_EQ
+      (message_obj, "text",
+       "foo `[-foption](http://example.com)' `unrecognized' bar");
+  }
+}
+
 /* Run all of the selftests within this file.  */
 
 void
@@ -3479,6 +3907,7 @@ diagnostic_format_sarif_cc_tests ()
   for_each_line_table_case (test_make_location_object);
   test_simple_log ();
   for_each_line_table_case (test_simple_log_2);
+  test_message_with_embedded_link ();
 }
 
 } // namespace selftest
diff --git a/gcc/pretty-print-format-impl.h b/gcc/pretty-print-format-impl.h
index cffdd461a33d..f1996284f4a1 100644
--- a/gcc/pretty-print-format-impl.h
+++ b/gcc/pretty-print-format-impl.h
@@ -22,6 +22,7 @@ along with GCC; see the file COPYING3.  If not see
 #define GCC_PRETTY_PRINT_FORMAT_IMPL_H
 
 #include "pretty-print.h"
+#include "diagnostic-event-id.h"
 
 /* A struct representing a pending item to be printed within
    pp_format.
@@ -31,6 +32,7 @@ along with GCC; see the file COPYING3.  If not see
    - begin/end named color
    - open/close quote
    - begin/end URL
+   - event IDs
    - custom data (for the formatter, for the pretty_printer,
      or the output format)
 
@@ -72,6 +74,8 @@ public:
     begin_url,
     end_url,
 
+    event_id,
+
     custom_data,
 
     NUM_KINDS
@@ -218,6 +222,34 @@ struct pp_token_end_url : public pp_token
   }
 };
 
+struct pp_token_event_id : public pp_token
+{
+  pp_token_event_id (diagnostic_event_id_t event_id)
+  : pp_token (kind::event_id),
+    m_event_id (event_id)
+  {
+    gcc_assert (event_id.known_p ());
+  }
+
+  diagnostic_event_id_t m_event_id;
+};
+
+template <>
+template <>
+inline bool
+is_a_helper <pp_token_event_id *>::test (pp_token *tok)
+{
+  return tok->m_kind == pp_token::kind::event_id;
+}
+
+template <>
+template <>
+inline bool
+is_a_helper <const pp_token_event_id *>::test (const pp_token *tok)
+{
+  return tok->m_kind == pp_token::kind::event_id;
+}
+
 struct pp_token_custom_data : public pp_token
 {
   class value
diff --git a/gcc/pretty-print.cc b/gcc/pretty-print.cc
index d2c0a197680c..fe6b6090f323 100644
--- a/gcc/pretty-print.cc
+++ b/gcc/pretty-print.cc
@@ -1120,6 +1120,16 @@ pp_token::dump (FILE *out) const
     case kind::end_url:
       fprintf (out, "END_URL");
       break;
+
+    case kind::event_id:
+      {
+       const pp_token_event_id *sub
+         = as_a <const pp_token_event_id *> (this);
+       gcc_assert (sub->m_event_id.known_p ());
+       fprintf (out, "EVENT((%i))", sub->m_event_id.one_based ());
+      }
+      break;
+
     case kind::custom_data:
       {
        const pp_token_custom_data *sub
@@ -1984,12 +1994,7 @@ pretty_printer::format (text_info *text)
            diagnostic_event_id_ptr event_id
              = va_arg (*text->m_args_ptr, diagnostic_event_id_ptr);
            gcc_assert (event_id->known_p ());
-
-           pp_string (this, colorize_start (m_show_color, "path"));
-           pp_character (this, '(');
-           pp_decimal_int (this, event_id->one_based ());
-           pp_character (this, ')');
-           pp_string (this, colorize_stop (m_show_color));
+           formatted_tok_list->push_back<pp_token_event_id> (*event_id);
          }
          break;
 
@@ -2183,6 +2188,18 @@ default_token_printer (pretty_printer *pp,
        pp_end_url (pp);
        break;
 
+      case pp_token::kind::event_id:
+       {
+         pp_token_event_id *sub = as_a <pp_token_event_id *> (iter);
+         gcc_assert (sub->m_event_id.known_p ());
+         pp_string (pp, colorize_start (pp_show_color (pp), "path"));
+         pp_character (pp, '(');
+         pp_decimal_int (pp, sub->m_event_id.one_based ());
+         pp_character (pp, ')');
+         pp_string (pp, colorize_stop (pp_show_color (pp)));
+       }
+       break;
+
       case pp_token::kind::custom_data:
        /* These should have been eliminated by replace_custom_tokens.  */
        gcc_unreachable ();
diff --git a/gcc/testsuite/gcc.dg/sarif-output/bad-pragma.c 
b/gcc/testsuite/gcc.dg/sarif-output/bad-pragma.c
new file mode 100644
index 000000000000..db274de4b97a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/sarif-output/bad-pragma.c
@@ -0,0 +1,16 @@
+/* Verify that SARIF output can capture URLs in diagnostics
+   related to a bad pragma.  */
+
+/* { dg-do compile } */
+/* { dg-options "-fdiagnostics-format=sarif-file -Wpragmas" } */
+
+#pragma GCC diagnostic ignored "-Wmisleading-indenttion"
+
+int nonempty;
+
+/* Verify that some JSON was written to a file with the expected name:
+   { dg-final { verify-sarif-file } } */
+
+/* Use a Python script to verify various properties about the generated
+   .sarif file:
+   { dg-final { run-sarif-pytest bad-pragma.c "test-bad-pragma.py" } } */
diff --git a/gcc/testsuite/gcc.dg/sarif-output/test-bad-pragma.py 
b/gcc/testsuite/gcc.dg/sarif-output/test-bad-pragma.py
new file mode 100644
index 000000000000..140bb3381984
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/sarif-output/test-bad-pragma.py
@@ -0,0 +1,38 @@
+from sarif import *
+
+import pytest
+
+@pytest.fixture(scope='function', autouse=True)
+def sarif():
+    return sarif_from_env()
+
+def test_messages_have_embedded_urls(sarif):
+    runs = sarif['runs']
+    run = runs[0]
+    results = run['results']
+
+    # We expect a single warning with a secondary location.
+    #
+    # The textual form of the diagnostic would look like this:
+    #  . PATH/bad-pragma.c:7:32: warning: unknown option after '#pragma GCC 
diagnostic' kind [-Wpragmas]
+    #  .     7 | #pragma GCC diagnostic ignored "-Wmisleading-indenttion"
+    #  .       |                                ^~~~~~~~~~~~~~~~~~~~~~~~~
+    #  . PATH/bad-pragma.c:7:32: note: did you mean '-Wmisleading-indentation'?
+    assert len(results) == 1
+    
+    result = results[0]
+    assert result['ruleId'] == '-Wpragmas'
+    assert result['level'] == 'warning'
+    assert result['message']['text'] \
+        == "unknown option after '[#pragma GCC 
diagnostic](https://gcc.gnu.org/onlinedocs/gcc/Diagnostic-Pragmas.html)' kind"
+    # Note how we expect an embedded link in the above for the docs for 
#pragma GCC diagnostic
+    
+    # We expect one related location, for the note.
+    relatedLocations = result['relatedLocations']
+    assert len(relatedLocations) == 1
+
+    rel_loc = relatedLocations[0]
+    assert rel_loc['message']['text'] \
+        == "did you mean 
'[-Wmisleading-indentation](https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Wmisleading-indentation)'?"
+    # Again, we expect an embedded link in the above, this time to the
+    # docs for the suggested option
diff --git a/gcc/testsuite/gcc.dg/sarif-output/test-include-chain-2.py 
b/gcc/testsuite/gcc.dg/sarif-output/test-include-chain-2.py
index 761fe1b59a9c..843f89a94853 100644
--- a/gcc/testsuite/gcc.dg/sarif-output/test-include-chain-2.py
+++ b/gcc/testsuite/gcc.dg/sarif-output/test-include-chain-2.py
@@ -96,9 +96,11 @@ def test_location_relationships(sarif):
         == "  __builtin_free (ptr); // 1st\n"
     assert threadFlow['locations'][0]['kinds'] == ['release', 'memory']
     assert threadFlow['locations'][0]['executionOrder'] == 1
-    
+
+    # We should have an embedded link in this event's message to the
+    # other event's location within the SARIF file:
     assert threadFlow['locations'][1]['location']['message']['text'] \
-        == "second 'free' here; first 'free' was at (1)"
+        == "second 'free' here; first 'free' was at 
[(1)](sarif:/runs/0/results/0/codeFlows/0/threadFlows/0/locations/0)"
     assert 
threadFlow['locations'][1]['location']['physicalLocation']['contextRegion']['snippet']['text']
 \
         == "  __builtin_free (ptr); // 2nd\n"
     assert threadFlow['locations'][1]['kinds'] == ['danger']
-- 
2.26.3

Reply via email to