llvmbot wrote:
@llvm/pr-subscribers-clang Author: Balázs Benics (steakhal) <details> <summary>Changes</summary> Depends on #184421 --- Patch is 23.64 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/184833.diff 8 Files Affected: - (removed) clang/docs/ScalableStaticAnalysisFramework/SummaryExtraction.rst (-81) - (added) clang/docs/ScalableStaticAnalysisFramework/developer-docs/ForceLinkerHeaders.rst (+148) - (added) clang/docs/ScalableStaticAnalysisFramework/developer-docs/HowToExtend.rst (+218) - (added) clang/docs/ScalableStaticAnalysisFramework/developer-docs/SummaryExtractionInternals.rst (+31) - (added) clang/docs/ScalableStaticAnalysisFramework/developer-docs/index.rst (+10) - (renamed) clang/docs/ScalableStaticAnalysisFramework/index.rst (+9-3) - (added) clang/docs/ScalableStaticAnalysisFramework/user-docs/SummaryExtraction.rst (+33) - (modified) clang/docs/index.rst (+1-1) ``````````diff diff --git a/clang/docs/ScalableStaticAnalysisFramework/SummaryExtraction.rst b/clang/docs/ScalableStaticAnalysisFramework/SummaryExtraction.rst deleted file mode 100644 index 6b9c7db5bc048..0000000000000 --- a/clang/docs/ScalableStaticAnalysisFramework/SummaryExtraction.rst +++ /dev/null @@ -1,81 +0,0 @@ -================== -Summary Extraction -================== - -.. WARNING:: The framework is rapidly evolving. - The documentation might be out-of-sync of the implementation. - The purpose of this documentation to give context for upcoming reviews. - - -The simplest way to think about the lifetime of a summary extraction is by following the handlers of the ``FrontendAction`` implementing it. -There are 3 APIs that are important for us, that are invoked in this order: - - - ``BeginInvocation()``: Checks the command-line arguments related to summary extraction. - - ``CreateASTConsumer()``: Creates the ASTConsumers for the different summary extractors. 
- - ``EndSourceFile()``: Serializes and writes the extracted summaries. - -Implementation details -********************** - -Global Registries -================= - -The framework uses `llvm::Registry\<\> <https://llvm.org/doxygen/classllvm_1_1Registry.html>`_ -as an extension point for adding new summary analyses or serialization formats. -Each entry in the *registry* holds a name, a description and a pointer to a constructor. - -**Pros**: - - - Decentralizes the registration. There is not a single place in the source code where we spell out all of the analyses/formats. - - Plays nicely with downstream extensibility, as downstream users can add their own analyses/formats without touching the source code of the framework; while still benefiting from the upstream-provided analyses/formats. - - Works with static and dynamic linking. In other words, plugins as shared objects compose naturally. - -**Cons**: - - - Registration slows down all ``clang`` users by a tiny amount, even if they don't invoke the summary extraction framework. - - As the registration is now decoupled, it's now a global program property; and potentially more difficult to reason about. - - Complicates testing. - -Example for adding a custom summary extraction ----------------------------------------------- - -.. code-block:: c++ - - //--- MyAnalysis.cpp - class MyAnalysis : public TUSummaryExtractor { - using TUSummaryExtractor::TUSummaryExtractor; - // Implementation... - }; - - static TUSummaryExtractorRegistry::Add<MyAnalysis> - RegisterExtractor("MyAwesomeAnalysis", "The analysis produces some awesome results"); - -Details of ``BeginInvocation()`` -================================ - -#. Processes the different fields populated from the command line. Ensure that mandatory flags are set, etc. -#. For each requested analysis, check if we have a matching ``TUSummaryExtractorInfo`` in the static registry, and diagnose if not. -#. 
Parse the format name, and check if we have a matching ``FormatInfo`` in the format registry. -#. Lastly, forward the ``BeginInvocation`` call to the wrapped FrontendAction. - - -Details of ``CreateASTConsumer()`` -================================== - -#. Create the wrapped ``FrontendAction`` consumers by calling ``CreateASTConsumer()`` on it. -#. Call ``ssaf::makeTUSummaryExtractor()`` on each requested analysis name. - - #. Look up in the *summary registry* the relevant *Info* object and call the ``Factory`` function pointer to create the relevant ``ASTConsumer``. - #. Remember, we pass a mutable ``TUSummaryBuilder`` reference to the constructor, so the analysis can create ``EntityID`` objects and map them to ``TUSummaryData`` objects in their implementation. Their custom metadata needs to inherit from ``TUSummaryData`` to achieve this. - -#. Lastly, add all of these ``ASTConsumers`` to the ``MultiplexConsumer`` and return that. - - -Details of ``EndSourceFile()`` -============================== - -#. Call the virtual ``writeTUSummary()`` on the serialization format, leading to the desired format handler (such as JSON or binary or something custom - provided by a plugin). - - #. Create the directory structure for the enabled analyses. - #. Serialize ``entities``, ``entity_linkage``, etc. Achieve by calling the matching virtual functions, dispatching to the concrete implementation. - #. The same goes for each enabled analysis, serialize the ``EntityID`` to ``TUSummaryData`` mapping using the analysis-provided ``Serialize`` function pointer. diff --git a/clang/docs/ScalableStaticAnalysisFramework/developer-docs/ForceLinkerHeaders.rst b/clang/docs/ScalableStaticAnalysisFramework/developer-docs/ForceLinkerHeaders.rst new file mode 100644 index 0000000000000..16ca0ea714d69 --- /dev/null +++ b/clang/docs/ScalableStaticAnalysisFramework/developer-docs/ForceLinkerHeaders.rst @@ -0,0 +1,148 @@ +==================== +Force-Linker Headers +==================== + +.. 
WARNING:: The framework is rapidly evolving. + The documentation might be out-of-sync with the implementation. + The purpose of this documentation is to give context for upcoming reviews. + +The problem +*********** + +SSAF uses `llvm::Registry\<\> <https://llvm.org/doxygen/classllvm_1_1Registry.html>`_ +for decentralized registration of summary extractors and serialization formats. +Each registration is a file-scope static object whose constructor adds an entry +to the global registry: + +.. code-block:: c++ + + // In MyExtractor.cpp + static TUSummaryExtractorRegistry::Add<MyExtractor> + RegisterExtractor("MyExtractor", "My summary extractor"); + +When the translation unit containing this static object is compiled into a +**static library** (``.a`` / ``.lib``), the static linker will only pull in +object files that resolve an undefined symbol in the consuming binary. +Because no code ever calls anything in ``MyExtractor.o`` directly, the linker +discards the object file — and the registration never runs. + +This is not a problem for **shared libraries** (``.so`` / ``.dylib``), because +the dynamic linker loads the entire shared object and runs all global +constructors unconditionally. + +The solution: anchor symbols +**************************** + +Each registration translation unit defines a ``volatile int`` **anchor symbol**: + +.. code-block:: c++ + + // In MyExtractor.cpp — next to the registry Add<> object + // NOLINTNEXTLINE(misc-use-internal-linkage) + volatile int SSAFMyExtractorAnchorSource = 0; + +A **force-linker header** declares the symbol as ``extern`` and reads it into a +``[[maybe_unused]] static int`` destination: + +.. 
code-block:: c++ + + // In SSAFBuiltinForceLinker.h + extern volatile int SSAFMyExtractorAnchorSource; + [[maybe_unused]] static int SSAFMyExtractorAnchorDestination = + SSAFMyExtractorAnchorSource; + +Any translation unit that ``#include``\s this header now has a reference to +``SSAFMyExtractorAnchorSource``, which forces the linker to pull in +``MyExtractor.o`` — and with it, the static ``Add<>`` registration object. + +The ``volatile`` qualifier is essential: without it the compiler could +constant-fold the ``0`` and eliminate the reference entirely. + +Header hierarchy +================ + +.. code-block:: text + + SSAFForceLinker.h (umbrella — include this in binaries) + └── SSAFBuiltinForceLinker.h (upstream built-in anchors only) + +- ``clang/include/clang/Analysis/Scalable/SSAFBuiltinForceLinker.h`` — anchors for + upstream-provided (built-in) extractors and formats (e.g. ``JSONFormat``). +- ``clang/include/clang/Analysis/Scalable/SSAFForceLinker.h`` — umbrella header + that includes ``SSAFBuiltinForceLinker.h``. This is the header that + downstream projects should modify to add their own force-linker includes + (see :doc:`HowToExtend`). + +Include the umbrella header with ``// IWYU pragma: keep`` in any translation +unit that must guarantee all registrations are active — typically the entry +point of a binary that uses ``clangAnalysisScalable``: + +.. code-block:: c++ + + // In ExecuteCompilerInvocation.cpp + #include "clang/Analysis/Scalable/SSAFForceLinker.h" // IWYU pragma: keep + +Naming convention +================= + +Anchor symbols follow the pattern ``SSAF<Component>AnchorSource`` and +``SSAF<Component>AnchorDestination``. 
For example: + +- ``SSAFJSONFormatAnchorSource`` / ``SSAFJSONFormatAnchorDestination`` +- ``SSAFMyExtractorAnchorSource`` / ``SSAFMyExtractorAnchorDestination`` + +Considered alternatives +*********************** + +``--whole-archive`` / ``-force_load`` +===================================== + +The linker can be instructed to include *every* object file from a static +library, regardless of whether any symbols are referenced: + +.. code-block:: bash + + # GNU ld / lld (Linux, BSD) + -Wl,--whole-archive -lclangAnalysisScalable -Wl,--no-whole-archive + + # Apple ld + -Wl,-force_load,libclangAnalysisScalable.a + +Since CMake 3.24, the ``$<LINK_LIBRARY:WHOLE_ARCHIVE,...>`` generator expression +provides a portable way to do the same: + +.. code-block:: cmake + + target_link_libraries(clang PRIVATE + "$<LINK_LIBRARY:WHOLE_ARCHIVE,clangAnalysisScalable>") + +**Why we did not choose this approach**: + +- It is a blunt instrument — *all* object files in the library are pulled in, + increasing binary size. +- The anchor approach only targets specific object files: only registrations + whose anchors are referenced in a force-linker header are pulled in. +- ``--whole-archive`` semantics vary across platforms and toolchains, requiring + platform-specific CMake logic or the relatively new ``WHOLE_ARCHIVE`` + generator expression. + +Explicit initialization functions +================================= + +An alternative is a central ``initializeSSAFRegistrations()`` function that +explicitly calls into each registration module: + +.. code-block:: c++ + + void initializeSSAFRegistrations() { + initializeJSONFormat(); + initializeMyExtractor(); + // ... one entry per registration + } + +**Why we did not choose this approach**: + +- It reintroduces a centralized list that must be maintained manually, defeating + the decoupled-registration benefit of ``llvm::Registry``. 
+- Adding a new extractor or format requires modifying a central file, which + increases merge-conflict risk for downstream users. diff --git a/clang/docs/ScalableStaticAnalysisFramework/developer-docs/HowToExtend.rst b/clang/docs/ScalableStaticAnalysisFramework/developer-docs/HowToExtend.rst new file mode 100644 index 0000000000000..840bca171fb48 --- /dev/null +++ b/clang/docs/ScalableStaticAnalysisFramework/developer-docs/HowToExtend.rst @@ -0,0 +1,218 @@ +=========================== +How to Extend the Framework +=========================== + +.. WARNING:: The framework is rapidly evolving. + The documentation might be out-of-sync with the implementation. + The purpose of this documentation is to give context for upcoming reviews. + +SSAF is designed to be extensible with new **summary extractors** and **serialization formats**. +Extensions can be added in three ways: + +#. **Statically, in-tree** — built as part of the upstream LLVM/Clang tree. +#. **Statically, out-of-tree (downstream)** — built in a downstream fork or project that links ``clangAnalysisScalable`` as a static library. +#. **Dynamically, via plugins** — loaded at runtime as shared objects. + +All three approaches use the same ``llvm::Registry``-based registration mechanism. +The key difference is how the linker sees the registration: +static libraries need :doc:`force-linker anchors <ForceLinkerHeaders>` to prevent dead-stripping, while shared libraries do not. + +Adding a summary extractor +************************** + +A summary extractor is an ``ASTConsumer`` that inspects the AST and populates a ``TUSummary`` via the ``TUSummaryBuilder`` interface. + +Step 1: Implement the extractor +=============================== + +.. 
code-block:: c++ + + //--- MyExtractor.h + #include "clang/Analysis/Scalable/TUSummary/TUSummaryExtractor.h" + + namespace clang::ssaf { + + class MyExtractor : public TUSummaryExtractor { + public: + using TUSummaryExtractor::TUSummaryExtractor; + + // Override HandleTranslationUnit or any other virtual functions of an ASTConsumer... + // Use the SummaryBuilder to populate the summary while walking the AST. + }; + + } // namespace clang::ssaf + +Step 2: Register the extractor +============================== + +.. code-block:: c++ + + //--- MyExtractor.cpp + #include "MyExtractor.h" + #include "clang/Analysis/Scalable/TUSummary/ExtractorRegistry.h" + + using namespace clang::ssaf; + + // NOLINTNEXTLINE(misc-use-internal-linkage) + volatile int SSAFMyExtractorAnchorSource = 0; + + static TUSummaryExtractorRegistry::Add<MyExtractor> + RegisterExtractor("MyExtractor", "My awesome summary extractor"); + +The ``"MyExtractor"`` string is the name users pass to ``--ssaf-extract-summaries=MyExtractor``. + +Step 3: Add the force-linker anchor +=================================== + +See :doc:`ForceLinkerHeaders` for a full explanation of why this is needed. +Add the following to the appropriate force-linker header: + +.. code-block:: c++ + + extern volatile int SSAFMyExtractorAnchorSource; + [[maybe_unused]] static int SSAFMyExtractorAnchorDestination = + SSAFMyExtractorAnchorSource; + +For **in-tree** additions, add this to +``clang/include/clang/Analysis/Scalable/SSAFBuiltinForceLinker.h``. + +For **downstream** additions, see `Out-of-tree (downstream) extensions`_ below. + + +Adding a serialization format +***************************** + +A serialization format controls how the ``TUSummary`` is written to (and read from) disk. +This involves more boilerplate than an extractor because each format has a per-analysis ``FormatInfo`` sub-registry. 
+ +Step 1: Define the format class +=============================== + +Your format class must inherit from ``SerializationFormat`` and define a ``FormatInfo`` type alias: + +.. code-block:: c++ + + //--- MyFormat.h + #include "clang/Analysis/Scalable/Serialization/SerializationFormat.h" + #include "clang/Support/Compiler.h" + #include "llvm/Support/Registry.h" + + namespace clang::ssaf { + + class MyFormat : public SerializationFormat { + public: + // Define the type aliases: SerializerFn, DeserializerFn + using FormatInfo = FormatInfoEntry<SerializerFn, DeserializerFn>; + + // Override readTUSummaryEncoding, writeTUSummary, etc. + }; + + } // namespace clang::ssaf + + namespace llvm { + extern template class CLANG_TEMPLATE_ABI + Registry<clang::ssaf::MyFormat::FormatInfo>; + } // namespace llvm + +Step 2: Register the format +=========================== + +.. code-block:: c++ + + //--- MyFormat.cpp + #include "MyFormat.h" + #include "clang/Analysis/Scalable/Serialization/SerializationFormatRegistry.h" + + using namespace clang::ssaf; + + // NOLINTNEXTLINE(misc-use-internal-linkage) + volatile int SSAFMyFormatAnchorSource = 0; + + static SerializationFormatRegistry::Add<MyFormat> + RegisterFormat("myformat", "My awesome serialization format"); + + LLVM_INSTANTIATE_REGISTRY(llvm::Registry<MyFormat::FormatInfo>) + +The format name (``"myformat"``) is matched against the file extension in ``--ssaf-tu-summary-file=output.myformat``. + +Step 3: Register per-analysis FormatInfo entries +================================================ + +For each analysis that should be serializable in your format, register a ``FormatInfo`` entry. +``FormatInfo`` must be implemented for any of the summaries that wants to support ``myformat``: + +.. 
code-block:: c++ + + namespace { + using FormatInfo = MyFormat::FormatInfo; + struct MyAnalysisFormatInfo final : FormatInfo { + MyAnalysisFormatInfo() : FormatInfo{ + SummaryName("MyAnalysis"), + serializeMyAnalysis, + deserializeMyAnalysis, + } {} + }; + } // namespace + + static llvm::Registry<FormatInfo>::Add<MyAnalysisFormatInfo> + RegisterFormatInfo("MyAnalysisFormatInfo", + "MyFormat format info for MyAnalysis"); + +Step 4: Add the force-linker anchor +=================================== + +Same pattern as for extractors — see `Adding a summary extractor`_ Step 3, and :doc:`ForceLinkerHeaders`. + + +Static extensibility +******************** + +In-tree extensions +================== + +For extensions that are part of the upstream LLVM/Clang tree: + +#. Add the anchor to ``clang/include/clang/Analysis/Scalable/SSAFBuiltinForceLinker.h``. +#. Add the source files to the ``clangAnalysisScalable`` CMake library target. +#. That's it — the ``SSAFForceLinker.h`` umbrella includes ``SSAFBuiltinForceLinker.h`` + transitively, so any binary that includes the umbrella will pull in the registration. + +Out-of-tree (downstream) extensions +=================================== + +Downstream projects that maintain a fork can add their own extensions without +modifying upstream files — reducing the risk of merge-conflicts: + +#. Create a downstream force-linker header, e.g. ``SSAFDownstreamForceLinker.h``, + containing the anchor references for downstream-only extractors and formats. +#. Include it from ``SSAFForceLinker.h`` (the umbrella): + + .. code-block:: c++ + + // In SSAFForceLinker.h + #include "SSAFBuiltinForceLinker.h" // IWYU pragma: keep + #include "SSAFDownstreamForceLinker.h" // IWYU pragma: keep + + This is a single-line addition per downstream project, minimizing conflicts with upstream changes. + Upstream will try to avoid modifying this umbrella header, making it a stable static extension point. + +#. 
Add the downstream source files to the build system as usual. + + +Dynamic extensibility (plugins) +******************************* + +Shared libraries loaded at runtime — via ``dlopen`` / ``LoadLibrary`` or the +Clang plugin mechanism — do **not** need force-linker anchors, but having them also does not hurt. + +When a shared object (``.so`` / ``.dylib``) is loaded, the dynamic linker runs all global constructors in that library unconditionally. +This means the ``llvm::Registry::Add<>`` objects execute their constructors and register themselves automatically. + +To use a plugin: + +#. Build your extractor or format as a shared library. +#. Load it with the Clang plugin mechanism (``-fplugin=`` or ``-load``). +#. Pass the extractor name to ``--ssaf-extract-summaries=`` as usual. + +No changes to any force-linker header are required. +The ``llvm::Registry`` infrastructure handles everything once the shared object is loaded. diff --git a/clang/docs/ScalableStaticAnalysisFramework/developer-docs/SummaryExtractionInternals.rst b/clang/docs/ScalableStaticAnalysisFramework/developer-docs/SummaryExtractionInternals.rst new file mode 100644 index 0000000000000..8190f2c8c7fae --- /dev/null +++ b/clang/docs/ScalableStaticAnalysisFramework/developer-docs/SummaryExtractionInternals.rst @@ -0,0 +1,31 @@ +============================ +Summary Extraction Internals +============================ + +.. WARNING:: The framework is rapidly evolving. + The documentation might be out-of-sync with the implementation. + The purpose of this documentation is to give context for upcoming reviews. + +When ``--ssaf-tu-summary-file=`` is non-empty, ``CreateFrontendAction()`` (in ``ExecuteCompilerInvocation.cpp``) +wraps the original ``FrontendAction`` inside a ``TUSummaryExtractorFrontendAction``. +This ensures that the summary extraction transparently happens after the original frontend action, which is usually either compilation (``-c``) or just ``-fsyntax-only`` in tests. 
+ +Lifetime of a summary extraction +******************************** + +The ``TUSummaryExtractorFrontendAction`` will try to construct a ``TUSummaryRunner`` ASTConsumer and report an error on failure. +When it succeeds, it will multiplex the handlers of the ASTConsumer to every summary extractor and in the end, serialize and write the results to the desired file. + +Implementation details +********************** + +Global Registries +================= + +The framework uses `llvm::Registry\<\> <https://llvm.org/doxygen/classllvm_1_1Registry.html>`_ +as an extension point for adding new summary analyses or serialization formats. +Each entry in the *registry* holds a name, a description and a pointer to a constructor. +Because static linking can discard unreferenced registration objec... [truncated] `````````` </details> https://github.com/llvm/llvm-project/pull/184833
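
For readers skimming the patch, the registration and anchor mechanics described in ForceLinkerHeaders.rst can be sketched in a single file. This is only an illustration under stated assumptions: `MiniRegistry`, `Extractor`, and `runByName` are hypothetical stand-ins invented for this sketch and are not the `llvm::Registry` or SSAF APIs. Only the `SSAFMyExtractorAnchorSource` / `SSAFMyExtractorAnchorDestination` names follow the naming convention from the patch. A single translation unit cannot reproduce the static-library dead-stripping problem itself; the sketch only shows how the file-scope `Add<>` object and the anchor pair fit together.

```cpp
// Single-file sketch of the registration + anchor pattern from the patch.
// ASSUMPTION: MiniRegistry is a hypothetical stand-in, NOT llvm::Registry.
#include <cassert>
#include <memory>
#include <string>
#include <vector>

struct Extractor {
  virtual ~Extractor() = default;
  virtual std::string run() const = 0;
};

class MiniRegistry {
public:
  struct Entry {
    std::string Name;
    std::string Desc;
    std::unique_ptr<Extractor> (*Ctor)();
  };

  // Function-local static: safe to use from other static initializers.
  static std::vector<Entry> &entries() {
    static std::vector<Entry> Entries;
    return Entries;
  }

  template <typename T> static std::unique_ptr<Extractor> make() {
    return std::make_unique<T>();
  }

  // A file-scope Add<T> object registers T as a side effect of construction.
  template <typename T> struct Add {
    Add(const char *Name, const char *Desc) {
      entries().push_back(Entry{Name, Desc, &MiniRegistry::make<T>});
    }
  };

  // Returns nullptr when no entry with that name was registered.
  static std::unique_ptr<Extractor> create(const std::string &Name) {
    for (const Entry &E : entries())
      if (E.Name == Name)
        return E.Ctor();
    return nullptr;
  }
};

// --- This part would live in MyExtractor.cpp ---
struct MyExtractor : Extractor {
  std::string run() const override { return "summary from MyExtractor"; }
};

// Anchor source: external linkage so another object file can reference it.
volatile int SSAFMyExtractorAnchorSource = 0;

static MiniRegistry::Add<MyExtractor>
    RegisterExtractor("MyExtractor", "My summary extractor");

// --- This part would live in a force-linker header ---
extern volatile int SSAFMyExtractorAnchorSource;
[[maybe_unused]] static int SSAFMyExtractorAnchorDestination =
    SSAFMyExtractorAnchorSource;

// Hypothetical driver-side lookup by the name the user requested.
inline std::string runByName(const std::string &Name) {
  std::unique_ptr<Extractor> E = MiniRegistry::create(Name);
  assert(E && "extractor was not registered");
  return E->run();
}
```

In a real static-library build, the two halves sit in different object files, and the read of the anchor in the header is what gives the linker a reason to keep `MyExtractor.o`. Here they share one file, so the lookup by name succeeds regardless of the anchor.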
