NoQ created this revision.
NoQ added reviewers: aaron.ballman, gribozavr2, xazax.hun, ldionne, jkorous, 
t-rasmud, ziqingluo-90, malavikasamak.
Herald added subscribers: steakhal, martong, rnkovacs.
Herald added a project: All.
NoQ requested review of this revision.

This document accompanies the LLVM Discourse discussion in 
https://discourse.llvm.org/t/rfc-c-buffer-hardening/65734 as @aaron.ballman 
recommended writing it up early. It's intended to eventually become the 
user-facing documentation for the tool.

As of now the document roughly represents our current plan; it may be corrected 
in the future, which may or may not be a result of our discussion here! And 
there are a few TODOs left for me to fill in.

Of course, separate documentation will need to be provided for individual 
items, such as attribute documentation, but that's a completely different story.


Repository:
  rC Clang

https://reviews.llvm.org/D136811

Files:
  clang/docs/SafeBuffers.rst

Index: clang/docs/SafeBuffers.rst
===================================================================
--- /dev/null
+++ clang/docs/SafeBuffers.rst
@@ -0,0 +1,401 @@
+================
+C++ Safe Buffers
+================
+
+.. contents::
+   :local:
+
+
+Introduction
+============
+
+Clang can be used to harden your C++ code against buffer overflows, an otherwise
+common security issue with C-based languages.
+
+The solution described in this document is an integrated programming model:
+it defines safety guidelines that restrict operations the code is allowed
+to perform, it provides tools for updating the code to conform to these
+guidelines, and it provides runtime mitigations for the situations when
+following guidelines isn’t sufficient.
+
+Namely, the solution consists of the following parts:
+
+  - ``-Wunsafe-buffer-usage`` is a family of warnings that warns you when
+    a potentially unsafe operation is performed in your code;
+  - Pragmas ``unsafe_buffer_usage`` and ``only_safe_buffers`` and allow you to
+    annotate sections of code as either already-hardened or not-to-be-hardened,
+    enabling incremental adoption and and providing “escape hatches”
+    when unsafety is necessary;
+  - Automatic fixits provided by the warning act as a modernizer to help you
+    convert large amounts of old code to conform to the warning;
+  - Attribute ``[[unsafe_buffer_usage]]`` lets you annotate custom functions as
+    unsafe, while providing a safe alternative that can often be suggested by
+    the compiler automatically.
+  - LLVM’s own C++ standard library implementation, libc++, provides a
+    hardened mode where C++ classes such as ``std::vector`` and ``std::span``,
+    together with their respective ``iterator`` classes, are protected
+    at runtime against buffer overflows. This changes buffer overflows from
+    extremely dangerous undefined behavior to predictable runtime crash.
+  - Finally, in order to avoid bugs in newly converted code, the
+    Clang static analyzer provides a checker to find misconstructed
+    ``std::span`` objects.
+
+Note that some of these partial solutions are useful on their own. For example,
+hardened libc++ can be used as a “sanitizer” to find buffer overflow bugs in
+existing code. The static analyzer checker is also universally useful,
+not tied to this programming model. The warning may be desired independently,
+even in plain C code, if you’re hard-bent on isolating, annotating and auditing
+all buffer-related code in your codebase. That said, some of these guidelines
+don’t cause your code to mitigate vulnerabilities better unless hardened
+containers are in use to carry out runtime checks.
+
+
+Why Not Just Harden All Pointer Operations?
+===========================================
+
+At a glance it seems reasonable to ask the compiler to make raw pointer
+operations crash on out-of-bounds accesses at runtime. And in case of C, this
+may be the only way to achieve safety. The main problem with such solution lies
+in dealing with the fact that most pointers in the code don’t have bounds
+information associated with them. Adding such information to the code
+in order to allow the compiler to perform bounds checks at runtime would result
+either in a system of attributes significantly more advanced than
+the current solution, or a significant change in runtime representation
+of all pointers.
+
+Fortunately, C++ already has a standard solution for passing bounds information
+alongside the pointer, the ``std::span``. With the help of safe standard
+containers such as ``std::span``, you can achieve bounds safety by
+*simply writing good modern C++ code*. This is what this solution is designed
+to help your codebase with.
+
+
+The Guidelines
+==============
+
+Let’s start with how your code would need to look like to comply with
+the programming model.
+
+1. When you allocate a buffer, allocate a container instead. For example,
+   if you find yourself allocating a fixed-size array, use ``std::array``.
+   If you need to allocate a heap buffer of variable length, ``std::vector``
+   can often act as a drop-in replacement. A vector can do a lot more than that,
+   but you can always resort to a fill-constructor that preallocates the buffer
+   of necessary size, and then never use any resizing operations. This gives you
+   a simple safe buffer at no extra cost other than the cost of safety. Another
+   good solution is ``std::span`` which you can initialize with a heap pointer
+   allocated with ``new[]``; it doesn’t assume unique ownership so you can copy
+   such span if you want, just like a raw pointer, but of course you’ll have to
+   manually ``delete[]`` it when appropriate.
+   (TODO: What about things like ``std::shared_ptr<T []>``)?
+2. When you deal with function or class interfaces that accept or return
+   buffers, either pass the original container – move it or pass by reference or
+   use a smart pointer – or pass a view into that container, such as
+   ``std::span``, which helps you deal with different container classes
+   uniformly.
+3. If you deal with interfaces you cannot change, such as an interface of
+   a third-party library you’re using, and these interfaces require you to pass
+   buffers as raw pointers, make sure you “containerize” these buffers
+   *as soon as possible*. For example, if the library returns a raw buffer
+   pointer, put it into ``std::span`` in order to immediately write down
+   the size of that buffer for the purposes of future bounds checks.
+   Then keep such buffers contained this way *for as long as possible*,
+   especially when passing across function boundaries. Say, until you need to
+   pass it back to that library, you can pass ``std::span`` by value between
+   your functions to preserve precise size information naturally.
+4. Sometimes you will find yourself implementing a custom container.
+   Say, ``std::vector`` or ``std::span`` may turn out to be poorly suitable
+   for your needs, and this is fine. In such cases the same guidelines apply.
+   You may have to use the opt-out pragma on the implementation of
+   the container – that’s exactly their purpose! Additionally, consider
+   implementing runtime checks in your container similar to the ones already
+   present in hardened libc++, because following the guidelines alone is often
+   insufficient without such hardening.
+   (TODO: Will automatic fixits be able to suggest custom containers or views?)
+   (TODO: Explain how to implement such checks in a custom container?)
+
+
+The Warning
+===========
+
+The warning ``-Wunsafe-buffer-usage`` warns on the following operations:
+
+  - Array subscript expression on raw arrays or raw pointers,
+      - unless the index is a compile-time constant ``0``,
+  - Increment and decrement of a raw pointer with operators ``++`` and ``--``;
+  - Addition or subtraction of a number to/from a raw pointer with operators
+  ``+``, ``-``, ``+=``, ``-=``,
+      - unless that number is a compile time constant ``0``;
+      - subtraction between two pointers is also fine;
+  - Passing a pointer through a function parameter annotated with
+    attribute ``[[unsafe_buffer_usage]]``,
+      - unless the pointer is a compile time constant ``0`` or ``nullptr``;
+      - a number of C/C++ standard library buffer manipulation functions
+        (such as std::memcpy() or std::next()) are hardcoded to act
+        as if they had the attribute.
+
+The warning doesn’t warn on single pointer use in general, such as
+dereference with operations like * and -> or [0]. If such operation causes
+a buffer overflow, there’s probably another unsafe operation nearby,
+that the warning does warn about. Pointer-based data structures
+such as linked lists or trees are allowed as they don't typically cause
+buffer overflows. “Temporal” safety issues that arise from using raw pointers,
+such use-after-free, null pointer dereference, dangling pointers,
+reference invalidation, are out of scope for this warning.
+
+The warning also doesn’t warn every time a pointer is passed into a function,
+but only when the function is annotated with the attribute. The attribute gets
+auto-attached by automatic fixits, which let the warning and the fixes propagate
+across function boundaries, and the users are encouraged to annotate their
+unsafe functions manually, but the warning is not otherwise inter-procedural.
+
+
+The Pragmas
+===========
+
+In order to aid incremental adoption of the programming model, you are
+encouraged to enable/disable the warning on a file-by-file basis. Additionally,
+pragmas are provided to annotate sections of the code as opt-in to the model:
+
+.. code-block:: c++
+
+  #pragma only_safe_buffers begin
+    ...
+  #pragma only_safe_buffers end
+
+or opt-out of the model:
+
+.. code-block:: c++
+
+  #pragma unsafe_buffer_usage end
+    ...
+  #pragma unsafe_buffer_usage begin
+
+Such pragmas not only enable incremental adoption with much smaller granularity,
+but also provide essential “escape hatches” when the programming model
+is undesirable for a section of code (such as tight loop in performance-critical
+code, or implementation of a custom container). In such cases sections of code
+with unsafe buffer usage deserve being explicitly marked and easily auditable
+by security researches.
+
+Even though similar functionality can be achieved with generic
+``#pragma clang diagnostic``, a custom pragma is preferable because
+the generic solution’s push/pop mechanism may be confusing. In particular,
+it’s not immediately obvious from the code which mode gets enabled after
+a “pop”, which makes it harder to reason about safety both for you and for
+the purposes of security audit. It is also generally valuable for such
+annotation to describe the property of the annotated code,
+rather than an imperative action.
+
+
+The Attribute
+=============
+
+The attribute ``[[unsafe_buffer_usage]]`` should be placed on functions
+that need to be avoided as they may cause buffer overflows. It is designed to
+aid automatic fixits which would replace such unsafe functions with safe
+alternatives, though it can be used even when the fix can’t be automated.
+
+The attribute is warranted even if the only way a function can overflow
+the buffer is by violating the function’s preconditions. For example, it
+would make sense to put the attribute on function ``foo()`` below because
+passing an incorrect size parameter would cause a buffer overflow:
+
+.. code-block:: c++
+
+  [[unsafe_buffer_usage]]
+  void foo(int *buf, size_t size) {
+    for (size_t i = 0; i < size; ++i) {
+      buf[i] = i;
+    }
+  }
+
+The attribute is NOT warranted when the function has runtime protection against
+overflows, assuming hardened libc++, assuming that containers constructed
+outside the function are well-formed. For example, function ``bar()`` below
+doesn’t need an attribute, because assuming buf is well-formed (has size that
+fits the original buffer it refers to), hardened libc++ protects this function
+from overflowing the buffer:
+
+.. code-block:: c++
+
+  void bar(std::span<int> buf) {
+    for (size_t i = 0; i < buf.size(); ++i) {
+      buf[i] = i;
+    }
+  }
+
+This corresponds to our safety precaution about keeping buffers "containerized"
+in spans for as long as possible. Function ``foo()`` may have internal
+consistency, but by accepting a raw buffer it requires the user to unwrap
+the span, which is undesirable.
+
+The attribute is warranted when a function accepts a raw buffer only to
+immediately put it into a span:
+
+.. code-block:: c++
+
+  [[unsafe_buffer_usage]]
+  void baz(int *buf, size_t size) {
+    std::span<int> sp{ buf, size };
+    for (size_t i = 0; i < sp.size(); ++i) {
+      sp[i] = i;
+    }
+  }
+
+In this case ``baz()`` does not contain any unsafe operations, but the awkward
+parameter type causes the caller to unwrap the span unnecessarily.
+In such cases the attribute may never be removed.
+
+In particular, the attribute is NOT an "escape hatch". It does not suppress
+the warnings about unsafe operations in the function. Addressing warnings
+inside the function is still valuable for internal consistency.
+
+Attribute ``[[unsafe_buffer_usage]]`` is similar to attribute [[deprecated]]
+but it has important differences:
+
+* Use of a function annotated by such attribute causes ``-Wunsafe-buffer-usage``
+  warning to appear, instead of ``-Wdeprecated``, so they can be
+  enabled/disabled independently as all four combinations make sense;
+* The “replacement” parameter of ``[[deprecated]]``, which allows for automatic
+  fixits when the function has a drop-in replacement, becomes significantly more
+  powerful and flexible in ``[[unsafe_buffer_usage]]`` where it will allow
+  non-trivial automatic fixes.
+
+(TODO: Explain parameters of the attribute, how they aid automatic fixits)
+
+Code Modernization Workflow With Semi-Automatic Fixits
+======================================================
+
+Every time your code preforms an unsafe operation that causes a
+``-Wunsafe-buffer-usage warning`` to appear, the warning may be accompanied
+by an automatic fix that changes types of one or more variables associated
+with the unsafe operation from raw pointer or array type to safe container type.
+
+For example, the following function contains a local constant-size array.
+
+.. code-block:: c++
+
+  void use_array(int[] x);
+
+  void test_array() {
+    int x[5];
+    x[3] = 3; // Warning: Array indexing is unsafe operation!
+    use_array(x);
+  }
+
+The automatic fixit associated with the warning would transform this array
+into an ``std::array``:
+
+.. code-block:: c++
+
+  void use_array(int[] x);
+
+  void test_array() {
+    std::array<int> x[5];
+    x[3] = 3;
+    use_array(x.data());
+  }
+
+Note that the use site of variable ``x`` inside the call to ``use_array()``
+needed an update. The fixit considers every use site of the fixed variable
+and updates them if necessary.
+
+In some cases an automatic fix would be problematic, so the warning will simply
+highlight unsafe operations for you to consider.
+
+In some cases a partial fix is emitted, where you’ll have to fill in a few
+placeholders in order to document the bounds information related to the pointer.
+This is particularly common when suggesting ``std::span``.
+
+For example, consider function ``foo()`` we've seen before:
+
+.. code-block:: c++
+
+  void foo(int *buf, size_t size) {
+    for (size_t i = 0; i < size; ++i) {
+      buf[i] = i;
+    }
+  }
+
+In spirit of our guidelines, the automatic fixit would prefer this function
+to accept a span instead of raw buffer, so that the span didn't need to be
+unwrapped. Of course, such change alone would break both source compatibility
+and binary compatibility. In order to avoid that, the fix will provide
+a compatibility overload to preserve the old functionality. The updated code
+produced by the fixit will look like this:
+
+.. code-block:: c++
+
+  [[unsafe_buffer_usage]]
+  void foo(int *buf, size_t size) {
+    foo(std::span<int>{ buf, __placeholder__}, size);
+  }
+
+  void foo(std::span<int> buf, size_t size) {
+    for (size_t i = 0; i < size; ++i) {
+      buf[i] = i;
+    }
+  }
+
+The following changes were performed automatically:
+
+  - The type of parameter ``buf`` was changed from ``int *`` to
+    ``std::span<int>``. Use sites updated if necessary.
+  - A compatibility overload was autogenerated with the old prototype, with
+    attribute ``[[unsafe_buffer_usage]]`` attached to it to encourage
+    the callers to switch to the new function – and possibly update the callers
+    automatically!
+
+The following changes need manual intervention:
+
+  - The compatibility overload contains a ``__placeholder__`` which needs
+    to be populated manually. In this case ``size`` is a good candidate.
+  - Despite accepting a ``std::span`` which carries size information,
+    the fixed function still accepts ``size`` separately. It can be removed
+    manually, or it can be preserved, if ``size`` and ``buf.size()``
+    actually need to be different in your case.
+
+Placeholders fulfill an important purpose as they attract attention to
+situations where the buffer's size wasn't properly documented for the purposes
+of bounds checks. Variable ``size`` does not *have* to carry the size of the
+buffer (or the size of *that* buffer) just becaused it's named "size".
+The compiler will avoid making guesses about that.
+
+The fixits emitted by the warning are correct modulo placeholders. Placeholders
+are the only reason why fixed code is allowed to fail to compile.
+Incorrectly resolving the placeholder is the only reason why fixed code
+will demonstrate incorrect runtime behavior compared to the original code.
+In an otherwise well-formed program it is always possible (and usually easy)
+to resolve the placeholder correctly.
+
+Note that regardless of how ``__placeholder__`` is resolved, it does not allow
+you to remove the ``[[unsafe_buffer_usage]]`` annotation. The annotation will
+stay forever because that function is now equivalent to function ``baz()`` we've
+seen before: it contains no unsafe operations, but it only offers internal
+consistency. It is still possible to misuse that function by supplying an
+invalid ``size`` parameter. It still requires you to unwrap ``std::span`` if you
+already have it, only to wrap it back immediately. So the callers should still
+be updated to use the new function, and automatic fixits will now be emitted
+for the call sites to aid that.
+
+Even if you decide to remove the ``size`` parameter, fixits at call sites
+will remain operational. A warning will be emitted if the replacement function's
+prototype diverges from the original prototype beyond recognition. In such cases
+an attribute can be either updated to give more manual hints to the compiler, or
+changed to a different variant that explicitly opts out of automatic fixits.
+
+(TODO: Elaborate on that last point and confirm that we actually want
+such behavior.)
+
+(TODO: Cover more examples.)
+
+The Hardened libc++
+===================
+
+(TODO: Write this section. Probably just link to its own documentation.)
+
+Clang Static Analyzer Checks
+============================
+
+(TODO: Write this section.)
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to