NoQ created this revision. NoQ added reviewers: aaron.ballman, gribozavr2, xazax.hun, ldionne, jkorous, t-rasmud, ziqingluo-90, malavikasamak. Herald added subscribers: steakhal, martong, rnkovacs. Herald added a project: All. NoQ requested review of this revision.
This document accompanies the LLVM Discourse discussion in https://discourse.llvm.org/t/rfc-c-buffer-hardening/65734 as @aaron.ballman recommended writing it up early. It's intended to eventually become the user-facing documentation for the tool. As of now the document roughly represents our current plan; it may be corrected in the future, which may or may not be a result of our discussion here! And there are a few TODOs left for me to fill in. Of course, separate documentation will need to be provided for individual items, such as attribute documentation, but that's a completely different story. Repository: rC Clang https://reviews.llvm.org/D136811 Files: clang/docs/SafeBuffers.rst
Index: clang/docs/SafeBuffers.rst =================================================================== --- /dev/null +++ clang/docs/SafeBuffers.rst @@ -0,0 +1,401 @@ +================ +C++ Safe Buffers +================ + +.. contents:: + :local: + + +Introduction +============ + +Clang can be used to harden your C++ code against buffer overflows, an otherwise +common security issue with C-based languages. + +The solution described in this document is an integrated programming model: +it defines safety guidelines that restrict operations the code is allowed +to perform, it provides tools for updating the code to conform to these +guidelines, and it provides runtime mitigations for the situations when +following guidelines isnât sufficient. + +Namely, the solution consists of the following parts: + + - ``-Wunsafe-buffer-usage`` is a family of warnings that warns you when + a potentially unsafe operation is performed in your code; + - Pragmas ``unsafe_buffer_usage`` and ``only_safe_buffers`` and allow you to + annotate sections of code as either already-hardened or not-to-be-hardened, + enabling incremental adoption and and providing âescape hatchesâ + when unsafety is necessary; + - Automatic fixits provided by the warning act as a modernizer to help you + convert large amounts of old code to conform to the warning; + - Attribute ``[[unsafe_buffer_usage]]`` lets you annotate custom functions as + unsafe, while providing a safe alternative that can often be suggested by + the compiler automatically. + - LLVMâs own C++ standard library implementation, libc++, provides a + hardened mode where C++ classes such as ``std::vector`` and ``std::span``, + together with their respective ``iterator`` classes, are protected + at runtime against buffer overflows. This changes buffer overflows from + extremely dangerous undefined behavior to predictable runtime crash. + - Finally, in order to avoid bugs in newly converted code, the + Clang static analyzer provides a checker to find misconstructed + ``std::span`` objects. + +Note that some of these partial solutions are useful on their own. For example, +hardened libc++ can be used as a âsanitizerâ to find buffer overflow bugs in +existing code. The static analyzer checker is also universally useful, +not tied to this programming model. The warning may be desired independently, +even in plain C code, if youâre hard-bent on isolating, annotating and auditing +all buffer-related code in your codebase. That said, some of these guidelines +donât cause your code to mitigate vulnerabilities better unless hardened +containers are in use to carry out runtime checks. + + +Why Not Just Harden All Pointer Operations? +=========================================== + +At a glance it seems reasonable to ask the compiler to make raw pointer +operations crash on out-of-bounds accesses at runtime. And in case of C, this +may be the only way to achieve safety. The main problem with such solution lies +in dealing with the fact that most pointers in the code donât have bounds +information associated with them. Adding such information to the code +in order to allow the compiler to perform bounds checks at runtime would result +either in a system of attributes significantly more advanced than +the current solution, or a significant change in runtime representation +of all pointers. + +Fortunately, C++ already has a standard solution for passing bounds information +alongside the pointer, the ``std::span``. With the help of safe standard +containers such as ``std::span``, you can achieve bounds safety by +*simply writing good modern C++ code*. This is what this solution is designed +to help your codebase with. + + +The Guidelines +============== + +Letâs start with how your code would need to look like to comply with +the programming model. + +1. When you allocate a buffer, allocate a container instead. For example, + if you find yourself allocating a fixed-size array, use ``std::array``. + If you need to allocate a heap buffer of variable length, ``std::vector`` + can often act as a drop-in replacement. A vector can do a lot more than that, + but you can always resort to a fill-constructor that preallocates the buffer + of necessary size, and then never use any resizing operations. This gives you + a simple safe buffer at no extra cost other than the cost of safety. Another + good solution is ``std::span`` which you can initialize with a heap pointer + allocated with ``new[]``; it doesnât assume unique ownership so you can copy + such span if you want, just like a raw pointer, but of course youâll have to + manually ``delete[]`` it when appropriate. + (TODO: What about things like ``std::shared_ptr<T []>``)? +2. When you deal with function or class interfaces that accept or return + buffers, either pass the original container â move it or pass by reference or + use a smart pointer â or pass a view into that container, such as + ``std::span``, which helps you deal with different container classes + uniformly. +3. If you deal with interfaces you cannot change, such as an interface of + a third-party library youâre using, and these interfaces require you to pass + buffers as raw pointers, make sure you âcontainerizeâ these buffers + *as soon as possible*. For example, if the library returns a raw buffer + pointer, put it into ``std::span`` in order to immediately write down + the size of that buffer for the purposes of future bounds checks. + Then keep such buffers contained this way *for as long as possible*, + especially when passing across function boundaries. Say, until you need to + pass it back to that library, you can pass ``std::span`` by value between + your functions to preserve precise size information naturally. +4. Sometimes you will find yourself implementing a custom container. + Say, ``std::vector`` or ``std::span`` may turn out to be poorly suitable + for your needs, and this is fine. In such cases the same guidelines apply. + You may have to use the opt-out pragma on the implementation of + the container â thatâs exactly their purpose! Additionally, consider + implementing runtime checks in your container similar to the ones already + present in hardened libc++, because following the guidelines alone is often + insufficient without such hardening. + (TODO: Will automatic fixits be able to suggest custom containers or views?) + (TODO: Explain how to implement such checks in a custom container?) + + +The Warning +=========== + +The warning ``-Wunsafe-buffer-usage`` warns on the following operations: + + - Array subscript expression on raw arrays or raw pointers, + - unless the index is a compile-time constant ``0``, + - Increment and decrement of a raw pointer with operators ``++`` and ``--``; + - Addition or subtraction of a number to/from a raw pointer with operators + ``+``, ``-``, ``+=``, ``-=``, + - unless that number is a compile time constant ``0``; + - subtraction between two pointers is also fine; + - Passing a pointer through a function parameter annotated with + attribute ``[[unsafe_buffer_usage]]``, + - unless the pointer is a compile time constant ``0`` or ``nullptr``; + - a number of C/C++ standard library buffer manipulation functions + (such as std::memcpy() or std::next()) are hardcoded to act + as if they had the attribute. + +The warning doesnât warn on single pointer use in general, such as +dereference with operations like * and -> or [0]. If such operation causes +a buffer overflow, thereâs probably another unsafe operation nearby, +that the warning does warn about. Pointer-based data structures +such as linked lists or trees are allowed as they don't typically cause +buffer overflows. âTemporalâ safety issues that arise from using raw pointers, +such use-after-free, null pointer dereference, dangling pointers, +reference invalidation, are out of scope for this warning. + +The warning also doesnât warn every time a pointer is passed into a function, +but only when the function is annotated with the attribute. The attribute gets +auto-attached by automatic fixits, which let the warning and the fixes propagate +across function boundaries, and the users are encouraged to annotate their +unsafe functions manually, but the warning is not otherwise inter-procedural. + + +The Pragmas +=========== + +In order to aid incremental adoption of the programming model, you are +encouraged to enable/disable the warning on a file-by-file basis. Additionally, +pragmas are provided to annotate sections of the code as opt-in to the model: + +.. code-block:: c++ + + #pragma only_safe_buffers begin + ... + #pragma only_safe_buffers end + +or opt-out of the model: + +.. code-block:: c++ + + #pragma unsafe_buffer_usage end + ... + #pragma unsafe_buffer_usage begin + +Such pragmas not only enable incremental adoption with much smaller granularity, +but also provide essential âescape hatchesâ when the programming model +is undesirable for a section of code (such as tight loop in performance-critical +code, or implementation of a custom container). In such cases sections of code +with unsafe buffer usage deserve being explicitly marked and easily auditable +by security researches. + +Even though similar functionality can be achieved with generic +``#pragma clang diagnostic``, a custom pragma is preferable because +the generic solutionâs push/pop mechanism may be confusing. In particular, +itâs not immediately obvious from the code which mode gets enabled after +a âpopâ, which makes it harder to reason about safety both for you and for +the purposes of security audit. It is also generally valuable for such +annotation to describe the property of the annotated code, +rather than an imperative action. + + +The Attribute +============= + +The attribute ``[[unsafe_buffer_usage]]`` should be placed on functions +that need to be avoided as they may cause buffer overflows. It is designed to +aid automatic fixits which would replace such unsafe functions with safe +alternatives, though it can be used even when the fix canât be automated. + +The attribute is warranted even if the only way a function can overflow +the buffer is by violating the functionâs preconditions. For example, it +would make sense to put the attribute on function ``foo()`` below because +passing an incorrect size parameter would cause a buffer overflow: + +.. code-block:: c++ + + [[unsafe_buffer_usage]] + void foo(int *buf, size_t size) { + for (size_t i = 0; i < size; ++i) { + buf[i] = i; + } + } + +The attribute is NOT warranted when the function has runtime protection against +overflows, assuming hardened libc++, assuming that containers constructed +outside the function are well-formed. For example, function ``bar()`` below +doesnât need an attribute, because assuming buf is well-formed (has size that +fits the original buffer it refers to), hardened libc++ protects this function +from overflowing the buffer: + +.. code-block:: c++ + + void bar(std::span<int> buf) { + for (size_t i = 0; i < buf.size(); ++i) { + buf[i] = i; + } + } + +This corresponds to our safety precaution about keeping buffers "containerized" +in spans for as long as possible. Function ``foo()`` may have internal +consistency, but by accepting a raw buffer it requires the user to unwrap +the span, which is undesirable. + +The attribute is warranted when a function accepts a raw buffer only to +immediately put it into a span: + +.. code-block:: c++ + + [[unsafe_buffer_usage]] + void baz(int *buf, size_t size) { + std::span<int> sp{ buf, size }; + for (size_t i = 0; i < sp.size(); ++i) { + sp[i] = i; + } + } + +In this case ``baz()`` does not contain any unsafe operations, but the awkward +parameter type causes the caller to unwrap the span unnecessarily. +In such cases the attribute may never be removed. + +In particular, the attribute is NOT an "escape hatch". It does not suppress +the warnings about unsafe operations in the function. Addressing warnings +inside the function is still valuable for internal consistency. + +Attribute ``[[unsafe_buffer_usage]]`` is similar to attribute [[deprecated]] +but it has important differences: + +* Use of a function annotated by such attribute causes ``-Wunsafe-buffer-usage`` + warning to appear, instead of ``-Wdeprecated``, so they can be + enabled/disabled independently as all four combinations make sense; +* The âreplacementâ parameter of ``[[deprecated]]``, which allows for automatic + fixits when the function has a drop-in replacement, becomes significantly more + powerful and flexible in ``[[unsafe_buffer_usage]]`` where it will allow + non-trivial automatic fixes. + +(TODO: Explain parameters of the attribute, how they aid automatic fixits) + +Code Modernization Workflow With Semi-Automatic Fixits +====================================================== + +Every time your code preforms an unsafe operation that causes a +``-Wunsafe-buffer-usage warning`` to appear, the warning may be accompanied +by an automatic fix that changes types of one or more variables associated +with the unsafe operation from raw pointer or array type to safe container type. + +For example, the following function contains a local constant-size array. + +.. code-block:: c++ + + void use_array(int[] x); + + void test_array() { + int x[5]; + x[3] = 3; // Warning: Array indexing is unsafe operation! + use_array(x); + } + +The automatic fixit associated with the warning would transform this array +into an ``std::array``: + +.. code-block:: c++ + + void use_array(int[] x); + + void test_array() { + std::array<int> x[5]; + x[3] = 3; + use_array(x.data()); + } + +Note that the use site of variable ``x`` inside the call to ``use_array()`` +needed an update. The fixit considers every use site of the fixed variable +and updates them if necessary. + +In some cases an automatic fix would be problematic, so the warning will simply +highlight unsafe operations for you to consider. + +In some cases a partial fix is emitted, where youâll have to fill in a few +placeholders in order to document the bounds information related to the pointer. +This is particularly common when suggesting ``std::span``. + +For example, consider function ``foo()`` we've seen before: + +.. code-block:: c++ + + void foo(int *buf, size_t size) { + for (size_t i = 0; i < size; ++i) { + buf[i] = i; + } + } + +In spirit of our guidelines, the automatic fixit would prefer this function +to accept a span instead of raw buffer, so that the span didn't need to be +unwrapped. Of course, such change alone would break both source compatibility +and binary compatibility. In order to avoid that, the fix will provide +a compatibility overload to preserve the old functionality. The updated code +produced by the fixit will look like this: + +.. code-block:: c++ + + [[unsafe_buffer_usage]] + void foo(int *buf, size_t size) { + foo(std::span<int>{ buf, __placeholder__}, size); + } + + void foo(std::span<int> buf, size_t size) { + for (size_t i = 0; i < size; ++i) { + buf[i] = i; + } + } + +The following changes were performed automatically: + + - The type of parameter ``buf`` was changed from ``int *`` to + ``std::span<int>``. Use sites updated if necessary. + - A compatibility overload was autogenerated with the old prototype, with + attribute ``[[unsafe_buffer_usage]]`` attached to it to encourage + the callers to switch to the new function â and possibly update the callers + automatically! + +The following changes need manual intervention: + + - The compatibility overload contains a ``__placeholder__`` which needs + to be populated manually. In this case ``size`` is a good candidate. + - Despite accepting a ``std::span`` which carries size information, + the fixed function still accepts ``size`` separately. It can be removed + manually, or it can be preserved, if ``size`` and ``buf.size()`` + actually need to be different in your case. + +Placeholders fulfill an important purpose as they attract attention to +situations where the buffer's size wasn't properly documented for the purposes +of bounds checks. Variable ``size`` does not *have* to carry the size of the +buffer (or the size of *that* buffer) just becaused it's named "size". +The compiler will avoid making guesses about that. + +The fixits emitted by the warning are correct modulo placeholders. Placeholders +are the only reason why fixed code is allowed to fail to compile. +Incorrectly resolving the placeholder is the only reason why fixed code +will demonstrate incorrect runtime behavior compared to the original code. +In an otherwise well-formed program it is always possible (and usually easy) +to resolve the placeholder correctly. + +Note that regardless of how ``__placeholder__`` is resolved, it does not allow +you to remove the ``[[unsafe_buffer_usage]]`` annotation. The annotation will +stay forever because that function is now equivalent to function ``baz()`` we've +seen before: it contains no unsafe operations, but it only offers internal +consistency. It is still possible to misuse that function by supplying an +invalid ``size`` parameter. It still requires you to unwrap ``std::span`` if you +already have it, only to wrap it back immediately. So the callers should still +be updated to use the new function, and automatic fixits will now be emitted +for the call sites to aid that. + +Even if you decide to remove the ``size`` parameter, fixits at call sites +will remain operational. A warning will be emitted if the replacement function's +prototype diverges from the original prototype beyond recognition. In such cases +an attribute can be either updated to give more manual hints to the compiler, or +changed to a different variant that explicitly opts out of automatic fixits. + +(TODO: Elaborate on that last point and confirm that we actually want +such behavior.) + +(TODO: Cover more examples.) + +The Hardened libc++ +=================== + +(TODO: Write this section. Probably just link to its own documentation.) + +Clang Static Analyzer Checks +============================ + +(TODO: Write this section.)
_______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits