https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110807

--- Comment #14 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The releases/gcc-13 branch has been updated by Torbjorn Svensson
<az...@gcc.gnu.org>:

https://gcc.gnu.org/g:5d684a5f7f82b1425cac5eaa11dccc3b51a62d44

commit r13-8309-g5d684a5f7f82b1425cac5eaa11dccc3b51a62d44
Author: Alexandre Oliva <ol...@adacore.com>
Date:   Thu Nov 9 00:01:37 2023 -0300

    libstdc++: optimize bit iterators assuming normalization [PR110807]

    The representation of bit iterators, using a pointer into an array of
    words, and an unsigned bit offset into that word, makes for some
    optimization challenges: because the compiler doesn't know that the
    offset is always in a certain narrow range, beginning at zero and
    ending before the word bitwidth, when a function loads an offset that
    it hasn't normalized itself, it may fail to derive certain reasonable
    conclusions, even to the point of retaining useless calls that elicit
    incorrect warnings.

    Case at hand: The 110807.cc testcase for bit vectors assigns a 1-bit
    list to a global bit vector variable.  Based on the compile-time
    constant length of the list, we decide in _M_insert_range whether to
    use the existing storage or to allocate new storage for the vector.
    After allocation, we decide in _M_copy_aligned how to copy any
    preexisting portions of the vector to the newly-allocated storage.
    When copying two or more words, we use __builtin_memmove.

    However, because we compute the available room using bit offsets
    without range information, even comparing them with constants, we fail
    to infer ranges for the preexisting vector depending on word size, and
    may thus retain the memmove call despite knowing we've only allocated
    one word.

    Other parts of the compiler then detect the mismatch between the
    constant allocation size and the much larger range that could
    theoretically be copied into the newly-allocated storage if we could
    reach the call.

    Ensuring the compiler is aware of the constraints on the offset range
    enables it to do a much better job at optimizing.  Using attribute
    assume (_M_offset <= ...) didn't work, because gimple lowered that to
    something that vrp could only use to ensure 'this' was non-NULL.
    Exposing _M_offset as an automatic variable/gimple register outside
    the unevaluated assume operand enabled the optimizer to do its job.

    Rather than placing such load-then-assume constructs all over, I
    introduced an always-inline member function in bit iterators that does
    the job of conveying to the compiler the information that the
    assumption is supposed to hold, and various calls throughout functions
    pertaining to bit iterators that might not otherwise know that the
    offsets have to be in range, so that the compiler no longer needs to
    make conservative assumptions that prevent optimizations.

    With the explicit assumptions, the compiler can correlate the test for
    available storage in the vector with the test for how much storage
    might need to be copied, and determine that, if we're not asking for
    enough room for two or more words, we can omit entirely the code to
    copy two or more words, without any runtime overhead whatsoever: no
    traces remain of the undefined behavior or of the tests that inform
    the compiler about the assumptions that must hold.

    for  libstdc++-v3/ChangeLog

            PR libstdc++/110807
            * include/bits/stl_bvector.h (_Bit_iterator_base): Add
            _M_assume_normalized member function.  Call it in _M_bump_up,
            _M_bump_down, _M_incr, operator==, operator<=>, operator<, and
            operator-.
            (_Bit_iterator): Also call it in operator*.
            (_Bit_const_iterator): Likewise.

    (cherry picked from commit e39b3e02c27bd771a07e385f9672ecf1a45ced77)

Reply via email to