On Thu, Jul 17, 2025 at 7:03 PM Jonathan Wakely <jwak...@redhat.com> wrote:

> This reorders the data members of _Utf_iterator to avoid padding bytes
> between members due to alignment requirements. For x86_64 the previous
> layout had padding after _M_buf and after _M_to_increment for the common
> case where the iterators and sentinel types are pointers, so the size
> shrinks from 40 bytes to 32 bytes. (For i686 there's no change; it's
> still 20 bytes.)
>
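
For anyone following along, here is a minimal sketch of the padding effect described above. The structs are hypothetical stand-ins, not the real _Utf_iterator: they assume pointer iterators and sentinel and a 4-byte code unit buffer, and all names are made up for illustration.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins, not the real _Utf_iterator: pointer iterators
// and sentinel, and a 4-byte buffer of code units.

// Member order before the patch.
struct OldOrder {
  std::array<unsigned char, 4> buf;  // followed by padding if pointers are 8-byte aligned
  const char* first;
  const char* curr;
  std::uint8_t buf_index;
  std::uint8_t buf_last;
  std::uint8_t to_increment;         // followed by padding before the next pointer
  const char* last;
};

// Member order after the patch: pointers first, small members last.
struct NewOrder {
  const char* first;
  const char* curr;
  const char* last;
  std::array<unsigned char, 4> buf;
  std::uint8_t buf_index;
  std::uint8_t buf_last;
  std::uint8_t to_increment;         // only tail padding remains
};

// The reordered layout should not be larger than the old one.
static_assert(sizeof(NewOrder) <= sizeof(OldOrder),
              "reordering should not grow the struct");

// Number of bytes saved by the reordering on the current target.
constexpr std::size_t bytes_saved() {
  return sizeof(OldOrder) - sizeof(NewOrder);
}
```

With 8-byte pointers I would expect sizeof(OldOrder) and sizeof(NewOrder) to match the 40 and 32 bytes quoted above, and bytes_saved() to be 0 with 4-byte pointers, but the exact values depend on the target ABI.
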
> We could compress the three uint8_t members into one byte by using
> bit-fields:
>
> uint8_t _M_buf_index : 2;    // [0,3]
> uint8_t _M_buf_last  : 3;    // [0,4]
> uint8_t _M_to_increment : 3; // [0,4]
>
> But there doesn't seem to be any point, because it will just be slower
> to access them and there will be tail padding so the size isn't any
> smaller. We could also reduce _M_buf_last and _M_to_increment to 2 bits
> because the 0 value is only used for a default constructed iterator, and
> we don't actually care about the values in that case. Again, this
> doesn't seem worth doing.
>
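
A similar sketch for the bit-field idea, again with hypothetical stand-in structs that assume pointer iterators and a 4-byte buffer. It only illustrates the tail padding argument; it says nothing about the access cost.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the reordered layout with three separate bytes.
struct PlainBytes {
  const char* first;
  const char* curr;
  const char* last;
  unsigned char buf[4];
  std::uint8_t buf_index;
  std::uint8_t buf_last;
  std::uint8_t to_increment;
};

// The same layout with the three counters packed into bit-fields.
struct PackedBits {
  const char* first;
  const char* curr;
  const char* last;
  unsigned char buf[4];
  std::uint8_t buf_index : 2;     // [0,3]
  std::uint8_t buf_last : 3;      // [0,4]
  std::uint8_t to_increment : 3;  // [0,4]
};

// True if packing the counters does not change the overall size, because
// both structs are rounded up to a multiple of the pointer alignment.
constexpr bool same_size() {
  return sizeof(PlainBytes) == sizeof(PackedBits);
}
```

On the usual x86_64 and i686 ABIs I would expect same_size() to be true, since the two bytes saved are absorbed by tail padding.
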
> libstdc++-v3/ChangeLog:
>
>         * include/bits/unicode.h (_Utf_iterator): Reorder data members
>         to be more compact.
> ---
>
> Tested x86_64-linux.
>
LGTM.

>
>  libstdc++-v3/include/bits/unicode.h | 12 ++++++------
>  1 file changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/libstdc++-v3/include/bits/unicode.h b/libstdc++-v3/include/bits/unicode.h
> index 83a9b0a708f5..9a38462e8102 100644
> --- a/libstdc++-v3/include/bits/unicode.h
> +++ b/libstdc++-v3/include/bits/unicode.h
> @@ -509,9 +509,6 @@ namespace __unicode
>        constexpr _Iter
>        _M_curr() const { return _M_first_and_curr._M_curr; }
>
> -      // Buffer holding the individual code units of the current code point.
> -      array<value_type, 4 / sizeof(_ToFmt)> _M_buf;
> -
>        // _M_first is not needed for non-bidirectional ranges.
>        template<typename _It>
>         struct _First_and_curr
> @@ -553,13 +550,16 @@ namespace __unicode
>        // start (or end, for non-forward iterators) of the current code point.
>        _First_and_curr<_Iter> _M_first_and_curr;
>
> +      // The end of the underlying input range.
> +      [[no_unique_address]] _Sent _M_last;
> +
> +      // Buffer holding the individual code units of the current code point.
> +      array<value_type, 4 / sizeof(_ToFmt)> _M_buf;
> +
>        uint8_t _M_buf_index = 0;    // Index of current code unit in the buffer.
>        uint8_t _M_buf_last = 0;     // Number of code units in the buffer.
>        uint8_t _M_to_increment = 0; // How far to advance _M_curr on increment.
>
> -      // The end of the underlying input range.
> -      [[no_unique_address]] _Sent _M_last;
> -
>        template<typename _FromFmt2, typename _ToFmt2,
>                input_iterator _Iter2, sentinel_for<_Iter2> _Sent2,
>                typename _ErrHandler>
> --
> 2.50.1
>
>
