On Thu, Jul 17, 2025 at 7:03 PM Jonathan Wakely <jwak...@redhat.com> wrote:
> This reorders the data members of _Utf_iterator to avoid padding bytes
> between members due to alignment requirements. For x86_64 the previous
> layout had padding after _M_buf and after _M_to_increment for the common
> case where the iterators and sentinel types are pointers, so the size
> shrinks from 40 bytes to 32 bytes. (For i686 there's no change, it's
> still 20 bytes).
>
> We could compress the three uint8_t members into one byte by using
> bit-fields:
>
>   uint8_t _M_buf_index    : 2; // [0,3]
>   uint8_t _M_buf_last     : 3; // [0,4]
>   uint8_t _M_to_increment : 3; // [0,4]
>
> But there doesn't seem to be any point, because it will just be slower
> to access them and there will be tail padding so the size isn't any
> smaller. We could also reduce _M_buf_last and _M_to_increment to 2 bits
> because the 0 value is only used for a default constructed iterator, and
> we don't actually care about the values in that case. Again, this
> doesn't seem worth doing.
>
> libstdc++-v3/ChangeLog:
>
>         * include/bits/unicode.h (_Utf_iterator): Reorder data members
>         to be more compact.
> ---
>
> Tested x86_64-linux.
>

LGTM.

>
>  libstdc++-v3/include/bits/unicode.h | 12 ++++++------
>  1 file changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/libstdc++-v3/include/bits/unicode.h b/libstdc++-v3/include/bits/unicode.h
> index 83a9b0a708f5..9a38462e8102 100644
> --- a/libstdc++-v3/include/bits/unicode.h
> +++ b/libstdc++-v3/include/bits/unicode.h
> @@ -509,9 +509,6 @@ namespace __unicode
>        constexpr _Iter
>        _M_curr() const { return _M_first_and_curr._M_curr; }
>
> -      // Buffer holding the individual code units of the current code point.
> -      array<value_type, 4 / sizeof(_ToFmt)> _M_buf;
> -
>        // _M_first is not needed for non-bidirectional ranges.
>        template<typename _It>
>          struct _First_and_curr
> @@ -553,13 +550,16 @@ namespace __unicode
>        // start (or end, for non-forward iterators) of the current code point.
>        _First_and_curr<_Iter> _M_first_and_curr;
>
> +      // The end of the underlying input range.
> +      [[no_unique_address]] _Sent _M_last;
> +
> +      // Buffer holding the individual code units of the current code point.
> +      array<value_type, 4 / sizeof(_ToFmt)> _M_buf;
> +
>        uint8_t _M_buf_index = 0; // Index of current code unit in the buffer.
>        uint8_t _M_buf_last = 0; // Number of code units in the buffer.
>        uint8_t _M_to_increment = 0; // How far to advance _M_curr on increment.
>
> -      // The end of the underlying input range.
> -      [[no_unique_address]] _Sent _M_last;
> -
>        template<typename _FromFmt2, typename _ToFmt2,
>                 input_iterator _Iter2, sentinel_for<_Iter2> _Sent2,
>                 typename _ErrHandler>
> --
> 2.50.1
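
For anyone who wants to check the size numbers from the commit message locally, here is a minimal sketch. The type names (FirstAndCurr, OldLayout, NewLayout, BitfieldLayout) are hypothetical stand-ins, not the real _Utf_iterator. It assumes the iterator and sentinel are both plain pointers and that the buffer is 4 bytes, as for array<char32_t, 1>. The exact 40 and 32 byte figures are only asserted when pointers are 8 bytes in size and alignment, which I am assuming matches x86_64.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for _First_and_curr<_Iter> with pointer iterators.
struct FirstAndCurr
{
  const char* first;
  const char* curr;
};

// Member order before the patch: buffer first, sentinel last.
struct OldLayout
{
  std::array<char32_t, 1> buf;   // 4 bytes, then padding up to pointer alignment
  FirstAndCurr first_and_curr;
  std::uint8_t buf_index;
  std::uint8_t buf_last;
  std::uint8_t to_increment;     // then padding before the next pointer
  const char* last;
};

// Member order after the patch: pointers first, small members last.
struct NewLayout
{
  FirstAndCurr first_and_curr;
  const char* last;
  std::array<char32_t, 1> buf;
  std::uint8_t buf_index;
  std::uint8_t buf_last;
  std::uint8_t to_increment;     // only tail padding remains after this
};

// The bit-field variant discussed in the commit message (2 + 3 + 3 bits).
struct BitfieldLayout
{
  FirstAndCurr first_and_curr;
  const char* last;
  std::array<char32_t, 1> buf;
  std::uint8_t buf_index : 2;    // [0,3]
  std::uint8_t buf_last : 3;     // [0,4]
  std::uint8_t to_increment : 3; // [0,4]
};

// Assumption: 8-byte pointers with 8-byte alignment, as on x86_64.
constexpr bool lp64 = sizeof(void*) == 8 && alignof(void*) == 8;

static_assert(sizeof(NewLayout) <= sizeof(OldLayout),
              "reordering should not grow the struct");
static_assert(!lp64 || sizeof(OldLayout) == 40, "old order: 40 bytes");
static_assert(!lp64 || sizeof(NewLayout) == 32, "new order: 32 bytes");
static_assert(!lp64 || sizeof(BitfieldLayout) == 32,
              "bit-fields only add tail padding, no size win");
```

If this compiles, the asserts held. With 4-byte pointers the first assert is the only one that applies, which fits the "no change" remark for i686.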