On Sun, 1 Feb 2026 at 20:39, Patrick Palka <[email protected]> wrote:
>
> On Sun, 1 Feb 2026, Patrick Palka wrote:
>
> > On Sat, 31 Jan 2026, Jonathan Wakely wrote:
> >
> > > On Fri, 30 Jan 2026 at 15:25, Patrick Palka <[email protected]> wrote:
> > > >
> > > > On Thu, 29 Jan 2026, Jonathan Wakely wrote:
> > > > >
> > > > > I wondered about replacing _M_bytes[7] with a union like:
> > > > >
> > > > >   union {
> > > > >     unsigned char _M_bytes[7];
> > > > >     struct { // Used by restore_rep_count frame
> > > > >       unsigned char _M_rep_count : 2;
> > > > >     };
> > > > >     struct { // Used by restore_cur_results frame
> > > > >       unsigned char _M_matched : 1;
> > > > >       unsigned char _M_restore_first : 1;
> > > > >     };
> > > > >   };
> > > > >
> > > > > So that we use more precise names than just _M_bytes[0] (and could
> > > > > still extend it later with other alternatives if needed).
> > > > >
> > > > > This might be over-complicating things, but if we *did* want to do it,
> > > > > I think we should decide now because the difference between setting
> > > > > _M_bytes[0] = 0xff and setting _M_restore_first = true would not be
> > > > > compatible if we changed it later (old TUs might expect to find 0xff
> > > > > there and new TUs would only set one bit).
> > > >
> > > > I think slightly more flexible would be:
> > > >
> > > >    union {
> > > >      unsigned char _M_byte0;
> > > >      struct { // Used by restore_rep_count frame
> > > >        unsigned char _M_rep_count : 2;
> > > >      };
> > > >      struct { // Used by restore_cur_results frame
> > > >        unsigned char _M_matched : 1;
> > > >        unsigned char _M_restore_first : 1;
> > > >      };
> > > >    };
> > > >    unsigned char _M_bytes[6];
> > > >
> > > > So that we could decide how to use each byte independently of the other
> > > > (e.g. maybe one byte could be for universal flags that apply to every
> > > > code).
> > >
> > > Do we want to value-init the _M_byte0 member in the constructor, so
> > > that we know all the bit-fields are initially zero?
> > >
> > > I think we can use a NSDMI for union members even in C++11, so this 
> > > should work:
> > >
> > > unsigned char _M_byte0 = 0;
> >
> > Zero-initializing the flags makes sense but I think we have to manually
> > init each bit-field (according to the state id) so that the appropriate
> > union member gets activated?  Or can we pretend union type punning is
> > well defined (in non-constexpr code)?
>
> Since the code uses anonymous structs which is already an extension I
> reckon we can assume type punning between them is well-defined.

Yes, I think that's fine to assume.

My main concern was that we might forget to initialize one of the
bit-fields (_M_count, _M_end, and _M_matched) in the handler that set
them, and then when we read them back we read garbage. Looking more
carefully, it looks like they're always set by your patch, but zeroing
_M_byte0 on construction seems safer.

Tangentially, would _M_is_end or _M_expr_end be a better name for the
_M_end bit-field? We use _M_end for the [_M_begin,_M_end) iterator
pair, so when I was searching for all uses of the _M_end bit-field I
kept hitting those instead.

Reply via email to