On Sun, 1 Feb 2026 at 20:39, Patrick Palka <[email protected]> wrote: > > On Sun, 1 Feb 2026, Patrick Palka wrote: > > > On Sat, 31 Jan 2026, Jonathan Wakely wrote: > > > > > On Fri, 30 Jan 2026 at 15:25, Patrick Palka <[email protected]> wrote: > > > > > > > > On Thu, 29 Jan 2026, Jonathan Wakely wrote: > > > > > > > > > > I wondered about replacing _M_bytes[7] with a union like: > > > > > > > > > > union { > > > > > unsigned char _M_bytes[7]; > > > > > struct { // Used by restore_rep_count frame > > > > > unsigned char _M_rep_count : 2; > > > > > }; > > > > > struct { // Used by restore_cur_results frame > > > > > unsigned char _M_matched : 1; > > > > > unsigned char _M_restore_first : 1; > > > > > }; > > > > > }; > > > > > > > > > > So that we use more precise names than just _M_bytes[0] (and could > > > > > still extend it later with other alternatives if needed). > > > > > > > > > > This might be over-complicating things, but if we *did* want to do it, > > > > > I think we should decide now because the difference between setting > > > > > _M_bytes[0] = 0xff and setting _M_restore_first = true would not be > > > > > compatible if we changed it later (old TUs might expect to find 0xff > > > > > there and new TUs would only set one bit). > > > > > > > > I think slightly more flexible would be: > > > > > > > > union { > > > > unsigned char _M_byte0; > > > > struct { // Used by restore_rep_count frame > > > > unsigned char _M_rep_count : 2; > > > > }; > > > > struct { // Used by restore_cur_results frame > > > > unsigned char _M_matched : 1; > > > > unsigned char _M_restore_first : 1; > > > > }; > > > > }; > > > > unsigned char _M_bytes[6]; > > > > > > > > So that we could decide how to use each byte independently of the other > > > > (e.g. maybe one byte could be for universal flags that apply to every > > > > code). > > > > > > Do we want to value-init the _M_byte0 member in the constructor, so > > > that we know all the bit-fields are initially zero? > > > > > > I think we can use a NSDMI for union members even in C++11, so this > > > should work: > > > > > > unsigned char _M_byte0 = 0; > > > > Zero-initializing the flags makes sense but I think we have to manually > > init each bit-field (according to the state id) so that the appropriate > > union member gets activated? Or can we pretend union type punning is > > well defined (in non-constexpr code)? > > Since the code uses anonymous structs which is already an extension I > reckon we can assume type punning between them is well-defined.
Yes, I think that's fine to assume. My main concern was that we might forget to initialize one of the bit-fields (_M_count, _M_end, and _M_matched) in the handler that set them, and then when we read them back we read garbage. Looking more carefully, it looks like they're always set by your patch, but zeroing _M_byte0 on construction seems safer. Tangentially, would _M_is_end or _M_expr_end be a better name for the _M_end bit-field? We use _M_end for the [_M_begin,_M_end) iterator pair, so when I was searching for all uses of the _M_end bit-field I kept hitting those instead.
