Paul Eggert wrote:
> > Based on the comments in gnulib/lib/mbrtoc16.c, I think it would be better
> > to clear the first 24 bytes of the struct, not 12. Otherwise it can be left
> > in a state where mbsinit() returns true but the mbrto* functions have
> > undefined behaviour.
>
> For mbcel, all that matters is mbrtoc32. Could you give an example of
> the undefined behavior there? I looked at the citrus implementations in
> current FreeBSD, OpenBSD and macOS and thought that 12 bytes is enough
> for mbrtoc32 on all their porting targets. NetBSD is a bit different and
> needs just a pointer width.

There's a difference between the part that mbsinit() looks at and the part
that must be zeroed to avoid undefined behaviour. For example, suppose we
have

  typedef struct
  {
    unsigned int count;
    unsigned int wchar;
  }
  mbstate_t;

mbsinit() can return true whenever state->count == 0. But that does not
mean that every state with state->count == 0 is valid. It is perfectly OK
for mbrtowc(), mbrtoc32(), or other functions to call abort() or to crash
when

  state->count == 0 && state->wchar != 0.

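To make the point concrete, here is a minimal sketch built on the toy struct
above. The names toy_mbstate_t, toy_mbsinit, toy_state_is_valid,
toy_reset_count_only and toy_reset_all are hypothetical, and the invariant
"count == 0 implies wchar == 0" is an assumed example, not one taken from any
real libc. Clearing only the field that mbsinit() inspects yields a state that
mbsinit() accepts but that violates the invariant; clearing the whole object
does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical toy conversion state, mirroring the struct above.
   This is an illustration, not the mbstate_t of any real libc.  */
typedef struct
{
  unsigned int count;   /* bytes still expected for the current character */
  unsigned int wchar;   /* bits of the character accumulated so far */
}
toy_mbstate_t;

/* Analogue of mbsinit(): it inspects only the 'count' field.  */
static bool
toy_mbsinit (const toy_mbstate_t *ps)
{
  return ps->count == 0;
}

/* Assumed invariant that a conversion function may silently rely on:
   when no bytes are pending, the accumulator must be zero.  A real
   mbrtowc() or mbrtoc32() would be entitled to abort() or crash if
   such an invariant did not hold, without checking it first.  */
static bool
toy_state_is_valid (const toy_mbstate_t *ps)
{
  return ps->count != 0 || ps->wchar == 0;
}

/* Clears only the field that toy_mbsinit() looks at, the way clearing
   too small a prefix of a real mbstate_t would.  */
static void
toy_reset_count_only (toy_mbstate_t *ps)
{
  ps->count = 0;
}

/* Clears the whole object, so the invariant holds no matter which
   fields the conversion functions depend on.  */
static void
toy_reset_all (toy_mbstate_t *ps)
{
  memset (ps, 0, sizeof *ps);
}
```

Starting from a mid-character state such as { 2, 0x41 }, toy_reset_count_only
makes toy_mbsinit() return true while toy_state_is_valid() returns false;
toy_reset_all makes both return true.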
By reading the source code of FreeBSD, NetBSD, OpenBSD, macOS, Solaris,
and so on, I can easily determine
- which parts of the mbstate_t mbsinit() tests,
- which parts of the mbstate_t the various functions use.
But to understand the interdependencies between the various mbstate_t
fields and the invariants the code assumes, I would need to carefully
read each of the mentioned files (one per OS and per locale type). And
this would not be future-proof: after changes to the bulk of the code,
the interdependencies and assumed invariants might no longer be the
same. If we had cleared too few fields of the mbstate_t, things might
then crash.
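For comparison, a sketch of the layout-independent alternative on a real
mbstate_t: zero the entire object with sizeof. ISO C states that a zero-valued
mbstate_t object describes an initial conversion state, so this does not depend
on knowing which fields mbsinit() or mbrtoc32() use. The helper name
reset_mbstate is hypothetical, and this is not code taken from gnulib.

```c
#include <assert.h>
#include <string.h>
#include <wchar.h>

/* Put *ps into the initial conversion state without relying on any
   knowledge of the platform's mbstate_t layout.  A zero-valued
   mbstate_t describes an initial conversion state, and sizeof covers
   every field, including any that a future libc version adds.  */
static void
reset_mbstate (mbstate_t *ps)
{
  memset (ps, 0, sizeof *ps);
}
```

The cost is that every byte of the object is written, however large the
platform's mbstate_t is.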
Bruno