On Mon, Apr 20, 2026 at 11:54:40AM +0400, Marc-André Lureau wrote:
> Hi
>
> On Wed, Apr 15, 2026 at 3:24 PM Daniel P. Berrangé <[email protected]>
> wrote:
> >
> > On Fri, Apr 10, 2026 at 11:18:31PM +0400, Marc-André Lureau wrote:
> > > The text console receives bytes that may be UTF-8 encoded (e.g. from
> > > a guest running a modern distro), but currently treats each byte as a
> > > raw character index into the VGA/CP437 font, producing garbled output
> > > for any multi-byte sequence.
> >
> > Presumably the key words here are "may be" .... as in, it
> > also "may NOT be" UTF-8.
> >
> > IIUC, the current code is assuming that all data from the guest
> > is in the CP437 encoding (8-bit Extended ASCII), and that encoding
> > has valid characters for all 256 code points.
> >
> > By adding UTF-8 decoding for val > 0x80 this is breaking compat
> > with any guest that is outputting data with the full range of
> > CP437.
> >
> > IOW, this patch is moving the brokeness from guests which
> > use UTF8, onto guests which use CP437.
> >
> > Only guests which strictly limit themselves to 7-bit ASCII
> > are unaffected.
> >
> > I accept the UTF8 should probably be considered the common
> > case for modern guests, but this hardcoding a different type
> > of breakage feels undesirable to me.
> >
> > Surely we need an explicit config property here to select
> > the between character sets we expect from the guest ?
>
> Probably, I don't know if many guest/apps rely on the serial encoding,
> but we should probably be conservative.
>
> Adjusting the decoding at runtime may be possible, but it could be tricky.
>
> Instead, we could add a vc chardev option like charset=cp437/utf8.
>
> What should be the default? For compatibility reasons, use CP437 for
> pc machine <=11.0 and default to utf8 for others? wdyt?

Strictly speaking this is not guest ABI, just a change in defaults
for the backend.

So as long as we provide the config option, we could potentially
just change the default to UTF8 unconditionally, on the basis that
UTF8 has been the default charset in mainstream Linux for 20 years.
Not sure what Windows uses by default, but use of the serial console
with Linux guests is much more likely than Windows guests IMHO.
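
To make the trade-off concrete, here is a rough Python sketch (not QEMU code) of what a per-backend charset switch would mean for the same guest bytes. The function name decode_console_bytes and its charset parameter are hypothetical, chosen only to mirror the charset=cp437/utf8 idea above; the UTF8 default reflects the suggestion in this mail, not a settled decision.

```python
def decode_console_bytes(data: bytes, charset: str = "utf8") -> str:
    """Decode guest console output according to an explicit charset choice.

    Hypothetical helper for illustration; the name and the "charset"
    values only mirror the proposed charset=cp437/utf8 option.
    """
    if charset == "cp437":
        # CP437 assigns a character to every one of the 256 byte values,
        # so decoding can never fail.
        return data.decode("cp437")
    if charset == "utf8":
        # Bytes that do not form valid UTF-8 become U+FFFD here, which is
        # the breakage a CP437 guest would see.
        return data.decode("utf-8", errors="replace")
    raise ValueError(f"unsupported charset: {charset!r}")


# The UTF-8 encoding of U+00E9 (e with acute accent) is 0xC3 0xA9.
utf8_guest_output = b"\xc3\xa9"
# A CP437 guest drawing a double-line box top: 0xC9 0xCD 0xBB.
cp437_guest_output = b"\xc9\xcd\xbb"

for name, payload in (("utf8 guest", utf8_guest_output),
                      ("cp437 guest", cp437_guest_output)):
    for charset in ("utf8", "cp437"):
        print(f"{name} read as {charset}: "
              f"{decode_console_bytes(payload, charset)!r}")
```

Whichever default is picked, one class of guest gets mangled output for bytes >= 0x80 unless the option is set to match, while pure 7-bit ASCII output decodes identically under both settings.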
