On Thu, Apr 30, 2026 at 01:02:39AM +0400, Marc-André Lureau wrote:
> The text console receives bytes that may be UTF-8 encoded (e.g. from
> a guest running a modern distro), but currently treats each byte as a
> raw character index into the VGA/CP437 font, producing garbled output
> for any multi-byte sequence.
>
> Add a UTF-8 decoder using Bjoern Hoehrmann's DFA. The DFA inherently
> rejects overlong encodings, surrogates, and codepoints above U+10FFFF.
> Completed codepoints are then mapped to CP437, unmappable characters are
> displayed as '?'.
>
> Note that QEMU has a "buffered" utf8 decoder in util/unicode.c, but
> it is not a good fit for byte-per-byte decoding.
>
> Signed-off-by: Marc-André Lureau <[email protected]>
> ---
>  ui/cp437.h      |  13 ++++
>  ui/console-vc.c |  59 ++++++++++++++++
>  ui/cp437.c      | 205 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  ui/meson.build  |   2 +-
>  4 files changed, 278 insertions(+), 1 deletion(-)

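For anyone following the thread, the byte-per-byte flow described in the
commit message can be sketched roughly as below. This is not the code from
the patch and it is not Hoehrmann's table-driven DFA: it is a hand-written
state machine that enforces the same well-formedness rules (no overlong
forms, no surrogates, nothing above U+10FFFF). All names here (Utf8Dec,
utf8_feed, cp_to_cp437, decode_to_cp437) are hypothetical, and the CP437
mapping covers only a handful of codepoints for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoder state, kept across calls so bytes can arrive singly. */
typedef struct {
    uint32_t cp;    /* codepoint bits accumulated so far */
    int need;       /* continuation bytes still expected */
    uint8_t lo, hi; /* allowed range for the next continuation byte */
} Utf8Dec;

enum { UTF8_MORE, UTF8_DONE, UTF8_BAD };

/* Feed one byte. Returns UTF8_DONE with *out set when a codepoint completes. */
static int utf8_feed(Utf8Dec *d, uint8_t b, uint32_t *out)
{
    if (d->need == 0) {
        d->lo = 0x80;
        d->hi = 0xBF;
        if (b < 0x80) {
            *out = b;
            return UTF8_DONE;
        } else if (b >= 0xC2 && b <= 0xDF) {   /* 0xC0/0xC1 would be overlong */
            d->need = 1;
            d->cp = b & 0x1F;
        } else if (b >= 0xE0 && b <= 0xEF) {
            d->need = 2;
            d->cp = b & 0x0F;
            if (b == 0xE0) d->lo = 0xA0;       /* reject overlong 3-byte forms */
            if (b == 0xED) d->hi = 0x9F;       /* reject U+D800..U+DFFF */
        } else if (b >= 0xF0 && b <= 0xF4) {
            d->need = 3;
            d->cp = b & 0x07;
            if (b == 0xF0) d->lo = 0x90;       /* reject overlong 4-byte forms */
            if (b == 0xF4) d->hi = 0x8F;       /* reject above U+10FFFF */
        } else {
            return UTF8_BAD;                   /* stray continuation or bad lead */
        }
        return UTF8_MORE;
    }
    if (b < d->lo || b > d->hi) {
        d->need = 0;                           /* simplification: drop the byte */
        return UTF8_BAD;
    }
    d->lo = 0x80;
    d->hi = 0xBF;
    d->cp = (d->cp << 6) | (b & 0x3F);
    if (--d->need == 0) {
        *out = d->cp;
        return UTF8_DONE;
    }
    return UTF8_MORE;
}

/* Illustrative partial mapping only; a real table covers all of CP437. */
static uint8_t cp_to_cp437(uint32_t cp)
{
    if (cp < 0x80) {
        return (uint8_t)cp;                    /* ASCII passes through */
    }
    switch (cp) {
    case 0x00C7: return 0x80;                  /* LATIN CAPITAL C WITH CEDILLA */
    case 0x00E9: return 0x82;                  /* LATIN SMALL E WITH ACUTE */
    case 0x2500: return 0xC4;                  /* BOX DRAWINGS LIGHT HORIZONTAL */
    case 0x2588: return 0xDB;                  /* FULL BLOCK */
    default:     return '?';                   /* unmappable */
    }
}

/* Decode len bytes into CP437 glyph indices; out needs room for len bytes. */
static size_t decode_to_cp437(Utf8Dec *d, const uint8_t *in, size_t len,
                              uint8_t *out)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        uint32_t cp;
        switch (utf8_feed(d, in[i], &cp)) {
        case UTF8_DONE: out[n++] = cp_to_cp437(cp); break;
        case UTF8_BAD:  out[n++] = '?';             break;
        default:        break;                 /* sequence still incomplete */
        }
    }
    return n;
}
```

Because the state lives in the struct, a multi-byte sequence split across
two writes still decodes correctly, which is the property a buffered decoder
does not give you. One simplification in this sketch: a byte that breaks a
sequence is dropped instead of being re-examined as a possible lead byte.
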
Reviewed-by: Daniel P. Berrangé <[email protected]>

With regards,
Daniel
-- 
|: https://berrange.com       ~~        https://hachyderm.io/@berrange :|
|: https://libvirt.org          ~~          https://entangle-photo.org :|
|: https://pixelfed.art/berrange   ~~    https://fstop138.berrange.com :|
