On Thu, Apr 30, 2026 at 01:02:39AM +0400, Marc-André Lureau wrote:
> The text console receives bytes that may be UTF-8 encoded (e.g. from
> a guest running a modern distro), but currently treats each byte as a
> raw character index into the VGA/CP437 font, producing garbled output
> for any multi-byte sequence.
> 
> Add a UTF-8 decoder using Bjoern Hoehrmann's DFA. The DFA inherently
> rejects overlong encodings, surrogates, and codepoints above U+10FFFF.
> Completed codepoints are then mapped to CP437; unmappable characters are
> displayed as '?'.
> 
> Note that QEMU has a "buffered" utf8 decoder in util/unicode.c, but
> it is not a good fit for byte-per-byte decoding.
> 
> Signed-off-by: Marc-André Lureau <[email protected]>
> ---
>  ui/cp437.h      |  13 ++++
>  ui/console-vc.c |  59 ++++++++++++++++
>  ui/cp437.c      | 205 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  ui/meson.build  |   2 +-
>  4 files changed, 278 insertions(+), 1 deletion(-)

Reviewed-by: Daniel P. Berrangé <[email protected]>


With regards,
Daniel
-- 
|: https://berrange.com       ~~        https://hachyderm.io/@berrange :|
|: https://libvirt.org          ~~          https://entangle-photo.org :|
|: https://pixelfed.art/berrange   ~~    https://fstop138.berrange.com :|

