Hello all,

A couple of days ago I was looking into how dmenu deals with invalid utf8 sequences and noticed a couple of odd things. Here's the testcase for those who want to follow along:
	$ printf "0\xef1234567\ntest" | dmenu

In drw.c::utf8decode(), an invalid utf8 sequence is decoded as U+FFFD (�), and drw_text() carries on with its width calculation as if there were a U+FFFD codepoint in the text.

However, when it comes to actually rendering the text via XftDrawStringUtf8(), we simply pass it `utf8str`, which obviously doesn't contain any U+FFFD but still has the invalid utf8 sequences. I'm not sure if this is documented or not, but on my system xft basically just cuts the text off at the error. In other words, only 0 is rendered, followed by a large blank area (see pic0.png).

Is this actually the expected behavior? If yes, then why not break out early on error instead of calculating the width with a made-up U+FFFD which will never be rendered?

I have a rough patch which actually renders invalid utf8 as � instead of cutting the text off (see pic1.png). IMO it's the nicer behavior, but I wanted to ask what everyone else expects before polishing the patch and sending it over.

I also noticed that in utf8decode() there's this line:

	if (j < len)
		return 0;

Is this ever reachable? If yes, wouldn't it be an infinite loop, since `text` would never advance inside drw_text()?

- NRK
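
For illustration only (this is not the patch itself, and none of this is existing dmenu code): a minimal standalone sketch of one way to make the rendered bytes match what was measured, by copying the text and substituting the UTF-8 encoding of U+FFFD for every invalid byte before the string would be handed to the renderer. The names utf8_seqlen() and utf8_sanitize() are hypothetical, and replacing each bad byte with one U+FFFD is just an assumed policy.

```c
#include <stddef.h>
#include <string.h>

/* Length of the valid UTF-8 sequence at s (n bytes available),
 * or 0 if the bytes there do not form a valid sequence. */
static size_t
utf8_seqlen(const unsigned char *s, size_t n)
{
	size_t i, len;
	unsigned long cp;

	if (n == 0)
		return 0;
	if (s[0] < 0x80)
		return 1;
	if ((s[0] & 0xE0) == 0xC0) {
		len = 2; cp = s[0] & 0x1F;
	} else if ((s[0] & 0xF0) == 0xE0) {
		len = 3; cp = s[0] & 0x0F;
	} else if ((s[0] & 0xF8) == 0xF0) {
		len = 4; cp = s[0] & 0x07;
	} else {
		return 0; /* stray continuation byte or invalid lead byte */
	}
	if (len > n)
		return 0; /* sequence truncated by end of input */
	for (i = 1; i < len; i++) {
		if ((s[i] & 0xC0) != 0x80)
			return 0; /* expected a continuation byte */
		cp = (cp << 6) | (s[i] & 0x3F);
	}
	/* reject overlong encodings, surrogates and out-of-range values */
	if ((len == 2 && cp < 0x80) || (len == 3 && cp < 0x800) ||
	    (len == 4 && cp < 0x10000))
		return 0;
	if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
		return 0;
	return len;
}

/* Copy srclen bytes of src into dst, writing U+FFFD (EF BF BD) in place
 * of every invalid byte. dst is always NUL-terminated on success.
 * Returns the number of bytes written (excluding the NUL), or
 * (size_t)-1 if dst is too small. */
static size_t
utf8_sanitize(char *dst, size_t dstsize, const char *src, size_t srclen)
{
	static const char repl[] = "\xEF\xBF\xBD";
	const unsigned char *s = (const unsigned char *)src;
	const char *from;
	size_t i = 0, o = 0, n, cnt;

	while (i < srclen) {
		n = utf8_seqlen(s + i, srclen - i);
		from = n ? src + i : repl;
		cnt = n ? n : 3;
		if (o + cnt >= dstsize) /* keep one byte for the NUL */
			return (size_t)-1;
		memcpy(dst + o, from, cnt);
		o += cnt;
		i += n ? n : 1; /* on error, skip exactly one byte */
	}
	if (o >= dstsize)
		return (size_t)-1;
	dst[o] = '\0';
	return o;
}
```

With the testcase above, the 9 input bytes `0`, `0xef`, `1234567` would come out as 11 bytes: `0`, then EF BF BD, then `1234567`. Note that in C source the literal has to be split as "0\xef" "1234567", since a \x escape would otherwise swallow the following hex digits.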