Re: improve xterm(1) resilience against control code attacks
> Christian Weisgerber wrote on Mon, Mar 07, 2016 at 03:51:41PM +:
> > On 2016-03-07, Ingo Schwarze wrote:
>
> >> Consequently, in the interest of safe and sane defaults, i propose
> >> switching our xterm(1) to enable UTF-8 mode by default.
>
> > Seconded.

Please.

> >> The best place to switch is in the setup function VTInitialize_locale()
> >> that decides whether to enable UTF-8 mode and which supporting flags
> >> to set, by pretending to it that CODESET is always UTF-8, but without
> >> interfering with the actual value of the CODESET and without changing
> >> the utility function xtermEnvUTF8().
>
> > Hmm, maybe you are overthinking this.
> > Other defaults that we set differently from upstream are simply
> > resource changes to XTerm.ad (/usr/X11R6/share/X11/app-defaults/XTerm).

Tue, 8 Mar 2016 00:14:45 +0100 Ingo Schwarze
> Heh. I considered simply changing the resource defaults, but came
> to the wrong conclusion that there wouldn't be a way to achieve the
> desired effect. Thanks for bringing it up again, that made me
> re-check, and it turns out there *is* a way that is quite
> straightforward, minimally intrusive, very robust, and doesn't get
> in the way of explicit user configuration: See the patch below.
> If this gets OKs, let's forget my previous, more intrusive patch.

Maybe still not forget it for future reference though.

> With that change, users can obviously still set *locale to other
> values (for example, "true" or "false" come to mind), and the command
> line options changing *locale (-lc +lc -en) still work. Looking
> at the code, explicitly setting *utf8 to false (or equivalently,
> +u8 on the command line) also overrides this.
>
> Spending a day reading xterm source code wasn't wasted, though -
> by reading the documentation only, i wouldn't have understood
> that this way works as intended and is safe.

If it took you that long, no wonder the barrier is too high for
entry in this program. Not implying anything though, please leave it be.

> >> printf "\303\237\n" # thanks to sobrado@ for the striking example
> >> Now your local terminal hangs until you force a reset using the
> >> menus of the xterm program.
>
> > \237 is 0x9F, equivalent to ESC _, which is APC (Application Program
> > Command). That appears in a table, but is not explained in the
> > VT220 manual. The VT420 manual says: "The VT420 ignores all following
> > characters until it receives a SUB, ST, or any other C1 control
> > character."
>
> Yes. I spent so much time reading terminal control code documentation
> lately that i probably assumed this to be widely known. ;-)
> You are right, explaining it is helpful.

More thinking on terminal resilience and utf8, thanks for making huge
progress so far system wide.

> Index: XTerm.ad
> ===
> RCS file: /cvs/xenocara/app/xterm/XTerm.ad,v
> retrieving revision 1.15
> diff -u -p -r1.15 XTerm.ad
> --- XTerm.ad 26 Aug 2013 20:06:10 - 1.15
> +++ XTerm.ad 7 Mar 2016 22:54:44 -
> @@ -259,6 +259,11 @@
>
>  ! OpenBSD local modifications
>
> +! Enable UTF-8 mode since OpenBSD does not support any other multibyte
> +! locales. Even for people using the C/POSIX locale for everything,
> +! that's safer and more usable than the upstream default of "medium".
> +*locale: UTF-8
> +
>  ! ScrollBar by default
>  *scrollBar: true

Not mentioning the unrelated login resource class set to true in
another patch for application defaults instead of user resources.

Re: improve xterm(1) resilience against control code attacks
On Tue, Mar 08, 2016 at 12:14:45AM +0100, Ingo Schwarze wrote:
> Hi Christian,
>
> Christian Weisgerber wrote on Mon, Mar 07, 2016 at 03:51:41PM +:
> > On 2016-03-07, Ingo Schwarze wrote:
>
> >> Consequently, in the interest of safe and sane defaults, i propose
> >> switching our xterm(1) to enable UTF-8 mode by default.
>
> > Seconded.
>
> >> The best place to switch is in the setup function VTInitialize_locale()
> >> that decides whether to enable UTF-8 mode and which supporting flags
> >> to set, by pretending to it that CODESET is always UTF-8, but without
> >> interfering with the actual value of the CODESET and without changing
> >> the utility function xtermEnvUTF8().
>
> > Hmm, maybe you are overthinking this.
> > Other defaults that we set differently from upstream are simply
> > resource changes to XTerm.ad (/usr/X11R6/share/X11/app-defaults/XTerm).
>
> Heh. I considered simply changing the resource defaults, but came
> to the wrong conclusion that there wouldn't be a way to achieve the
> desired effect. Thanks for bringing it up again, that made me
> re-check, and it turns out there *is* a way that is quite
> straightforward, minimally intrusive, very robust, and doesn't get
> in the way of explicit user configuration: See the patch below.
> If this gets OKs, let's forget my previous, more intrusive patch.
>
> With that change, users can obviously still set *locale to other
> values (for example, "true" or "false" come to mind), and the command
> line options changing *locale (-lc +lc -en) still work. Looking
> at the code, explicitly setting *utf8 to false (or equivalently,
> +u8 on the command line) also overrides this.
>
> Spending a day reading xterm source code wasn't wasted, though -
> by reading the documentation only, i wouldn't have understood
> that this way works as intended and is safe.
>
> OK?
> Ingo
>

Ok. Thanks for the extra explanations.

>
> >
> > PS:
> >> printf "\303\237\n" # thanks to sobrado@ for the striking example
> >> Now your local terminal hangs until you force a reset using the
> >> menus of the xterm program.
>
> > \237 is 0x9F, equivalent to ESC _, which is APC (Application Program
> > Command). That appears in a table, but is not explained in the
> > VT220 manual. The VT420 manual says: "The VT420 ignores all following
> > characters until it receives a SUB, ST, or any other C1 control
> > character."
>
> Yes. I spent so much time reading terminal control code documentation
> lately that i probably assumed this to be widely known. ;-)
> You are right, explaining it is helpful.
>
>
> Index: XTerm.ad
> ===
> RCS file: /cvs/xenocara/app/xterm/XTerm.ad,v
> retrieving revision 1.15
> diff -u -p -r1.15 XTerm.ad
> --- XTerm.ad 26 Aug 2013 20:06:10 - 1.15
> +++ XTerm.ad 7 Mar 2016 22:54:44 -
> @@ -259,6 +259,11 @@
>
>  ! OpenBSD local modifications
>
> +! Enable UTF-8 mode since OpenBSD does not support any other multibyte
> +! locales. Even for people using the C/POSIX locale for everything,
> +! that's safer and more usable than the upstream default of "medium".
> +*locale: UTF-8
> +
>  ! ScrollBar by default
>  *scrollBar: true
>

-- 
Matthieu Herrb

Re: improve xterm(1) resilience against control code attacks
Hi Christian,

Christian Weisgerber wrote on Mon, Mar 07, 2016 at 03:51:41PM +:
> On 2016-03-07, Ingo Schwarze wrote:

>> Consequently, in the interest of safe and sane defaults, i propose
>> switching our xterm(1) to enable UTF-8 mode by default.

> Seconded.

>> The best place to switch is in the setup function VTInitialize_locale()
>> that decides whether to enable UTF-8 mode and which supporting flags
>> to set, by pretending to it that CODESET is always UTF-8, but without
>> interfering with the actual value of the CODESET and without changing
>> the utility function xtermEnvUTF8().

> Hmm, maybe you are overthinking this.
> Other defaults that we set differently from upstream are simply
> resource changes to XTerm.ad (/usr/X11R6/share/X11/app-defaults/XTerm).

Heh. I considered simply changing the resource defaults, but came
to the wrong conclusion that there wouldn't be a way to achieve the
desired effect. Thanks for bringing it up again, that made me
re-check, and it turns out there *is* a way that is quite
straightforward, minimally intrusive, very robust, and doesn't get
in the way of explicit user configuration: See the patch below.
If this gets OKs, let's forget my previous, more intrusive patch.

With that change, users can obviously still set *locale to other
values (for example, "true" or "false" come to mind), and the command
line options changing *locale (-lc +lc -en) still work. Looking
at the code, explicitly setting *utf8 to false (or equivalently,
+u8 on the command line) also overrides this.

Spending a day reading xterm source code wasn't wasted, though -
by reading the documentation only, i wouldn't have understood
that this way works as intended and is safe.

OK?
  Ingo

>
> PS:
>> printf "\303\237\n" # thanks to sobrado@ for the striking example
>> Now your local terminal hangs until you force a reset using the
>> menus of the xterm program.

> \237 is 0x9F, equivalent to ESC _, which is APC (Application Program
> Command). That appears in a table, but is not explained in the
> VT220 manual. The VT420 manual says: "The VT420 ignores all following
> characters until it receives a SUB, ST, or any other C1 control
> character."

Yes. I spent so much time reading terminal control code documentation
lately that i probably assumed this to be widely known. ;-)
You are right, explaining it is helpful.


Index: XTerm.ad
===
RCS file: /cvs/xenocara/app/xterm/XTerm.ad,v
retrieving revision 1.15
diff -u -p -r1.15 XTerm.ad
--- XTerm.ad 26 Aug 2013 20:06:10 - 1.15
+++ XTerm.ad 7 Mar 2016 22:54:44 -
@@ -259,6 +259,11 @@

 ! OpenBSD local modifications

+! Enable UTF-8 mode since OpenBSD does not support any other multibyte
+! locales. Even for people using the C/POSIX locale for everything,
+! that's safer and more usable than the upstream default of "medium".
+*locale: UTF-8
+
 ! ScrollBar by default
 *scrollBar: true

Re: improve xterm(1) resilience against control code attacks
On 2016-03-07, Ingo Schwarze wrote:

> Consequently, in the interest of safe and sane defaults, i propose
> switching our xterm(1) to enable UTF-8 mode by default.

Seconded.

> The best place to switch is in the setup function VTInitialize_locale()
> that decides whether to enable UTF-8 mode and which supporting flags
> to set, by pretending to it that CODESET is always UTF-8, but without
> interfering with the actual value of the CODESET and without changing
> the utility function xtermEnvUTF8().

Hmm, maybe you are overthinking this.
Other defaults that we set differently from upstream are simply
resource changes to XTerm.ad (/usr/X11R6/share/X11/app-defaults/XTerm).

PS:
> printf "\303\237\n" # thanks to sobrado@ for the striking example
> Now your local terminal hangs until you force a reset using the
> menus of the xterm program.

\237 is 0x9F, equivalent to ESC _, which is APC (Application Program
Command). That appears in a table, but is not explained in the
VT220 manual. The VT420 manual says: "The VT420 ignores all following
characters until it receives a SUB, ST, or any other C1 control
character."

-- 
Christian "naddy" Weisgerber na...@mips.inka.de

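For readers who want to check the byte arithmetic in this message, here is a small Python sketch; it is not part of the original mail. It assumes the usual ISO 6429 convention that an 8-bit C1 control byte b is equivalent to ESC followed by the character b - 0x40. All function names are hypothetical, and the last function is only a toy model of the VT420 rule quoted above, not xterm's actual parser.

```python
# Illustrative sketch only; helper names are hypothetical.
SUB, ST, APC = 0x1A, 0x9C, 0x9F


def is_c1(byte):
    """True for bytes in the 8-bit C1 control range 0x80..0x9F."""
    return 0x80 <= byte <= 0x9F


def c1_as_escape(byte):
    """Spell an 8-bit C1 control as its 7-bit 'ESC x' equivalent, else None."""
    return "ESC " + chr(byte - 0x40) if is_c1(byte) else None


def shown_by_toy_terminal(data):
    """Toy 8-bit terminal: after APC, drop bytes until SUB, ST or another C1."""
    shown = bytearray()
    swallowing = False
    for b in data:
        if swallowing:
            if b == SUB or (is_c1(b) and b != APC):
                swallowing = False
        elif b == APC:
            swallowing = True
        elif not is_c1(b):
            # Other C1 controls are simply not displayed in this toy model.
            shown.append(b)
    return bytes(shown)


sharp_s = "\u00df".encode("utf-8")  # the two bytes printf "\303\237" emits
for b in sharp_s:
    print(f"0x{b:02X} = \\{b:03o} -> {c1_as_escape(b)}")

# Everything after the stray 0x9F is swallowed, including later output.
print(shown_by_toy_terminal(sharp_s + b"\nls\n"))
# An ST (0x9C) ends the swallowing again.
print(shown_by_toy_terminal(b"a" + sharp_s + b"lost\x9cb"))
```

In this model the second byte of the encoded sharp s starts an APC string that never terminates, which matches the hang described in the thread; whether a real terminal recovers depends on its own parser.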
improve xterm(1) resilience against control code attacks
Hi,

if two programs communicating encoded character strings to each
other disagree about the encoding, that can result in problems.

One particular example of such communication is an application
program passing output text to a terminal emulator program. If the
terminal uses a different encoding for decoding the text than the
application used for encoding it, the terminal may see control codes
where the application only intended printable characters. This can
screw up the terminal state, spoiling display of subsequent text or
even hanging the terminal.

Actually, i assume that this problem occurs frequently in practice,
for the following reasons. If the application program is
well-behaved, it either produces C/POSIX/US-ASCII output only, or
its idea of the encoding to use is governed by the LC_CTYPE locale(1)
environment variable, typically passed to it by the shell it was
started from.

Now that locale(1) environment is completely unrelated to whatever
encoding the terminal may be set up for. It may not even be on the
same physical machine. For example, during an SSH session, your
terminal is on the local SSH client machine, while the shell starting
your application programs is on the remote SSH server machine.

To fully appreciate the implications, try out the following scenario:
Start an xterm(1) that is not UTF-8 enabled on your local machine
by saying "xterm +lc +u8". Unset LC_ALL, LC_CTYPE, and LANG; check
with locale(1) that your locale is "C". Use ssh(1) to connect to
a remote machine. Now simulate a program producing UTF-8 output on
the remote machine, for example U+00DF LATIN SMALL LETTER SHARP S:

  printf "\303\237\n" # thanks to sobrado@ for the striking example

Now your local terminal hangs until you force a reset using the
menus of the xterm program.

If the shell startup files on the remote machine set
LC_CTYPE=en_US.UTF-8 or something similar by default, programs on
the remote machine will always do just that. That shows how easy
it is to inadvertently cause application-terminal character encoding
mismatches; yet i doubt that many people are aware of the problem.
So we should try to reduce the likelihood that people get burnt by
such effects.

On an operating system supporting any third locale in addition to
C/POSIX and UTF-8, people are screwed beyond rescue because even
if one side of the connection assumes US-ASCII, communication is
still unsafe in both directions. Reinterpreting US-ASCII in an
arbitrary encoding and reinterpreting an arbitrary encoding as
US-ASCII may both turn innocuous printable characters into dangerous
terminal control codes. That is particularly bitter because some
programs will always output US-ASCII, which is not safe to display
in a terminal set up for an arbitrary locale.

Fortunately, in OpenBSD, we made the decision to only support exactly
two locales, C/POSIX and UTF-8, and this combination has the following
properties:

 1. Printing unsanitized strings to the terminal is never safe,
    no matter the locale and terminal setup (think of "cat /bsd").
 2. Printing sanitized US-ASCII to a US-ASCII terminal is safe.
 3. Printing sanitized UTF-8 to a UTF-8 terminal is safe.
 4. Printing sanitized US-ASCII to a UTF-8 terminal is safe.
    That is important because there are some programs that we
    may never want to add UTF-8 support to.

However:

 5. Printing sanitized UTF-8 to a US-ASCII terminal is *NOT* safe.
    Remember the example above that hung a US-ASCII terminal
    by printing U+00DF LATIN SMALL LETTER SHARP S in UTF-8 to it.

By default, our xterm(1) runs in US-ASCII mode. In view of the
above, that's a terrible idea, even if the user doesn't intend to
ever use UTF-8. A UTF-8 terminal handles the US-ASCII the user
wants just fine, and in addition to that, and mostly for free, it
is more resilient against stray UTF-8 sneaking in.

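The asymmetry between properties 4 and 5 can be checked mechanically. The following Python sketch is an illustration added for that purpose, not code from xterm. It assumes that "sanitized" means the text contains only printable characters and newlines, and it models the US-ASCII terminal simply as one that acts on any raw byte in the C1 range 0x80..0x9F. The function names are hypothetical.

```python
# Illustrative sketch only; helper names are hypothetical.
def c1_bytes(data):
    """Raw bytes an 8-bit terminal would act on as C1 controls (0x80..0x9F)."""
    return [b for b in data if 0x80 <= b <= 0x9F]


def c1_code_points(data):
    """C1 control characters that remain after decoding the bytes as UTF-8."""
    return [ch for ch in data.decode("utf-8") if 0x80 <= ord(ch) <= 0x9F]


ascii_text = "Strasse\n".encode("ascii")
utf8_text = "Stra\u00dfe\n".encode("utf-8")

# Properties 2 and 4: US-ASCII has no bytes >= 0x80, so neither the 8-bit
# view nor the UTF-8 view finds any C1 control in it.
print(c1_bytes(ascii_text), c1_code_points(ascii_text))

# Property 3: decoded as UTF-8, the sharp s is a single printable character.
print(c1_code_points(utf8_text))

# Property 5: read byte by byte, the same text contains 0x9F (APC).
print([hex(b) for b in c1_bytes(utf8_text)])
```

Only the last check finds anything: the second byte of the encoded sharp s falls into the C1 range when the stream is read one byte at a time.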
Actually, even when fed garbage or unsupported encodings, a UTF-8
xterm(1) is more robust than a US-ASCII xterm(1) because the UTF-8
xterm(1) honours *fewer* terminal escape codes than the US-ASCII
xterm(1). That may seem surprising at first because Unicode defines
*more* control characters than US-ASCII does. But as explained on

  http://invisible-island.net/xterm/ctlseqs/ctlseqs.html

xterm(1) never treats decoded multibyte characters as terminal
control codes, so the ISO 6429 C1 control codes do not take effect
in UTF-8 mode; but they do take effect in US-ASCII mode, even though
they fall outside the scope of ASCII.

Consequently, in the interest of safe and sane defaults, i propose
switching our xterm(1) to enable UTF-8 mode by default. If somebody
insists on running an xterm(1) in US-ASCII mode, there are still
many ways to force that, for example with "+lc +u8".

It is rather tricky to get the switch right because the
locale+encoding user interface of xterm(1) is ridiculously
complicated. It uses three X resources (*locale, *utf8, *wideChar)
with 5+4+2 possible values (*locale: true, medium, chec