Re: [Patch] Support UTF-8 scripts
Followup to: <[EMAIL PROTECTED]>
By author: "Martin v. Löwis" <[EMAIL PROTECTED]>
In newsgroup: linux.dev.kernel
>
> This patch adds support for UTF-8 signatures (aka BOM, byte order
> mark) to binfmt_script. Files that start with EF BB BF # ! are now
> recognized as scripts (in addition to files starting with # !).
>
> With such support, creating scripts that reliably carry non-ASCII
> characters is simplified. Editors and the script interpreter can
> easily agree on what the encoding of the script is, and the
> interpreter can then render strings appropriately. Currently,
> Python supports source files that start with the UTF-8 signature;
> the approach would naturally extend to Perl to enhance/replace
> the "use utf8" pragma. Likewise, Tcl could use the UTF-8 signature
> to reliably identify UTF-8 source code (instead of assuming
> [encoding system] for source code).
>

BOM should not be used in UTF-8. In fact, it shouldn't be used at all.

	-hpa
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [Patch] Support UTF-8 scripts
On Mon, 15 Aug 2005 00:55:54 +0100, Alan Cox <[EMAIL PROTECTED]> wrote:

> On Sul, 2005-08-14 at 15:59 -0400, Lee Revell wrote:
> > I know the alternatives are available. That doesn't make it any less
> > idiotic to use non ASCII characters as operators. I think it's a very
> > slippery slope. We write code in ASCII, dammit.
>
> Its a trivial patch and there is a lot to be said for UTF-8 scripts. As
> to writing code in ascii, the kernel regularly has outbreaks of either
> UTF-8 or ISO-8859-* especially in the docs directory. Standardising
> these on UTF-8 would be helpful.
>
> Yes the kernel code is C so ASCII except for the odd abuser of the ©
> symbol.

We write kernel code in ASCII because of patches in e-mail. When a patch
is saved (often by a script), it is divorced from the encoding in which
the e-mail was sent. Forwarding such patches then causes them to fail to
apply. Everything else can be worked around.

In my experience, the most common cause of such patch rejects is a
European using a non-UTF-8 encoding for his name, rather than the
copyright symbol.

-- Pete
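As a hedged illustration of the failure Pete describes: the name Löwis is five bytes in ISO-8859-1 but six in UTF-8, so a patch line saved in one encoding will not match a source file stored in the other when the two are compared byte for byte. The sketch below is not taken from any patch tool; `latin1_to_utf8` and `same_bytes` are hypothetical helper names chosen for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: byte-for-byte comparison, which is the kind of
 * match a context line needs. Returns 1 on a match, 0 otherwise. */
static int same_bytes(const unsigned char *a, size_t alen,
                      const unsigned char *b, size_t blen)
{
    return alen == blen && memcmp(a, b, alen) == 0;
}

/* Hypothetical helper: convert ISO-8859-1 bytes to UTF-8. Each Latin-1
 * byte is the code point of the same value, so bytes >= 0x80 expand to
 * two UTF-8 bytes. Writes a terminating NUL and returns the number of
 * bytes written before it, or (size_t)-1 if dst is too small. */
static size_t latin1_to_utf8(const unsigned char *src, size_t n,
                             unsigned char *dst, size_t cap)
{
    size_t out = 0;

    assert(src != NULL && dst != NULL);
    if (cap == 0)
        return (size_t)-1;
    for (size_t i = 0; i < n; i++) {
        unsigned char c = src[i];
        if (c < 0x80) {
            if (out + 1 >= cap)          /* keep room for the NUL */
                return (size_t)-1;
            dst[out++] = c;
        } else {
            if (out + 2 >= cap)
                return (size_t)-1;
            dst[out++] = (unsigned char)(0xC0 | (c >> 6));
            dst[out++] = (unsigned char)(0x80 | (c & 0x3F));
        }
    }
    dst[out] = '\0';
    return out;
}
```

With the Latin-1 bytes `4C F6 77 69 73` as input, the conversion yields `4C C3 B6 77 69 73`; until that conversion happens, `same_bytes` reports a mismatch between the two spellings of the same name.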
Re: [Patch] Support UTF-8 scripts
On Sun, Aug 14, 2005 at 08:00:31PM +, Lee Revell wrote:
> We write code in ASCII, dammit.

<URL: http://www.madore.org/~david/weblog/2004-12.html#d.2004-12-03.0813 >

:-)

-- 
David A. Madore
([EMAIL PROTECTED], http://www.madore.org/~david/ )
Re: [Patch] Support UTF-8 scripts
>> > Thats great for the perl6 people.
>> > http://dev.perl.org/perl6/doc/design/syn/S03.html says they are going
>> > to be using « and » as operators...
>>
>> Is Larry smoking crack? That's one of the worst ideas I've heard in a
>> long time. There's no easy way to enter those at the keyboard!
>
> I have "setxkbmap -symbols 'en_US(pc102)+gb'" in my ~/.xsession,
> and « and » are available as AltGr-z and AltGr-x respectively.

.Xmodmap:
keycode 117 = Multi_key

and then use [the Windows(R) Context Menu Key],[<],[<] to generate «

Cheers :)

Jan Engelhardt
-- 
Re: [Patch] Support UTF-8 scripts
Lee Revell wrote:

> On Sat, 2005-08-13 at 09:35 -0700, Stephen Pollei wrote:
> > Thats great for the perl6 people.
> > http://dev.perl.org/perl6/doc/design/syn/S03.html says they are going
> > to be using « and » as operators...
>
> Is Larry smoking crack? That's one of the worst ideas I've heard in a
> long time. There's no easy way to enter those at the keyboard!

On your keyboard, that is. So what? My keyboard happens to have no easy
way of entering a dollar sign, even though it is in «ascii». That makes
sense though, as it is one of those ascii characters that is almost never
used in my part of the world.

Still, if I needed to use the «$» when programming, I sure could map it
to some key combination. X is nice that way.

Helge Hafting
Re: [Patch] Support UTF-8 scripts
On Sul, 2005-08-14 at 15:59 -0400, Lee Revell wrote:
> I know the alternatives are available. That doesn't make it any less
> idiotic to use non ASCII characters as operators. I think it's a very
> slippery slope. We write code in ASCII, dammit.

It's a trivial patch and there is a lot to be said for UTF-8 scripts. As
to writing code in ascii, the kernel regularly has outbreaks of either
UTF-8 or ISO-8859-*, especially in the docs directory. Standardising
these on UTF-8 would be helpful.

Yes, the kernel code is C, so ASCII except for the odd abuser of the ©
symbol.

Alan
Re: [Patch] Support UTF-8 scripts
On Sun, 14 Aug 2005 17:52:36 EDT, Kyle Moffett said:

> > Note that ?^ is functionally identical to !.?| differs from || in

> Since when is the string "!.?|" an operator???

I think that was supposed to read:

Note that ?^ is functionally identical to !.  ?| differs from || in
that ?| returns (and so on)

(two separate sentences lacking whitespace between them)
Re: [Patch] Support UTF-8 scripts
Lee Revell wrote:
> For strings, of course. But there's no need for UTF-8 operators.

Indeed - this is the main rationale for the patch, of course. People
want to write non-ASCII in scripts primarily in string literals, and
(perhaps even more often) in comments.

Now, for comments, it wouldn't really matter that the interpreter
knows what the encoding is - but the editor would have to know,
and the UTF-8 signature primarily helps the editor (*).

Then we are back to the rationale for this patch: if you need the
UTF-8 signature to reliably identify the script as being UTF-8
encoded, you then currently cannot easily run it as a script
through binfmt_script, as that code requires a script to start
with #!.

Regards,
Martin

(*) As I said before: at least for Python, the UTF-8 signature
also has syntactic meaning. It is allowed at the beginning of a
file as an addition to the language syntax, and it tells the
interpreter that Unicode literals (usually represented internally
as UCS-2) are represented as UTF-8 in the source code.
Re: [Patch] Support UTF-8 scripts
On Aug 14, 2005, at 02:18:13, Jason L Tibbitts III wrote:
>> "LR" == Lee Revell <[EMAIL PROTECTED]> writes:
> LR> Is Larry smoking crack?
>
> From the Perl6-Bible:
> http://search.cpan.org/dist/Perl6-Bible/lib/Perl6/Bible/S03.pod:

I think this confirms that the answer is yes. See the following at the
above URL:

> Note that ?^ is functionally identical to !.?| differs from || in
> that ?| always returns a standard boolean value (either 1 or 0),
> whereas || returns the actual value of the first of its arguments
> that is true.

Since when is the string "!.?|" an operator??? Or "?^", "+|", "~|",
"?|", etc. I think Larry's gone off the deep end on this one. It may be
an incredibly powerful and expressive language, but it seems _really_
strange, and probably will produce the best Obfuscated-code contest the
world has ever seen. (Better even than the Perl5 one).

Cheers,
Kyle Moffett

--
Simple things should be simple and complex things should be possible
  -- Alan Kay
Re: [Patch] Support UTF-8 scripts
On Sun, 2005-08-14 at 13:13 -0700, Stephen Pollei wrote:
> Seems like lots of Europeans might want a bigger
> charset, not to mention Asians, Hindus, and whomever else.

For strings, of course. But there's no need for UTF-8 operators.

Lee
Re: [Patch] Support UTF-8 scripts
On 8/14/05, Lee Revell <[EMAIL PROTECTED]> wrote:
> I know the alternatives are available. That doesn't make it any less
> idiotic to use non ASCII characters as operators. I think it's a very
> slippery slope. We write code in ASCII, dammit.

Yes you and I might write 99.9% of our code in good'ol **American**
Standard Code for Information Interchange -- however not all the world
is USA. For instance notice the http://de.wikipedia.org/wiki/Umlaut/ in
"Löwis"... Seems like lots of Europeans might want a bigger
charset, not to mention Asians, Hindus, and whomever else.

-- 
http://dmoz.org/profiles/pollei.html
http://sourceforge.net/users/stephen_pollei/
http://www.orkut.com/Profile.aspx?uid=2455954990164098214
http://stephen_pollei.home.comcast.net/
Re: [Patch] Support UTF-8 scripts
Previously Lee Revell wrote:
> My point exactly, it's idiotic for Perl6 to use these as OPERATORS, the
> atoms of the language, when there's not even a platform independent way
> to type them in.

If anyone had bothered to read the URL in one of the earlier emails you
would have seen that '<<' is an accepted alternative spelling.

Wichert.

-- 
Wichert Akkerman <[EMAIL PROTECTED]>    It is simple to make things.
http://www.wiggy.net/                   It is hard to make things simple.
Re: [Patch] Support UTF-8 scripts
> "LR" == Lee Revell <[EMAIL PROTECTED]> writes:

LR> Is Larry smoking crack? That's one of the worst ideas I've heard
LR> in a long time. There's no easy way to enter those at the
LR> keyboard!

I know folks enjoy trashing Perl these days, but it's not justified in
this case.

From the Perl6-Bible -
http://search.cpan.org/dist/Perl6-Bible/lib/Perl6/Bible/S03.pod:

  For those still living without the blessings of Unicode, that can
  also be written: << ... >>.

 - J<
Re: [Patch] Support UTF-8 scripts
> "Alan" == Alan Cox <[EMAIL PROTECTED]> writes:

Alan> The command line console mappings may not include them by
Alan> default (you can obviously add them if your keyboard lacks
Alan> them). The X keyboard however does include compose functionality
Alan> for » and « and many other symbols that might be useful eg ±

Not to mention that many editors, including emacs and vim, have their
own support for entering such non-ascii characters no matter what the
console or X11 keyboards look like.

-JimC
-- 
James H. Cloos, Jr. <[EMAIL PROTECTED]>
Re: [Patch] Support UTF-8 scripts
On Sat, 2005-08-13 at 21:19 -0400, Kyle Moffett wrote:
> And those of us who are Mac OS X oriented have patched our console and
> X keycodes to match the mac way of generating symbols:
>
> Alt-\       = «
> Alt-Shift-\ = »
> Alt-Shift-+ = ±
>

My point exactly, it's idiotic for Perl6 to use these as OPERATORS, the
atoms of the language, when there's not even a platform independent way
to type them in.

Lee
Re: [Patch] Support UTF-8 scripts
On Aug 13, 2005, at 20:57:45, Alan Cox wrote:
>>> I have "setxkbmap -symbols 'en_US(pc102)+gb'" in my ~/.xsession,
>>> and « and » are available as AltGr-z and AltGr-x respectively.
>>
>> Most keyboards don't have an AltGr key.
>
> You must be an American. Most of the world's keyboards have an AltGr
> key. You'll find that US keyboards have two alt keys to avoid confusing
> people (like one button mice ;)) but the right one is understood by the
> X bindings to be "AltGr". Even though the US keyboard is apparently
> lacking functionality, it's purely a text label issue.

And those of us who are Mac OS X oriented have patched our console and
X keycodes to match the mac way of generating symbols:

Alt-\       = «
Alt-Shift-\ = »
Alt-Shift-+ = ±

If only someone could come up with a good character palette like the one
that exists on that OS, something that could generate a wide variety of
keysyms, preferably all of UTF-8, and send them to the topmost window.

Cheers,
Kyle Moffett

--
Unix was not designed to stop people from doing stupid things, because
that would also stop them from doing clever things.
  -- Doug Gwyn
Re: [Patch] Support UTF-8 scripts
> > I have "setxkbmap -symbols 'en_US(pc102)+gb'" in my ~/.xsession,
> > and « and » are available as AltGr-z and AltGr-x respectively.
>
> Most keyboards don't have an AltGr key.

You must be an American. Most of the world's keyboards have an AltGr
key. You'll find that US keyboards have two alt keys to avoid confusing
people (like one button mice ;)) but the right one is understood by the
X bindings to be "AltGr". Even though the US keyboard is apparently
lacking functionality, it's purely a text label issue.

Alan
Re: [Patch] Support UTF-8 scripts
On Sad, 2005-08-13 at 14:42 -0400, Lee Revell wrote:
> Is Larry smoking crack? That's one of the worst ideas I've heard in a
> long time. There's no easy way to enter those at the keyboard!

The command line console mappings may not include them by default (you
can obviously add them if your keyboard lacks them). The X keyboard
however does include compose functionality for » and « and many other
symbols that might be useful, e.g. ±

Alan
Re: [Patch] Support UTF-8 scripts
On Sat, 2005-08-13 at 19:49 +0100, Hugo Mills wrote:
> On Sat, Aug 13, 2005 at 02:42:52PM -0400, Lee Revell wrote:
> > On Sat, 2005-08-13 at 09:35 -0700, Stephen Pollei wrote:
> > > Thats great for the perl6 people.
> > > http://dev.perl.org/perl6/doc/design/syn/S03.html says they are going
> > > to be using « and » as operators...
> >
> > Is Larry smoking crack? That's one of the worst ideas I've heard in a
> > long time. There's no easy way to enter those at the keyboard!
>
> I have "setxkbmap -symbols 'en_US(pc102)+gb'" in my ~/.xsession,
> and « and » are available as AltGr-z and AltGr-x respectively.
>

Well, now it's obvious he's just trying to raise the bar for the
obfuscated perl contest. If you thought these were fun before, you'll
love them with ¥ and « and »!

Lee
Re: [Patch] Support UTF-8 scripts
On Sat, 2005-08-13 at 19:49 +0100, Hugo Mills wrote:
> On Sat, Aug 13, 2005 at 02:42:52PM -0400, Lee Revell wrote:
> > On Sat, 2005-08-13 at 09:35 -0700, Stephen Pollei wrote:
> > > Thats great for the perl6 people.
> > > http://dev.perl.org/perl6/doc/design/syn/S03.html says they are going
> > > to be using « and » as operators...
> >
> > Is Larry smoking crack? That's one of the worst ideas I've heard in a
> > long time. There's no easy way to enter those at the keyboard!
>
> I have "setxkbmap -symbols 'en_US(pc102)+gb'" in my ~/.xsession,
> and « and » are available as AltGr-z and AltGr-x respectively.

Most keyboards don't have an AltGr key.

Lee
Re: [Patch] Support UTF-8 scripts
On Sat, Aug 13, 2005 at 02:42:52PM -0400, Lee Revell wrote:
> On Sat, 2005-08-13 at 09:35 -0700, Stephen Pollei wrote:
> > Thats great for the perl6 people.
> > http://dev.perl.org/perl6/doc/design/syn/S03.html says they are going
> > to be using « and » as operators...
>
> Is Larry smoking crack? That's one of the worst ideas I've heard in a
> long time. There's no easy way to enter those at the keyboard!

I have "setxkbmap -symbols 'en_US(pc102)+gb'" in my ~/.xsession,
and « and » are available as AltGr-z and AltGr-x respectively.

Hugo.

-- 
=== Hugo Mills: [EMAIL PROTECTED] carfax.org.uk | darksatanic.net | lug.org.uk ===
PGP key: 1C335860 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- Anyone who claims their cryptographic protocol is secure is ---
either a genius or a fool. Given the genius/fool ratio
for our species, the odds aren't good.
Re: [Patch] Support UTF-8 scripts
On Sat, 2005-08-13 at 09:35 -0700, Stephen Pollei wrote:
> Thats great for the perl6 people.
> http://dev.perl.org/perl6/doc/design/syn/S03.html says they are going
> to be using « and » as operators...

Is Larry smoking crack? That's one of the worst ideas I've heard in a
long time. There's no easy way to enter those at the keyboard!

http://www.cl.cam.ac.uk/~mgk25/unicode.html#input

Lee
Re: [Patch] Support UTF-8 scripts
On 8/13/05, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
> This patch adds support for UTF-8 signatures (aka BOM, byte order
> mark) to binfmt_script.
> With such support, creating scripts that reliably carry non-ASCII
> characters is simplified.
> the approach would naturally extend to Perl to enhance/replace
> the "use utf8" pragma.

Thats great for the perl6 people.
http://dev.perl.org/perl6/doc/design/syn/S03.html says they are going
to be using « and » as operators...
So I'd imagine that a lot of perl6 scripts would be utf8.

-- 
http://dmoz.org/profiles/pollei.html
http://sourceforge.net/users/stephen_pollei/
http://www.orkut.com/Profile.aspx?uid=2455954990164098214
http://stephen_pollei.home.comcast.net/
[Patch] Support UTF-8 scripts
This patch adds support for UTF-8 signatures (aka BOM, byte order
mark) to binfmt_script. Files that start with EF BB BF # ! are now
recognized as scripts (in addition to files starting with # !).

With such support, creating scripts that reliably carry non-ASCII
characters is simplified. Editors and the script interpreter can
easily agree on what the encoding of the script is, and the
interpreter can then render strings appropriately. Currently,
Python supports source files that start with the UTF-8 signature;
the approach would naturally extend to Perl to enhance/replace
the "use utf8" pragma. Likewise, Tcl could use the UTF-8 signature
to reliably identify UTF-8 source code (instead of assuming
[encoding system] for source code).

Please find the patch attached below.

Regards,
Martin

Signed-off-by: Martin v. Löwis <[EMAIL PROTECTED]>

diff --git a/fs/binfmt_script.c b/fs/binfmt_script.c
--- a/fs/binfmt_script.c
+++ b/fs/binfmt_script.c
@@ -1,7 +1,7 @@
 /*
  *  linux/fs/binfmt_script.c
  *
- *  Copyright (C) 1996  Martin von Löwis
+ *  Copyright (C) 1996, 2005  Martin von Löwis
  *  original #!-checking implemented by tytso.
  */
 
@@ -23,7 +23,16 @@ static int load_script(struct linux_binp
 	char interp[BINPRM_BUF_SIZE];
 	int retval;
 
-	if ((bprm->buf[0] != '#') || (bprm->buf[1] != '!') || (bprm->sh_bang))
+	/* It is a recursive invocation. */
+	if (bprm->sh_bang)
+		return -ENOEXEC;
+
+	/* It starts neither with #!, nor with #! preceded by
+	   the UTF-8 signature. */
+	if (!(((bprm->buf[0] == '#') && (bprm->buf[1] == '!'))
+	      || ((bprm->buf[0] == '\xef') && (bprm->buf[1] == '\xbb')
+		  && (bprm->buf[2] == '\xbf') && (bprm->buf[3] == '#')
+		  && (bprm->buf[4] == '!'))))
 		return -ENOEXEC;
 	/*
 	 * This section does the #! interpretation.
@@ -46,7 +55,8 @@ static int load_script(struct linux_binp
 		else
 			break;
 	}
-	for (cp = bprm->buf+2; (*cp == ' ') || (*cp == '\t'); cp++);
+	cp = (bprm->buf[0]=='\xef') ? bprm->buf+5 : bprm->buf+2;
+	while ((*cp == ' ') || (*cp == '\t')) cp++;
 	if (*cp == '\0')
 		return -ENOEXEC; /* No interpreter name found */
 	i_name = cp;
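The prefix test described in the patch can be tried outside the kernel. What follows is only a minimal user-space sketch, not the kernel code: shebang_offset is a hypothetical helper name, and it takes an explicit length instead of assuming anything about how the buffer was filled. It uses unsigned char so the comparison against 0xEF does not depend on whether plain char is signed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of binfmt_script.c).
 * Returns the offset of the first byte after "#!":
 *   2 if the buffer starts with "#!",
 *   5 if it starts with the UTF-8 signature EF BB BF followed by "#!",
 *   0 if it does not look like a script at all.
 */
static size_t shebang_offset(const unsigned char *buf, size_t len)
{
	static const unsigned char sig[] = { 0xEF, 0xBB, 0xBF };
	size_t skip = 0;

	assert(buf != NULL);

	/* Skip the UTF-8 signature if the buffer begins with it. */
	if (len >= sizeof sig && memcmp(buf, sig, sizeof sig) == 0)
		skip = sizeof sig;

	/* The next two bytes must be "#!", with or without a signature. */
	if (len >= skip + 2 && buf[skip] == '#' && buf[skip + 1] == '!')
		return skip + 2;

	return 0;
}
```

A caller would start scanning for the interpreter name at buf + shebang_offset(buf, len) when the result is non-zero, which mirrors the patch's choice between bprm->buf+2 and bprm->buf+5.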
Re: [Patch] Support UTF-8 scripts
On Sat, 2005-08-13 at 09:35 -0700, Stephen Pollei wrote:
> That's great for the perl6 people.
> http://dev.perl.org/perl6/doc/design/syn/S03.html says they are going
> to be using « and » as operators...

Is Larry smoking crack? That's one of the worst ideas I've heard in a
long time. There's no easy way to enter those at the keyboard!

http://www.cl.cam.ac.uk/~mgk25/unicode.html#input

Lee
Re: [Patch] Support UTF-8 scripts
On Sat, Aug 13, 2005 at 02:42:52PM -0400, Lee Revell wrote:
> On Sat, 2005-08-13 at 09:35 -0700, Stephen Pollei wrote:
> > That's great for the perl6 people.
> > http://dev.perl.org/perl6/doc/design/syn/S03.html says they are going
> > to be using « and » as operators...
>
> Is Larry smoking crack? That's one of the worst ideas I've heard in a
> long time. There's no easy way to enter those at the keyboard!

I have

setxkbmap -symbols 'en_US(pc102)+gb'

in my ~/.xsession, and « and » are available as AltGr-z and AltGr-x
respectively.

Hugo.

--
=== Hugo Mills: [EMAIL PROTECTED] carfax.org.uk | darksatanic.net | lug.org.uk ===
PGP key: 1C335860 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- Anyone who claims their cryptographic protocol is secure is ---
either a genius or a fool. Given the genius/fool ratio
for our species, the odds aren't good.
Re: [Patch] Support UTF-8 scripts
On Sat, 2005-08-13 at 19:49 +0100, Hugo Mills wrote:
> On Sat, Aug 13, 2005 at 02:42:52PM -0400, Lee Revell wrote:
> > On Sat, 2005-08-13 at 09:35 -0700, Stephen Pollei wrote:
> > > That's great for the perl6 people.
> > > http://dev.perl.org/perl6/doc/design/syn/S03.html says they are going
> > > to be using « and » as operators...
> >
> > Is Larry smoking crack? That's one of the worst ideas I've heard in a
> > long time. There's no easy way to enter those at the keyboard!
>
> I have setxkbmap -symbols 'en_US(pc102)+gb' in my ~/.xsession, and
> « and » are available as AltGr-z and AltGr-x respectively.

Most keyboards don't have an AltGr key.

Lee
Re: [Patch] Support UTF-8 scripts
On Sat, 2005-08-13 at 19:49 +0100, Hugo Mills wrote:
> On Sat, Aug 13, 2005 at 02:42:52PM -0400, Lee Revell wrote:
> > On Sat, 2005-08-13 at 09:35 -0700, Stephen Pollei wrote:
> > > That's great for the perl6 people.
> > > http://dev.perl.org/perl6/doc/design/syn/S03.html says they are going
> > > to be using « and » as operators...
> >
> > Is Larry smoking crack? That's one of the worst ideas I've heard in a
> > long time. There's no easy way to enter those at the keyboard!
>
> I have setxkbmap -symbols 'en_US(pc102)+gb' in my ~/.xsession, and
> « and » are available as AltGr-z and AltGr-x respectively.

Well, now it's obvious he's just trying to raise the bar for the
obfuscated perl contest. If you thought these were fun before, you'll
love them with ¥ and « and »!

Lee
Re: [Patch] Support UTF-8 scripts
On Sad, 2005-08-13 at 14:42 -0400, Lee Revell wrote:
> Is Larry smoking crack? That's one of the worst ideas I've heard in a
> long time. There's no easy way to enter those at the keyboard!

The command line console mappings may not include them by default (you
can obviously add them if your keyboard lacks them). The X keyboard
however does include compose functionality for » and « and many other
symbols that might be useful eg ±

Alan
Re: [Patch] Support UTF-8 scripts
> > I have setxkbmap -symbols 'en_US(pc102)+gb' in my ~/.xsession, and
> > « and » are available as AltGr-z and AltGr-x respectively.
>
> Most keyboards don't have an AltGr key.

You must be an American. Most of the world's keyboards have an AltGr
key. You'll find that US keyboards have two alt keys to avoid confusing
people (like one button mice ;)) but the right one is understood by the
X bindings to be AltGr. Even though the US keyboard is apparently
lacking functionality, it's purely a text label issue.

Alan
Re: [Patch] Support UTF-8 scripts
On Aug 13, 2005, at 20:57:45, Alan Cox wrote:
> > > I have setxkbmap -symbols 'en_US(pc102)+gb' in my ~/.xsession, and
> > > « and » are available as AltGr-z and AltGr-x respectively.
> >
> > Most keyboards don't have an AltGr key.
>
> You must be an American. Most of the world's keyboards have an AltGr
> key. You'll find that US keyboards have two alt keys to avoid confusing
> people (like one button mice ;)) but the right one is understood by the
> X bindings to be AltGr. Even though the US keyboard is apparently
> lacking functionality, it's purely a text label issue.

And those of us who are Mac OS X oriented have patched our console and
X keycodes to match the mac way of generating symbols:

Alt-\ = «
Alt-Shift-\ = »
Alt-Shift-+ = ±

If only someone could come up with a good character palette like exists
on that OS, something that could generate a wide variety of keysyms,
preferably all of UTF-8, and send them to the topmost window.

Cheers,
Kyle Moffett

--
Unix was not designed to stop people from doing stupid things, because
that would also stop them from doing clever things.
-- Doug Gwyn
Re: [Patch] Support UTF-8 scripts
On Sat, 2005-08-13 at 21:19 -0400, Kyle Moffett wrote:
> And those of us who are Mac OS X oriented have patched our console and
> X keycodes to match the mac way of generating symbols:
>
> Alt-\ = «
> Alt-Shift-\ = »
> Alt-Shift-+ = ±

My point exactly, it's idiotic for Perl6 to use these as OPERATORS, the
atoms of the language, when there's not even a platform independent way
to type them in.

Lee
Re: [Patch] Support UTF-8 scripts
>>>>> "Alan" == Alan Cox [EMAIL PROTECTED] writes:

Alan> The command line console mappings may not include them by
Alan> default (you can obviously add them if your keyboard lacks
Alan> them). The X keyboard however does include compose functionality
Alan> for » and « and many other symbols that might be useful eg ±

Not to mention that many editors, including emacs and vim, have their
own support for entering such non-ascii characters no matter what the
console or X11 keyboards look like.

-JimC
--
James H. Cloos, Jr. [EMAIL PROTECTED]